<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/block/blk.h, branch v3.13.3</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.13.3</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.13.3'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2013-10-25T10:56:00+00:00</updated>
<entry>
<title>blk-mq: new multi-queue block IO queueing mechanism</title>
<updated>2013-10-25T10:56:00+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2013-10-24T08:20:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=320ae51feed5c2f13664aa05a76bec198967e04d'/>
<id>urn:sha1:320ae51feed5c2f13664aa05a76bec198967e04d</id>
<content type='text'>
Linux currently has two models for block devices:

- The classic request_fn based approach, where drivers use struct
  request units for IO. The block layer provides various helper
  functionalities to let drivers share code, things like tag
  management, timeout handling, queueing, etc.

- The "stacked" approach, where a driver squeezes in between the
  block layer and IO submitter. Since this bypasses the IO stack,
  driver generally have to manage everything themselves.

With drivers being written for new high IOPS devices, the classic
request_fn based driver doesn't work well enough. The design dates
back to when both SMP and high IOPS was rare. It has problems with
scaling to bigger machines, and runs into scaling issues even on
smaller machines when you have IOPS in the hundreds of thousands
per device.

The stacked approach is then most often selected as the model
for the driver. But this means that everybody has to re-invent
everything, and along with that we get all the problems again
that the shared approach solved.

This commit introduces blk-mq, block multi queue support. The
design is centered around per-cpu queues for queueing IO, which
then funnel down into x number of hardware submission queues.
We might have a 1:1 mapping between the two, or it might be
an N:M mapping. That all depends on what the hardware supports.

blk-mq provides various helper functions, which include:

- Scalable support for request tagging. Most devices need to
  be able to uniquely identify a request both in the driver and
  to the hardware. The tagging uses per-cpu caches for freed
  tags, to enable cache hot reuse.

- Timeout handling without tracking request on a per-device
  basis. Basically the driver should be able to get a notification,
  if a request happens to fail.

- Optional support for non 1:1 mappings between issue and
  submission queues. blk-mq can redirect IO completions to the
  desired location.

- Support for per-request payloads. Drivers almost always need
  to associate a request structure with some driver private
  command structure. Drivers can tell blk-mq this at init time,
  and then any request handed to the driver will have the
  required size of memory associated with it.

- Support for merging of IO, and plugging. The stacked model
  gets neither of these. Even for high IOPS devices, merging
  sequential IO reduces per-command overhead and thus
  increases bandwidth.

For now, this is provided as a potential 3rd queueing model, with
the hope being that, as it matures, it can replace both the classic
and stacked model. That would get us back to having just 1 real
model for block devices, leaving the stacked approach to dm/md
devices (as it was originally intended).

Contributions in this patch from the following people:

Shaohua Li &lt;shli@fusionio.com&gt;
Alexander Gordeev &lt;agordeev@redhat.com&gt;
Christoph Hellwig &lt;hch@infradead.org&gt;
Mike Christie &lt;michaelc@cs.wisc.edu&gt;
Matias Bjorling &lt;m@bjorling.me&gt;
Jeff Moyer &lt;jmoyer@redhat.com&gt;

Acked-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block,elevator: use new hashtable implementation</title>
<updated>2013-01-11T13:43:13+00:00</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2012-12-17T15:01:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=242d98f077ac0ab80920219769eb095503b93f61'/>
<id>urn:sha1:242d98f077ac0ab80920219769eb095503b93f61</id>
<content type='text'>
Switch elevator to use the new hashtable implementation. This reduces the
amount of generic unrelated code in the elevator.

This also removes the dymanic allocation of the hash table. The size of the table is
constant so there's no point in paying the price of an extra dereference when accessing
it.

This patch depends on d9b482c ("hashtable: introduce a small and naive
hashtable") which was merged in v3.6.

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Avoid that request_fn is invoked on a dead queue</title>
<updated>2012-12-06T13:32:01+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2012-12-06T13:32:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c246e80d86736312933646896c4157daf511dadc'/>
<id>urn:sha1:c246e80d86736312933646896c4157daf511dadc</id>
<content type='text'>
A block driver may start cleaning up resources needed by its
request_fn as soon as blk_cleanup_queue() finished, so request_fn
must not be invoked after draining finished. This is important
when blk_run_queue() is invoked without any requests in progress.
As an example, if blk_drain_queue() and scsi_run_queue() run in
parallel, blk_drain_queue() may have finished all requests after
scsi_run_queue() has taken a SCSI device off the starved list but
before that last function has had a chance to run the queue.

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: James Bottomley &lt;JBottomley@Parallels.com&gt;
Cc: Mike Christie &lt;michaelc@cs.wisc.edu&gt;
Cc: Chanho Min &lt;chanho.min@lge.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Rename queue dead flag</title>
<updated>2012-12-06T13:30:58+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2012-11-28T12:42:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3f3299d5c0268d6cc3f47b446e8aca436e4a5651'/>
<id>urn:sha1:3f3299d5c0268d6cc3f47b446e8aca436e4a5651</id>
<content type='text'>
QUEUE_FLAG_DEAD is used to indicate that queuing new requests must
stop. After this flag has been set queue draining starts. However,
during the queue draining phase it is still safe to invoke the
queue's request_fn, so QUEUE_FLAG_DYING is a better name for this
flag.

This patch has been generated by running the following command
over the kernel source tree:

git grep -lEw 'blk_queue_dead|QUEUE_FLAG_DEAD' |
    xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g'      \
        -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g';                \
sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \
    include/linux/blkdev.h;                                       \
sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \
    -e 's/Dead queue/A dying queue/' block/blk-core.c

Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: James Bottomley &lt;JBottomley@Parallels.com&gt;
Cc: Mike Christie &lt;michaelc@cs.wisc.edu&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Chanho Min &lt;chanho.min@lge.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Clean up special command handling logic</title>
<updated>2012-09-20T12:31:38+00:00</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2012-09-18T16:19:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e2a60da74fc8215c68509a89e9a69c66363153db'/>
<id>urn:sha1:e2a60da74fc8215c68509a89e9a69c66363153db</id>
<content type='text'>
Remove special-casing of non-rw fs style requests (discard). The nomerge
flags are consolidated in blk_types.h, and rq_mergeable() and
bio_mergeable() have been modified to use them.

bio_is_rw() is used in place of bio_has_data() a few places. This is
done to to distinguish true reads and writes from other fs type requests
that carry a payload (e.g. write same).

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: remove dead func declaration</title>
<updated>2012-08-01T10:25:54+00:00</updated>
<author>
<name>Yuanhan Liu</name>
<email>yuanhan.liu@linux.intel.com</email>
</author>
<published>2012-08-01T10:25:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=80799fbb7d10c30df78015b3fa21f7ffcfc0eb2c'/>
<id>urn:sha1:80799fbb7d10c30df78015b3fa21f7ffcfc0eb2c</id>
<content type='text'>
__generic_unplug_device() function is removed with commit
7eaceaccab5f40bbfda044629a6298616aeaed50, which forgot to
remove the declaration at meantime. Here remove it.

Signed-off-by: Yuanhan Liu &lt;yuanhan.liu@linux.intel.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: prepare for multiple request_lists</title>
<updated>2012-06-25T09:53:52+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-06-05T03:40:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5b788ce3e2acac9bf109743b1281d77347cf2101'/>
<id>urn:sha1:5b788ce3e2acac9bf109743b1281d77347cf2101</id>
<content type='text'>
Request allocation is about to be made per-blkg meaning that there'll
be multiple request lists.

* Make queue full state per request_list.  blk_*queue_full() functions
  are renamed to blk_*rl_full() and takes @rl instead of @q.

* Rename blk_init_free_list() to blk_init_rl() and make it take @rl
  instead of @q.  Also add @gfp_mask parameter.

* Add blk_exit_rl() instead of destroying rl directly from
  blk_release_queue().

* Add request_list-&gt;q and make request alloc/free functions -
  blk_free_request(), [__]freed_request(), __get_request() - take @rl
  instead of @q.

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-3.5' of ../cgroup into block/for-3.5/core-merged</title>
<updated>2012-04-01T19:55:00+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-04-01T19:30:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=959d851caa48829eb85cb85aa949fd6b4c5d5bc6'/>
<id>urn:sha1:959d851caa48829eb85cb85aa949fd6b4c5d5bc6</id>
<content type='text'>
cgroup/for-3.5 contains the following changes which blk-cgroup needs
to proceed with the on-going cleanup.

* Dynamic addition and removal of cftypes to make config/stat file
  handling modular for policies.

* cgroup removal update to not wait for css references to drain to fix
  blkcg removal hang caused by cfq caching cfqgs.

Pull in cgroup/for-3.5 into block/for-3.5/core.  This causes the
following conflicts in block/blk-cgroup.c.

* 761b3ef50e "cgroup: remove cgroup_subsys argument from callbacks"
  conflicts with blkiocg_pre_destroy() addition and blkiocg_attach()
  removal.  Resolved by removing @subsys from all subsys methods.

* 676f7c8f84 "cgroup: relocate cftype and cgroup_subsys definitions in
  controllers" conflicts with -&gt;pre_destroy() and -&gt;attach() updates
  and removal of modular config.  Resolved by dropping forward
  declarations of the methods and applying updates to the relocated
  blkio_subsys.

* 4baf6e3325 "cgroup: convert all non-memcg controllers to the new
  cftype interface" builds upon the previous item.  Resolved by adding
  -&gt;base_cftypes to the relocated blkio_subsys.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: interface update for ioc/icq creation functions</title>
<updated>2012-03-06T20:27:24+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-03-05T21:15:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=24acfc34fba0b4f62ef9d5c2616eb0faa802b606'/>
<id>urn:sha1:24acfc34fba0b4f62ef9d5c2616eb0faa802b606</id>
<content type='text'>
Make the following interface updates to prepare for future ioc related
changes.

* create_io_context() returning ioc only works for %current because it
  doesn't increment ref on the ioc.  Drop @task parameter from it and
  always assume %current.

* Make create_io_context_slowpath() return 0 or -errno and rename it
  to create_task_io_context().

* Make ioc_create_icq() take @ioc as parameter instead of assuming
  that of %current.  The caller, get_request(), is updated to create
  ioc explicitly and then pass it into ioc_create_icq().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blkcg: add blkcg_{init|drain|exit}_queue()</title>
<updated>2012-03-06T20:27:23+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-03-05T21:15:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5efd611351d1a847c72d74fb12ff4bd187c0cb2c'/>
<id>urn:sha1:5efd611351d1a847c72d74fb12ff4bd187c0cb2c</id>
<content type='text'>
Currently block core calls directly into blk-throttle for init, drain
and exit.  This patch adds blkcg_{init|drain|exit}_queue() which wraps
the blk-throttle functions.  This is to give more control and
visiblity to blkcg core layer for proper layering.  Further patches
will add logic common to blkcg policies to the functions.

While at it, collapse blk_throtl_release() into blk_throtl_exit().
There's no reason to keep them separate.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
