<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/block/blk-mq.c, branch linux-7.1.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-05-26T21:05:30+00:00</updated>
<entry>
<title>blk-mq: reinsert cached request to the list</title>
<updated>2026-05-26T21:05:30+00:00</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2026-05-26T15:35:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b051bb6bf0a231117036aa607cadf55be8e63910'/>
<id>urn:sha1:b051bb6bf0a231117036aa607cadf55be8e63910</id>
<content type='text'>
A previous commit removed an optimization out of caution for a scenario
that turns out not to be real: all the "queue_exit" goto's are safe to
reinsert the request into the cached_rq's plug list as they are either
from a non-blocking path, or a successful merge that already holds the
queue reference. This optimization is most needed for small sequential
workloads that successfully merge into larger requests.

Fixes: dc278e9bf2b9 ("blk-mq: pop cached request if it is usable")
Suggested-by: Ming Lei &lt;tom.leiming@gmail.com&gt;
Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Link: https://patch.msgid.link/20260526153531.2365935-1-kbusch@meta.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: pop cached request if it is usable</title>
<updated>2026-05-21T19:04:11+00:00</updated>
<author>
<name>Keith Busch</name>
<email>kbusch@kernel.org</email>
</author>
<published>2026-05-21T19:02:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc278e9bf2b9513a763353e6b9cc21e0f532954e'/>
<id>urn:sha1:dc278e9bf2b9513a763353e6b9cc21e0f532954e</id>
<content type='text'>
When submitting a bio to blk-mq, if the task should sleep after peeking
a cached request, but before it pops it, the plug flushes and calls
blk_mq_free_plug_rqs, freeing the cached_rqs. This creates a
use-after-free bug. Fix this by popping the cached request before any
possible blocking calls if it is suitable for use.

Popping this request first holds a queue reference, so it avoids any
serialization races with queue freezes and can safely proceed with
dispatching that request to the driver. This potentially increases a
timing window from when a driver wants to freeze its queue to when
requests stop being dispatched. That scenario is off the fast path
though, and drivers need to appropriately handle requests during a
freeze request anyway.

The downside is the popped element needs to be individually freed when
we perform a bio plug merge. The cached request would have had to be
freed later anyway, but this patch does it inline with building the plug
list instead of after flushing it.
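
The ordering problem described above can be illustrated with a small toy
model. This is only a sketch under stated assumptions: the Plug class and
the two submit helpers are hypothetical names made up for illustration,
not kernel code, and a "freed" request is modelled as membership in a set.

```python
class Plug:
    """Toy stand-in for a per-task plug holding cached requests."""

    def __init__(self, names):
        self.cached = list(names)   # models the plug's cached request list
        self.freed = set()          # requests released by a flush

    def peek(self):
        return self.cached[-1] if self.cached else None

    def pop(self):
        return self.cached.pop() if self.cached else None

    def flush(self):
        # Models the plug flush that runs when the task sleeps:
        # every request still on the cached list is freed.
        self.freed.update(self.cached)
        self.cached.clear()


def submit_peek_first(plug):
    rq = plug.peek()   # request is still owned by the plug
    plug.flush()       # task blocks; the flush frees the cached list
    return rq, rq in plug.freed


def submit_pop_first(plug):
    rq = plug.pop()    # ownership moves to the submitter before blocking
    plug.flush()       # the flush no longer sees this request
    return rq, rq in plug.freed


print(submit_peek_first(Plug(["rq0", "rq1"])))  # ('rq1', True): stale reference
print(submit_pop_first(Plug(["rq0", "rq1"])))   # ('rq1', False): still valid
```

In the toy model the only difference between the two helpers is whether
the request leaves the cached list before the flush, which is the
property the fix relies on.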

Fixes: b0077e269f6c1 ("blk-mq: make sure active queue usage is held for bio_integrity_prep()")
Fixes: 7b4f36cd22a65 ("block: ensure we hold a queue reference when using queue limits")
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Link: https://patch.msgid.link/20260521190253.242065-1-kbusch@meta.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: recompute nr_integrity_segments in blk_insert_cloned_request</title>
<updated>2026-05-12T15:23:43+00:00</updated>
<author>
<name>Casey Chen</name>
<email>cachen@purestorage.com</email>
</author>
<published>2026-05-11T21:22:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2c6e6a18a37b905cb584eb0dda3ae482162a81ca'/>
<id>urn:sha1:2c6e6a18a37b905cb584eb0dda3ae482162a81ca</id>
<content type='text'>
blk_insert_cloned_request() already recomputes nr_phys_segments
against the bottom queue, because "the queue settings related to
segment counting may differ from the original queue." The exact same
reasoning applies to integrity segments: a stacked driver's underlying
queue can have tighter virt_boundary_mask, seg_boundary_mask, or
max_segment_size than the top queue, in which case
blk_rq_count_integrity_sg() against the bottom queue produces a
different count than the cached rq-&gt;nr_integrity_segments inherited
from the source request by blk_rq_prep_clone().

When the cached count is lower than the bottom queue's actual count,
blk_rq_map_integrity_sg() trips

	BUG_ON(segments &gt; rq-&gt;nr_integrity_segments);

on dispatch. The same families of stacked setups that motivated the
existing nr_phys_segments recompute -- dm-multipath fanning out to
nvme-rdma in particular -- can produce this.

Mirror the nr_phys_segments handling: when the request carries
integrity, recompute nr_integrity_segments against the bottom queue
and reject the request if it exceeds the bottom queue's
max_integrity_segments. blk_rq_count_integrity_sg() and
queue_max_integrity_segments() are both already available via
&lt;linux/blk-integrity.h&gt;, which blk-mq.c includes.

This closes a latent gap in the stacking contract and brings the
integrity-segment accounting in line with the existing
phys-segment accounting.
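
As a rough illustration of why the cached count can be too low, the
sketch below recounts segments against a tighter limit. It is a
simplified model, not the kernel algorithm: it only considers
max_segment_size (no boundary masks or merging), and count_segments and
insert_cloned are hypothetical helper names.

```python
def count_segments(lengths, max_segment_size):
    # Each buffer of `length` bytes needs ceil(length / max_segment_size) segments.
    return sum(-(-length // max_segment_size) for length in lengths)


def insert_cloned(lengths, bottom_max_segment_size, bottom_max_segments):
    # Recompute against the bottom queue instead of trusting the cached count.
    count = count_segments(lengths, bottom_max_segment_size)
    if count > bottom_max_segments:
        return None   # reject: the bottom queue cannot take this request
    return count


integrity_buffers = [8192, 4096]                  # made-up buffer sizes in bytes
cached = count_segments(integrity_buffers, 8192)  # count seen by the top queue
print(cached)                                     # 2
print(insert_cloned(integrity_buffers, 4096, 4))  # 3: more than the cached count
print(insert_cloned(integrity_buffers, 4096, 2))  # None: exceeds the bottom limit
```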

Fixes: 76c313f658d2 ("blk-integrity: improved sg segment mapping")
Signed-off-by: Casey Chen &lt;cachen@purestorage.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://patch.msgid.link/20260511212230.27511-1-cachen@purestorage.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: clear BIO_QOS flags in blk_steal_bios()</title>
<updated>2026-03-10T13:11:09+00:00</updated>
<author>
<name>Chaitanya Kulkarni</name>
<email>kch@nvidia.com</email>
</author>
<published>2026-02-26T03:12:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=daa6c79858e9ca75c548452bf71db8a9e61bde42'/>
<id>urn:sha1:daa6c79858e9ca75c548452bf71db8a9e61bde42</id>
<content type='text'>
When a bio goes through the rq_qos infrastructure on a path's request
queue, it gets BIO_QOS_THROTTLED or BIO_QOS_MERGED flags set. These
flags indicate that rq_qos_done_bio() should be called on completion
to update rq_qos accounting.

During path failover in nvme_failover_req(), the bio's bi_bdev is
redirected from the failed path's disk to the multipath head's disk
via bio_set_dev(). However, the BIO_QOS flags are not cleared.

When the bio eventually completes (either successfully via a new path
or with an error via bio_io_error()), rq_qos_done_bio() checks for
these flags and calls __rq_qos_done_bio(q-&gt;rq_qos, bio) where q is
obtained from the bio's current bi_bdev - which is now the multipath
head's queue, not the original path's queue.

The multipath head's queue does not have rq_qos enabled (q-&gt;rq_qos is
NULL), but the code assumes that if BIO_QOS_* flags are set, q-&gt;rq_qos
must be valid.

This breaks when a bio is moved between queues during NVMe multipath
failover, leading to a NULL pointer dereference.

Execution context timeline:

   * =====&gt; dd process context
   [USER] dd process
     [SYSCALL] write() - dd process context
       submit_bio()
       nvme_ns_head_submit_bio() - path selection
       blk_mq_submit_bio()  #### QOS FLAGS SET HERE

        [USER] dd waits or returns

          ==== I/O in flight on NVMe hardware =====

   ===== End of submission path ====
   ------------------------------------------------------

   * dd ====&gt; Interrupt context;
   [IRQ] NVMe completion interrupt
       nvme_irq()
        nvme_complete_rq()
         nvme_failover_req() ### BIO MOVED TO HEAD
            spin_lock_irqsave (atomic section)
            bio_set_dev() changes bi_bdev
            ### BUG: QOS flags NOT cleared
            kblockd_schedule_work()

   * Interrupt context =====&gt; kblockd workqueue
   [WQ] kblockd workqueue - kworker process
       nvme_requeue_work()
        submit_bio_noacct()
         nvme_ns_head_submit_bio()
          nvme_find_path() returns NULL
           bio_io_error()
            bio_endio()
             rq_qos_done_bio()  ### CRASH ###

   KERNEL PANIC / OOPS

Crash from blktests nvme/058 (rapid namespace remapping):

[ 1339.636033] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 1339.641025] nvme nvme4: rescanning namespaces.
[ 1339.642064] #PF: supervisor read access in kernel mode
[ 1339.642067] #PF: error_code(0x0000) - not-present page
[ 1339.642070] PGD 0 P4D 0
[ 1339.642073] Oops: Oops: 0000 [#1] SMP NOPTI
[ 1339.642078] CPU: 35 UID: 0 PID: 4579 Comm: kworker/35:2H
               Tainted: G   O     N  6.17.0-rc3nvme+ #5 PREEMPT(voluntary)
[ 1339.642084] Tainted: [O]=OOT_MODULE, [N]=TEST
[ 1339.673446] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
           BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 1339.682359] Workqueue: kblockd nvme_requeue_work [nvme_core]
[ 1339.686613] RIP: 0010:__rq_qos_done_bio+0xd/0x40
[ 1339.690161] Code: 75 dd 5b 5d 41 5c c3 cc cc cc cc 66 90 90 90 90 90 90 90
                     90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 f5
             53 48 89 fb &lt;48&gt; 8b 03 48 8b 40 30 48 85 c0 74 0b 48 89 ee
             48 89 df ff d0 0f 1f
[ 1339.703691] RSP: 0018:ffffc900066f3c90 EFLAGS: 00010202
[ 1339.706844] RAX: ffff888148b9ef00 RBX: 0000000000000000 RCX: 0000000000000000
[ 1339.711136] RDX: 00000000000001c0 RSI: ffff8882aaab8a80 RDI: 0000000000000000
[ 1339.715691] RBP: ffff8882aaab8a80 R08: 0000000000000000 R09: 0000000000000000
[ 1339.720472] R10: 0000000000000000 R11: fefefefefefefeff R12: ffff8882aa3b6010
[ 1339.724650] R13: 0000000000000000 R14: ffff8882338bcef0 R15: ffff8882aa3b6020
[ 1339.729029] FS:  0000000000000000(0000) GS:ffff88985c0cf000(0000) knlGS:0000000000000000
[ 1339.734525] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1339.738563] CR2: 0000000000000000 CR3: 0000000111045000 CR4: 0000000000350ef0
[ 1339.742750] DR0: ffffffff845ccbec DR1: ffffffff845ccbed DR2: ffffffff845ccbee
[ 1339.745630] DR3: ffffffff845ccbef DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 1339.748488] Call Trace:
[ 1339.749512]  &lt;TASK&gt;
[ 1339.750449]  bio_endio+0x71/0x2e0
[ 1339.751833]  nvme_ns_head_submit_bio+0x290/0x320 [nvme_core]
[ 1339.754073]  __submit_bio+0x222/0x5e0
[ 1339.755623]  ? rcu_is_watching+0xd/0x40
[ 1339.757201]  ? submit_bio_noacct_nocheck+0x131/0x370
[ 1339.759210]  submit_bio_noacct_nocheck+0x131/0x370
[ 1339.761189]  ? submit_bio_noacct+0x20/0x620
[ 1339.762849]  nvme_requeue_work+0x4b/0x60 [nvme_core]
[ 1339.764828]  process_one_work+0x20e/0x630
[ 1339.766528]  worker_thread+0x184/0x330
[ 1339.768129]  ? __pfx_worker_thread+0x10/0x10
[ 1339.769942]  kthread+0x10a/0x250
[ 1339.771263]  ? __pfx_kthread+0x10/0x10
[ 1339.772776]  ? __pfx_kthread+0x10/0x10
[ 1339.774381]  ret_from_fork+0x273/0x2e0
[ 1339.775948]  ? __pfx_kthread+0x10/0x10
[ 1339.777504]  ret_from_fork_asm+0x1a/0x30
[ 1339.779163]  &lt;/TASK&gt;

Fix this by clearing both BIO_QOS_THROTTLED and BIO_QOS_MERGED flags
when bios are redirected to the multipath head in nvme_failover_req().
This is consistent with the existing code that clears REQ_POLLED and
REQ_NOWAIT flags when the bio changes queues.
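
The failure mode and the fix can be mimicked with a small toy model.
This is a sketch only: the Queue and Bio classes and the complete and
fail_over helpers are hypothetical stand-ins, flags are modelled as a
set of strings, and a missing rq_qos is modelled as None.

```python
QOS_FLAGS = {"BIO_QOS_THROTTLED", "BIO_QOS_MERGED"}


class Queue:
    def __init__(self, rq_qos=None):
        self.rq_qos = rq_qos   # None models a queue without rq_qos enabled


class Bio:
    def __init__(self, queue):
        self.queue = queue
        self.flags = set()


def complete(bio):
    # Models the completion check: a set QOS flag is taken to imply
    # that the bio's current queue has rq_qos.
    if bio.flags.intersection(QOS_FLAGS):
        return bio.queue.rq_qos(bio)
    return "no rq_qos accounting"


def fail_over(bio, head, clear_flags):
    bio.queue = head               # models bio_set_dev() to the multipath head
    if clear_flags:
        bio.flags -= QOS_FLAGS     # the fix: drop per-queue QOS state


path = Queue(rq_qos=lambda bio: "accounted on path queue")
head = Queue()                     # the head queue has no rq_qos

bio = Bio(path)
bio.flags.add("BIO_QOS_THROTTLED")
fail_over(bio, head, clear_flags=True)
print(complete(bio))               # no rq_qos accounting
```

With clear_flags=False the same sequence calls None in complete() and
raises TypeError, which plays the role of the NULL pointer dereference
in this model.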

Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://patch.msgid.link/20260226031243.87200-3-kch@nvidia.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: move bio queue-transition flag fixups into blk_steal_bios()</title>
<updated>2026-03-10T13:11:09+00:00</updated>
<author>
<name>Chaitanya Kulkarni</name>
<email>kch@nvidia.com</email>
</author>
<published>2026-02-26T03:12:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b2c45ced591e6cf947560d2d290a51855926b774'/>
<id>urn:sha1:b2c45ced591e6cf947560d2d290a51855926b774</id>
<content type='text'>
blk_steal_bios() transfers bios from a request to a bio_list when the
request is requeued to a different queue. The NVMe multipath failover
path (nvme_failover_req) currently open-codes clearing of REQ_POLLED,
bi_cookie, and REQ_NOWAIT on each bio before calling blk_steal_bios().

Move these fixups into blk_steal_bios() itself so that any caller
automatically gets correct flag state when bios cross queue boundaries.
Simplify nvme_failover_req() accordingly.

Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://patch.msgid.link/20260226031243.87200-2-kch@nvidia.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-7.1/block-integrity' into for-7.1/block</title>
<updated>2026-03-09T20:30:14+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-03-09T20:30:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=89d10b7803a6ab7276850e54b766487666667153'/>
<id>urn:sha1:89d10b7803a6ab7276850e54b766487666667153</id>
<content type='text'>
Merge in integrity changes which are also landing in the VFS tree as
dependencies for fs related changes.

* for-7.1/block-integrity:
  block: pass a maxlen argument to bio_iov_iter_bounce
  block: add fs_bio_integrity helpers
  block: make max_integrity_io_size public
  block: prepare generation / verification helpers for fs usage
  block: add a bdev_has_integrity_csum helper
  block: factor out a bio_integrity_setup_default helper
  block: factor out a bio_integrity_action helper
</content>
</entry>
<entry>
<title>block: factor out a bio_integrity_action helper</title>
<updated>2026-03-09T13:47:02+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2026-02-23T13:20:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7ea25eaad5ae3a6c837a3df9bdb822194f002565'/>
<id>urn:sha1:7ea25eaad5ae3a6c837a3df9bdb822194f002565</id>
<content type='text'>
Split the logic to see if a bio needs integrity metadata from
bio_integrity_prep into a reusable helper that can be called from
file system code.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Tested-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'block-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux</title>
<updated>2026-03-06T16:36:18+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-03-06T16:36:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a028739a4330881a6a3b5aa4a39381bbcacf2f2f'/>
<id>urn:sha1:a028739a4330881a6a3b5aa4a39381bbcacf2f2f</id>
<content type='text'>
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
      - Improve quirk visibility and configurability (Maurizio)
      - Fix runtime user modification to queue setup (Keith)
      - Fix multipath leak on try_module_get failure (Keith)
      - Ignore ambiguous spec definitions for better atomics support
        (John)
      - Fix admin queue leak on controller reset (Ming)
      - Fix large allocation in persistent reservation read keys
        (Sungwoo Kim)
      - Fix fcloop callback handling (Justin)
      - Securely free DHCHAP secrets (Daniel)
      - Various cleanups and typo fixes (John, Wilfred)

 - Avoid a circular lock dependency issue in the sysfs nr_requests or
   scheduler store handling

 - Fix a circular lock dependency with the pcpu mutex and the queue
   freeze lock

 - Cleanup for bio_copy_kern(), using __bio_add_page() rather than
   bio_add_page(), as adding a page here cannot fail. The existing code
   had broken cleanup for the error condition, so make it clear that the
   error condition cannot happen

 - Fix for a __this_cpu_read() in preemptible context splat

* tag 'block-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  block: use trylock to avoid lockdep circular dependency in sysfs
  nvme: fix memory allocation in nvme_pr_read_keys()
  block: use __bio_add_page in bio_copy_kern
  block: break pcpu_alloc_mutex dependency on freeze_lock
  blktrace: fix __this_cpu_read/write in preemptible context
  nvme-multipath: fix leak on try_module_get failure
  nvmet-fcloop: Check remoteport port_state before calling done callback
  nvme-pci: do not try to add queue maps at runtime
  nvme-pci: cap queue creation to used queues
  nvme-pci: ensure we're polling a polled queue
  nvme: fix memory leak in quirks_param_set()
  nvme: correct comment about nvme_ns_remove()
  nvme: stop setting namespace gendisk device driver data
  nvme: add support for dynamic quirk configuration via module parameter
  nvme: fix admin queue leak on controller reset
  nvme-fabrics: use kfree_sensitive() for DHCHAP secrets
  nvme: stop using AWUPF
  nvme: expose active quirks in sysfs
  nvme/host: fixup some typos
</content>
</entry>
<entry>
<title>block: break pcpu_alloc_mutex dependency on freeze_lock</title>
<updated>2026-03-02T16:23:04+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2026-03-01T12:59:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=539d1b47e935e8384977dd7e5cec370c08b7a644'/>
<id>urn:sha1:539d1b47e935e8384977dd7e5cec370c08b7a644</id>
<content type='text'>
While nr_hw_queue update allocates tagset tags, it acquires
-&gt;pcpu_alloc_mutex after -&gt;freeze_lock is acquired or the queue is
frozen. This potentially creates a circular dependency involving
-&gt;fs_reclaim if reclaim is triggered simultaneously in a code path which
first acquires -&gt;pcpu_alloc_mutex. As the queue is already frozen while
nr_hw_queue update allocates tagsets, the reclaim can't make forward
progress, which could cause a deadlock as reported in lockdep splat[1].

Fix this by pre-allocating tagset tags before we freeze the queue during
nr_hw_queue update. Later the allocated tagset tags can be safely
installed and used after the queue is frozen.

Reported-by: Yi Zhang &lt;yi.zhang@redhat.com&gt;
Closes: https://lore.kernel.org/all/CAHj4cs8F=OV9s3La2kEQ34YndgfZP-B5PHS4Z8_b9euKG6J4mw@mail.gmail.com/ [1]
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Tested-by: Yi Zhang &lt;yi.zhang@redhat.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
[axboe: fix brace style issue]
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Convert 'alloc_obj' family to use the new default GFP_KERNEL argument</title>
<updated>2026-02-22T01:09:51+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-22T00:37:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43'/>
<id>urn:sha1:bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43</id>
<content type='text'>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.
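
To see what the substitution does and does not touch, here is the same
expression written as a Python regular expression. This is a sketch for
illustration only; the sample source lines are made up and are not taken
from the tree.

```python
import re

# Python spelling of the sed expression quoted above: keep everything up to
# the last argument, then drop a trailing ", GFP_KERNEL" before the ")".
PATTERN = re.compile(r"(alloc_objs*\(.*), GFP_KERNEL\)")


def drop_default_gfp(line):
    return PATTERN.sub(r"\1)", line)


samples = [
    "p = kzalloc_obj(*p, GFP_KERNEL);",      # simple case: rewritten
    "v = kmalloc_objs(*v, n, GFP_KERNEL);",  # plural form: rewritten
    "q = kzalloc_obj(*q, GFP_ATOMIC);",      # other GFP flags: untouched
    "r = kzalloc_obj(*r,",                   # call split over lines: untouched
]
for line in samples:
    print(drop_default_gfp(line))
```

The last sample shows the limitation mentioned above: when the
GFP_KERNEL argument sits on a following line, a per-line substitution
never sees the full call.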

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
