<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/block/blk.h, branch v7.0.10</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0.10</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0.10'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-05-14T13:31:10+00:00</updated>
<entry>
<title>block: add pgmap check to biovec_phys_mergeable</title>
<updated>2026-05-14T13:31:10+00:00</updated>
<author>
<name>Naman Jain</name>
<email>namjain@linux.microsoft.com</email>
</author>
<published>2026-04-10T15:34:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f632dab4b841554cd6416058c61886d7db176581'/>
<id>urn:sha1:f632dab4b841554cd6416058c61886d7db176581</id>
<content type='text'>
commit 13920e4b7b784b40cf4519ff1f0f3e513476a499 upstream.

biovec_phys_mergeable() is used by the request merge, DMA mapping,
and integrity merge paths to decide if two physically contiguous
bvec segments can be coalesced into one. It currently has no check
for whether the segments belong to different dev_pagemaps.

When zone device memory is registered in multiple chunks, each chunk
gets its own dev_pagemap. A single bio can legitimately contain
bvecs from different pgmaps -- iov_iter_extract_bvecs() breaks at
pgmap boundaries but the outer loop in bio_iov_iter_get_pages()
continues filling the same bio. If such bvecs are physically
contiguous, biovec_phys_mergeable() will coalesce them, making it
impossible to recover the correct pgmap for the merged segment
via page_pgmap().

Add a zone_device_pages_have_same_pgmap() check to prevent merging
bvec segments that span different pgmaps.

Fixes: 49580e690755 ("block: add check when merging zone device pages")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Naman Jain &lt;namjain@linux.microsoft.com&gt;
Link: https://patch.msgid.link/20260410153414.4159050-2-namjain@linux.microsoft.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: relax pgmap check in bio_add_page for compatible zone device pages</title>
<updated>2026-05-07T04:13:54+00:00</updated>
<author>
<name>Naman Jain</name>
<email>namjain@linux.microsoft.com</email>
</author>
<published>2026-04-10T15:34:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2b27cb78de8bf549de6c030d763129c8886f4770'/>
<id>urn:sha1:2b27cb78de8bf549de6c030d763129c8886f4770</id>
<content type='text'>
commit 41c665aae2b5dbecddddcc8ace344caf630cc7a4 upstream.

bio_add_page() and bio_integrity_add_page() reject pages from different
dev_pagemaps entirely, returning 0 even when those pages have compatible
DMA mapping requirements. This forces callers to start a new bio when
buffers span pgmap boundaries, even though the pages could safely coexist
as separate bvec entries.

This matters for guests where memory is registered through
devm_memremap_pages() with MEMORY_DEVICE_GENERIC in multiple calls,
creating separate dev_pagemaps for each chunk. When a direct I/O buffer
spans two such chunks, bio_add_page() rejects the second page, forcing an
unnecessary bio split or I/O failure.

Introduce zone_device_pages_compatible() in blk.h to check whether two
pages can coexist in the same bio as separate bvec entries. The block DMA
iterator (blk_dma_map_iter_start) caches the P2PDMA mapping state from the
first segment and applies it to all others, so P2PDMA pages from different
pgmaps must not be mixed, and neither must P2PDMA and non-P2PDMA pages.
All other combinations (MEMORY_DEVICE_GENERIC pages from different pgmaps,
or MEMORY_DEVICE_GENERIC with normal RAM) use the same dma_map_phys path
and are safe.

Replace the blanket zone_device_pages_have_same_pgmap() rejection with
zone_device_pages_compatible(), while keeping
zone_device_pages_have_same_pgmap() as a merge guard.
Pages from different pgmaps can be added as separate bvec entries but
must not be coalesced into the same segment, as that would make
it impossible to recover the correct pgmap via page_pgmap().

Fixes: 49580e690755 ("block: add check when merging zone device pages")
Cc: stable@vger.kernel.org
Signed-off-by: Naman Jain &lt;namjain@linux.microsoft.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://patch.msgid.link/20260410153414.4159050-3-namjain@linux.microsoft.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: use NOIO context to prevent deadlock during debugfs creation</title>
<updated>2026-02-16T17:47:25+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai@fnnas.com</email>
</author>
<published>2026-02-14T05:43:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dfe48ea179733be948c432f6af2fc3913cf5dd28'/>
<id>urn:sha1:dfe48ea179733be948c432f6af2fc3913cf5dd28</id>
<content type='text'>
Creating debugfs entries can trigger fs reclaim, which can enter back
into the block layer request_queue. This can cause deadlock if the
queue is frozen.

Previously, a WARN_ON_ONCE check was used in debugfs_create_files()
to detect this condition, but it was racy since the queue can be frozen
from another context at any time.

Introduce blk_debugfs_lock()/blk_debugfs_unlock() helpers that combine
the debugfs_mutex with memalloc_noio_save()/restore() to prevent fs
reclaim from triggering block I/O. Also add blk_debugfs_lock_nomemsave()
and blk_debugfs_unlock_nomemrestore() variants for callers that don't
need NOIO protection (e.g., debugfs removal or read-only operations).

Replace all raw debugfs_mutex lock/unlock pairs with these helpers,
using the _nomemsave/_nomemrestore variants where appropriate.

Reported-by: Yi Zhang &lt;yi.zhang@redhat.com&gt;
Closes: https://lore.kernel.org/all/CAHj4cs9gNKEYAPagD9JADfO5UH+OiCr4P7OO2wjpfOYeM-RV=A@mail.gmail.com/
Reported-by: Shinichiro Kawasaki &lt;shinichiro.kawasaki@wdc.com&gt;
Closes: https://lore.kernel.org/all/aYWQR7CtYdk3K39g@shinmob/
Suggested-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
Reviewed-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-7.0/block-stable-pages-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux</title>
<updated>2026-02-10T02:14:52+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-10T02:14:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4adc13ed7c281c16152a700e47b65d17de07321a'/>
<id>urn:sha1:4adc13ed7c281c16152a700e47b65d17de07321a</id>
<content type='text'>
Pull bounce buffer dio for stable pages from Jens Axboe:
 "This adds support for bounce buffering of dio for stable pages. This
  was all done by Christoph. In his words:

  This series tries to address the problem that under I/O pages can be
  modified during direct I/O, even when the device or file system
  require stable pages during I/O to calculate checksums, parity or data
  operations. It does so by adding block layer helpers to bounce buffer
  an iov_iter into a bio, then wires that up in iomap and ultimately
  XFS.

  The reason that the file system even needs to know about it, is
  because reads need a user context to copy the data back, and the
  infrastructure to defer ioends to a workqueue currently sits in XFS.
  I'm going to look into moving that into ioend and enabling it for
  other file systems. Additionally btrfs already has it's own
  infrastructure for this, and actually an urgent need to bounce buffer,
  so this should be useful there and could be wire up easily. In fact
  the idea comes from patches by Qu that did this in btrfs.

  This patch fixes all but one xfstests failures on T10 PI capable
  devices (generic/095 seems to have issues with a mix of mmap and
  splice still, I'm looking into that separately), and make qemu VMs
  running Windows, or Linux with swap enabled fine on an XFS file on a
  device using PI.

  Performance numbers on my (not exactly state of the art) NVMe PI test
  setup:

      Sequential reads using io_uring, QD=16.
      Bandwidth and CPU usage (usr/sys):

      | size |        zero copy         |          bounce          |
      +------+--------------------------+--------------------------+
      |   4k | 1316MiB/s (12.65/55.40%) | 1081MiB/s (11.76/49.78%) |
      |  64K | 3370MiB/s ( 5.46/18.20%) | 3365MiB/s ( 4.47/15.68%) |
      |   1M | 3401MiB/s ( 0.76/23.05%) | 3400MiB/s ( 0.80/09.06%) |
      +------+--------------------------+--------------------------+

      Sequential writes using io_uring, QD=16.
      Bandwidth and CPU usage (usr/sys):

      | size |        zero copy         |          bounce          |
      +------+--------------------------+--------------------------+
      |   4k |  882MiB/s (11.83/33.88%) |  750MiB/s (10.53/34.08%) |
      |  64K | 2009MiB/s ( 7.33/15.80%) | 2007MiB/s ( 7.47/24.71%) |
      |   1M | 1992MiB/s ( 7.26/ 9.13%) | 1992MiB/s ( 9.21/19.11%) |
      +------+--------------------------+--------------------------+

  Note that the 64k read numbers look really odd to me for the baseline
  zero copy case, but are reproducible over many repeated runs.

  The bounce read numbers should further improve when moving the PI
  validation to the file system and removing the double context switch,
  which I have patches for that will sent out soon"

* tag 'for-7.0/block-stable-pages-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  xfs: use bounce buffering direct I/O when the device requires stable pages
  iomap: add a flag to bounce buffer direct I/O
  iomap: support ioends for direct reads
  iomap: rename IOMAP_DIO_DIRTY to IOMAP_DIO_USER_BACKED
  iomap: free the bio before completing the dio
  iomap: share code between iomap_dio_bio_end_io and iomap_finish_ioend_direct
  iomap: split out the per-bio logic from iomap_dio_bio_iter
  iomap: simplify iomap_dio_bio_iter
  iomap: fix submission side handling of completion side errors
  block: add helpers to bounce buffer an iov_iter into bios
  block: remove bio_release_page
  iov_iter: extract a iov_iter_extract_bvecs helper from bio code
  block: open code bio_add_page and fix handling of mismatching P2P ranges
  block: refactor get_contig_folio_len
  block: add a BIO_MAX_SIZE constant and use it
</content>
</entry>
<entry>
<title>block: decouple secure erase size limit from discard size limit</title>
<updated>2026-02-05T03:42:12+00:00</updated>
<author>
<name>Luke Wang</name>
<email>ziniu.wang_1@nxp.com</email>
</author>
<published>2026-02-04T03:40:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ee81212f74a57c5d2b56cf504f40d528dac6faaf'/>
<id>urn:sha1:ee81212f74a57c5d2b56cf504f40d528dac6faaf</id>
<content type='text'>
Secure erase should use max_secure_erase_sectors instead of being limited
by max_discard_sectors. Separate the handling of REQ_OP_SECURE_ERASE from
REQ_OP_DISCARD to allow each operation to use its own size limit.

Signed-off-by: Luke Wang &lt;ziniu.wang_1@nxp.com&gt;
Reviewed-by: Ulf Hansson &lt;ulf.hansson@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: remove bio_release_page</title>
<updated>2026-01-28T12:16:39+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2026-01-26T05:53:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=301f5356521ed90f72a67797156d75093aac786f'/>
<id>urn:sha1:301f5356521ed90f72a67797156d75093aac786f</id>
<content type='text'>
Merge bio_release_page into the only remaining caller.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Tested-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: account for bi_bvec_done in bio_may_need_split()</title>
<updated>2026-01-10T17:22:54+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2026-01-10T13:42:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f7ba87dfa8e42642d43faf29a71cee338086218b'/>
<id>urn:sha1:f7ba87dfa8e42642d43faf29a71cee338086218b</id>
<content type='text'>
When checking if a bio fits in a single segment, bio_may_need_split()
compares bi_size against the current bvec's bv_len. However, for
partially consumed bvecs (bi_bvec_done &gt; 0), such as in cloned or
split bios, the remaining bytes in the current bvec is actually
(bv_len - bi_bvec_done), not bv_len.

This could cause bio_may_need_split() to incorrectly return false,
leading to nr_phys_segments being set to 1 when the bio actually
spans multiple segments. This triggers the WARN_ON in __blk_rq_map_sg()
when the actual mapped segments exceed the expected count.

Fix by subtracting bi_bvec_done from bv_len in the comparison.

Reported-by: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Close: https://lore.kernel.org/linux-block/9687cf2b-1f32-44e1-b58d-2492dc6e7185@linux.ibm.com/
Repored-and-bisected-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Tested-by: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Tested-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Fixes: ee623c892aa5 ("block: use bvec iterator helper for bio_may_need_split()")
Cc: Nitesh Shetty &lt;nj.shetty@samsung.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: use bvec iterator helper for bio_may_need_split()</title>
<updated>2026-01-07T15:06:33+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2025-12-31T03:00:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ee623c892aa59003fca173de0041abc2ccc2c72d'/>
<id>urn:sha1:ee623c892aa59003fca173de0041abc2ccc2c72d</id>
<content type='text'>
bio_may_need_split() uses bi_vcnt to determine if a bio has a single
segment, but bi_vcnt is unreliable for cloned bios. Cloned bios share
the parent's bi_io_vec array but iterate over a subset via bi_iter,
so bi_vcnt may not reflect the actual segment count being iterated.

Replace the bi_vcnt check with bvec iterator access via
__bvec_iter_bvec(), comparing bi_iter.bi_size against the current
bvec's length. This correctly handles both cloned and non-cloned bios.

Move bi_io_vec into the first cache line adjacent to bi_iter. This is
a sensible layout since bi_io_vec and bi_iter are commonly accessed
together throughout the block layer - every bvec iteration requires
both fields. This displaces bi_end_io to the second cache line, which
is acceptable since bi_end_io and bi_private are always fetched
together in bio_endio() anyway.

The struct layout change requires bio_reset() to preserve and restore
bi_io_vec across the memset, since it now falls within BIO_RESET_BYTES.

Nitesh verified that this patch doesn't regress NVMe 512-byte IO perf [1].

Link: https://lore.kernel.org/linux-block/20251220081607.tvnrltcngl3cc2fh@green245.gost/ [1]
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Nitesh Shetty &lt;nj.shetty@samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: unify elevator tags and type xarrays into struct elv_change_ctx</title>
<updated>2025-11-13T16:27:49+00:00</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-11-13T08:58:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=232143b605387b372dee0ec7830f93b93df5f67d'/>
<id>urn:sha1:232143b605387b372dee0ec7830f93b93df5f67d</id>
<content type='text'>
Currently, the nr_hw_queues update path manages two disjoint xarrays —
one for elevator tags and another for elevator type — both used during
elevator switching. Maintaining these two parallel structures for the
same purpose adds unnecessary complexity and potential for mismatched
state.

This patch unifies both xarrays into a single structure, struct
elv_change_ctx, which holds all per-queue elevator change context. A
single xarray, named elv_tbl, now maps each queue (q-&gt;id) in a tagset
to its corresponding elv_change_ctx entry, encapsulating the elevator
tags, type and name references.

This unification simplifies the code, improves maintainability, and
clarifies ownership of per-queue elevator state.

Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai@fnnas.com&gt;
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: handle zone management operations completions</title>
<updated>2025-11-05T15:07:21+00:00</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2025-11-04T21:22:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=efae226c2ef19528ffd81d29ba0eecf1b0896ca2'/>
<id>urn:sha1:efae226c2ef19528ffd81d29ba0eecf1b0896ca2</id>
<content type='text'>
The functions blk_zone_wplug_handle_reset_or_finish() and
blk_zone_wplug_handle_reset_all() both modify the zone write pointer
offset of zone write plugs that are the target of a reset, reset all or
finish zone management operation. However, these functions do this
modification before the BIO is executed. So if the zone operation fails,
the modified zone write pointer offsets become invalid.

Avoid this by modifying the zone write pointer offset of a zone write
plug that is the target of a zone management operation when the
operation completes. To do so, modify blk_zone_bio_endio() to call the
new function blk_zone_mgmt_bio_endio() which in turn calls the functions
blk_zone_reset_all_bio_endio(), blk_zone_reset_bio_endio() or
blk_zone_finish_bio_endio() depending on the operation of the completed
BIO, to modify a zone write plug write pointer offset accordingly.
These functions are called only if the BIO execution was successful.

Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
