summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)AuthorFilesLines
8 daysdm cache: fix missing return in invalidate_committed's error pathMing-Hung Tsai1-1/+3
[ Upstream commit 8c0ee19db81f0fa1ff25fd75b22b17c0cc2acde3 ] In passthrough mode, dm-cache defers write submission until after metadata commit completes via the invalidate_committed() continuation. On commit error, invalidate_committed() calls invalidate_complete() to end the bio and free the migration struct, after which it should return immediately. The patch 4ca8b8bd952d ("dm cache: fix write hang in passthrough mode") omitted this early return, causing execution to fall through into the success path on error. This results in use-after-free on the migration struct in the subsequent calls. Fix by adding the missing return after the invalidate_complete() call. Fixes: 4ca8b8bd952d ("dm cache: fix write hang in passthrough mode") Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/dm-devel/adjMq6T5RRjv_uxM@stanley.mountain/ Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm init: ensure device probing has finished in dm-mod.waitfor=Guillaume Gonnet1-1/+3
[ Upstream commit 99a2312f69805f4ba92d98a757625e0300a747ab ] The early_lookup_bdev() function returns successfully when the disk device is present but not necessarily its partitions. In this situation, dm_early_create() fails as the partition block device does not exist yet. In my case, this phenomenon occurs quite often because the device is an SD card with slow reading times, on which kernel takes time to enumerate available partitions. Fortunately, the underlying device is back to "probing" state while enumerating partitions. Waiting for all probing to end is enough to fix this issue. That's also the reason why this problem never occurs with rootwait= parameter: the while loop inside wait_for_root() explicitly waits for probing to be done and then the function calls async_synchronize_full(). These lines were omitted in 035641b, even though the commit says it's based on the rootwait logic... Anyway, calling wait_for_device_probe() after our while loop does the job (it both waits for probing and calls async_synchronize_full). Fixes: 035641b01e72 ("dm init: add dm-mod.waitfor to wait for asynchronously probed block devices") Signed-off-by: Guillaume Gonnet <ggonnet.linux@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm log: fix out-of-bounds write due to region_count overflowJunrui Luo1-1/+5
[ Upstream commit c20e36b7631d83e7535877f08af8b0af72c44b1a ] The local variable region_count in create_log_context() is declared as unsigned int (32-bit), but dm_sector_div_up() returns sector_t (64-bit). When a device-mapper target has a sufficiently large ti->len with a small region_size, the division result can exceed UINT_MAX. The truncated value is then used to calculate bitset_size, causing clean_bits, sync_bits, and recovering_bits to be allocated far smaller than needed for the actual number of regions. Subsequent log operations (log_set_bit, log_clear_bit, log_test_bit) use region indices derived from the full untruncated region space, causing out-of-bounds writes to kernel heap memory allocated by vmalloc. This can be reproduced by creating a mirror target whose region_count overflows 32 bits: dmsetup create bigzero --table '0 8589934594 zero' dmsetup create mymirror --table '0 8589934594 mirror \ core 2 2 nosync 2 /dev/mapper/bigzero 0 \ /dev/mapper/bigzero 0' The status output confirms the truncation (sync_count=1 instead of 4294967297, because 0x100000001 was truncated to 1): $ dmsetup status mymirror 0 8589934594 mirror 2 254:1 254:1 1/4294967297 ... This leads to a kernel crash in core_in_sync: BUG: scheduling while atomic: (udev-worker)/9150/0x00000000 RIP: 0010:core_in_sync+0x14/0x30 [dm_log] CR2: 0000000000000008 Fixing recursive fault but reboot is needed! Fix by widening the local region_count to sector_t and adding an explicit overflow check before the value is assigned to lc->region_count. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm cache metadata: fix memory leak on metadata abort retryMing-Hung Tsai1-5/+8
[ Upstream commit 044ca491d4086dc5bf233e9fcb71db52df32f633 ] When failing to acquire the root_lock in dm_cache_metadata_abort because the block_manager is read-only, the temporary block_manager created outside the root_lock is not properly released, causing a memory leak. Reproduce steps: This can be reproduced by reloading a new table while the metadata is read-only. While the second call to dm_cache_metadata_abort is caused by lack of support for table preload in dm-cache, mentioned in commit 9b1cc9f251af ("dm cache: share cache-metadata object across inactive and active DM tables"), it exposes the memory leak in dm_cache_metadata_abort when the function is called multiple times. Specifically, dm-cache fails to sync the new cache object's mode during preresume, creating the reproducer condition. This issue could also occur through concurrent metadata_operation_failed calls due to races in cache mode updates, but the table preload scenario below provides a reliable reproducer. 1. Create a cache device with some faulty trailing metadata blocks dmsetup create cmeta <<EOF 0 200 linear /dev/sdc 0 200 7992 error EOF dmsetup create cdata --table "0 131072 linear /dev/sdc 8192" dmsetup create corig --table "0 262144 linear /dev/sdc 262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct dmsetup create cache --table "0 131072 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 1 writethrough smq 0" 2. Suspend and resume the cache to start a new metadata transaction and trigger metadata io errors on the next metadata commit. dmsetup suspend cache dmsetup resume cache 3. Write to the cache device to update metadata fio --filename=/dev/mapper/cache --name test --rw=randwrite --bs=4k \ --randrepeat=0 --direct=1 --size 64k 4. Preload the same table dmsetup reload cache --table "$(dmsetup table cache)" 5. Resume the new table. This triggers the memory leak. dmsetup suspend cache dmsetup resume cache kmemleak logs: <snip> unreferenced object 0xffff8880080c2010 (size 16): comm "dmsetup", pid 132, jiffies 4294982580 hex dump (first 16 bytes): 00 38 b9 07 80 88 ff ff 6a 6b 6b 6b 6b 6b 6b a5 ... backtrace (crc 3118f31c): kmemleak_alloc+0x28/0x40 __kmalloc_cache_noprof+0x3d9/0x510 dm_block_manager_create+0x51/0x140 dm_cache_metadata_abort+0x85/0x320 metadata_operation_failed+0x103/0x1e0 cache_preresume+0xacd/0xe70 dm_table_resume_targets+0xd3/0x320 __dm_resume+0x1b/0xf0 dm_resume+0x127/0x170 <snip> Fixes: 352b837a5541 ("dm cache: Fix ABBA deadlock between shrink_slab and dm_cache_metadata_abort") Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm cache: fix dirty mapping checking in passthrough mode switchingMing-Hung Tsai3-33/+8
[ Upstream commit 322586745bd1a0e5f3559fd1635fdeb4dbd1d6b8 ] As mentioned in commit 9b1cc9f251af ("dm cache: share cache-metadata object across inactive and active DM tables"), dm-cache assumed table reload occurs after suspension, while LVM's table preload breaks this assumption. The dirty mapping check for passthrough mode was designed around this assumption and is performed during table creation, causing the check to fail with preload while metadata updates are ongoing. This risks loading dirty mappings into passthrough mode, resulting in data loss. Reproduce steps: 1. Create a writeback cache with zero migration_threshold to produce dirty mappings dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 131072 linear /dev/sdc 8192" dmsetup create corig --table "0 262144 linear /dev/sdc 262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct dmsetup create cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writeback smq \ 2 migration_threshold 0" 2. Preload a table in passthrough mode dmsetup reload cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 passthrough smq 0" 3. Write to the first cache block to make it dirty fio --filename=/dev/mapper/cache --name=populate --rw=write --bs=4k \ --direct=1 --size=64k 4. Resume the inactive table. Now it's possible to load the dirty block into passthrough mode. dmsetup resume cache Fix by moving the checks to the preresume phase to support table preloading. Also remove the unused function dm_cache_metadata_all_clean. Fixes: 2ee57d587357 ("dm cache: add passthrough mode") Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm cache: support shrinking the origin deviceMing-Hung Tsai1-3/+69
[ Upstream commit c2662b1544cbd8ea3181381bb899b8e681dfedc7 ] This patch introduces formal support for shrinking the cache origin by reducing the cache target length via table reloads. Cache blocks mapped beyond the new target length must be clean and are invalidated during preresume. If any dirty blocks exist in the area being removed, the preresume operation fails without setting the NEEDS_CHECK flag in superblock, and the resume ioctl returns EFBIG. The cache device remains suspended until a table reload with target length that fits existing mappings is performed. Without this patch, reducing the cache target length could result in io errors (RHBZ: 2134334), out-of-bounds memory access to the discard bitset, and security concerns regarding data leakage. Verification steps: 1. create a cache metadata with some cached blocks mapped to the tail of the origin device. Here we use cache_restore v1.0 to build a metadata with one clean block mapped to the last origin block. cat <<EOF >> cmeta.xml <superblock uuid="" block_size="128" nr_cache_blocks="512" \ policy="smq" hint_width="4"> <mappings> <mapping cache_block="0" origin_block="4095" dirty="false"/> </mappings> </superblock> EOF dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2 dmsetup remove cmeta 2. bring up the cache whilst shrinking the cache origin by one block: dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 65536 linear /dev/sdc 8192" dmsetup create corig --table "0 524160 linear /dev/sdc 262144" dmsetup create cache --table "0 524160 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" 3. check the number of cached data blocks via dmsetup status. It is expected to be zero. dmsetup status cache | cut -d ' ' -f 7 In addition to the script above, this patch can be verified using the "cache/resize" tests in dmtest-python: ./dmtest run --rx cache/resize/shrink_origin --result-set default Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Stable-dep-of: 322586745bd1 ("dm cache: fix dirty mapping checking in passthrough mode switching") Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm cache: fix concurrent write failure in passthrough modeMing-Hung Tsai1-0/+9
[ Upstream commit e4f66341779d0cf4c83c74793753a84094286d9e ] When bio prison cell lock acquisition fails due to concurrent writes to the same block in passthrough mode, dm-cache incorrectly returns an I/O error instead of properly handling the concurrency. This can occur in both process and workqueue contexts when invalidate_lock() is called for exclusive access to a data block. Fix this by deferring the write bios to ensure proper block device behavior. Reproduce steps: 1. Create a cache device dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 131072 linear /dev/sdc 8192" dmsetup create corig --table "0 262144 linear /dev/sdc 262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct dmsetup create cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" 2. Promote the first data block into cache fio --filename=/dev/mapper/cache --name=populate --rw=write --bs=4k \ --direct=1 --size=64k 3. Reload the cache into passthrough mode dmsetup suspend cache dmsetup reload cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 passthrough smq 0" dmsetup resume cache 4. Write to the first cached block concurrently. Sometimes one of the processes will receive I/O errors. fio --filename=/dev/mapper/cache --name test --rw=randwrite --bs=4k \ --randrepeat=0 --direct=1 --numjobs=2 --size 64k <snip> fio-3.41 fio: io_u error on file /dev/mapper/cache: Input/output error: write offset=4096, buflen=4096 fio: pid=106, err=5/file:io_u.c:2008, func=io_u error, error=Input/output error test: (groupid=0, jobs=1): err= 0: pid=105 test: (groupid=0, jobs=1): err= 5 (file:io_u.c:2008, func=io_u error, error=Input/output error): pid=106 <snip> Fixes: b29d4986d0da ("dm cache: significant rework to leverage dm-bio-prison-v2") Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm cache policy smq: fix missing locks in invalidating cache blocksMing-Hung Tsai1-0/+4
[ Upstream commit 2d1f7b65f5deedd2e6b09fdc6ea27f8375f24b45 ] In passthrough mode, the policy invalidate_mapping operation is called simultaneously from multiple workers, thus it should be protected by a lock. Otherwise, we might end up with data races on the allocated blocks counter, or even use-after-free issues with internal data structures when doing concurrent writes. Note that the existing FIXME in smq_invalidate_mapping() doesn't affect passthrough mode since migration tasks don't exist there, but would need attention if supporting fast device shrinking via suspend/resume without target reloading. Reproduce steps: 1. Create a cache device consisting of 1024 cache entries dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 131072 linear /dev/sdc 8192" dmsetup create corig --table "0 262144 linear /dev/sdc 262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct dmsetup create cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" 2. Populate the cache, and record the number of cached blocks fio --name=populate --filename=/dev/mapper/cache --rw=randwrite --bs=4k \ --size=64m --direct=1 nr_cached=$(dmsetup status cache | awk '{split($7, a, "/"); print a[1]}') 3. Reload the cache into passthrough mode dmsetup suspend cache dmsetup reload cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 passthrough smq 0" dmsetup resume cache 4. Write to the passthrough cache. By setting multiple jobs with I/O size equal to the cache block size, cache blocks are invalidated concurrently from different workers. fio --filename=/dev/mapper/cache --name=test --rw=randwrite --bs=64k \ --direct=1 --numjobs=2 --randrepeat=0 --size=64m 5. Check if demoted matches cached block count. These numbers should match but may differ due to the data race. nr_demoted=$(dmsetup status cache | awk '{print $12}') echo "$nr_cached, $nr_demoted" Fixes: b29d4986d0da ("dm cache: significant rework to leverage dm-bio-prison-v2") Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm cache: fix write hang in passthrough modeMing-Hung Tsai1-5/+25
[ Upstream commit 4ca8b8bd952df7c3ccdc68af9bd3419d0839a04b ] The invalidate_remove() function has incomplete logic for handling write hit bios after cache invalidation. It sets up the remapping for the overwrite_bio but then drops it immediately without submission, causing write operations to hang. Fix by adding a new invalidate_committed() continuation that submits the remapped writes to the cache origin after metadata commit completes, while using the overwrite_endio hook to ensure proper completion sequencing. This maintains existing coherency. Also improve error handling in invalidate_complete() to preserve the original error status instead of using bio_io_error() unconditionally. Fixes: b29d4986d0da ("dm cache: significant rework to leverage dm-bio-prison-v2") Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm cache: fix write path cache coherency in passthrough modeMing-Hung Tsai1-0/+1
[ Upstream commit 0c5eef0aad508231d8e43ff8392692925e131b68 ] In passthrough mode, dm-cache defers write bio submission until cache invalidation completes to maintain existing coherency, requiring the target map function to return DM_MAPIO_SUBMITTED. The current map_bio() returns DM_MAPIO_REMAPPED, violating the required ordering constraint. Reproduce steps: 1. Create a cache device dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 131072 linear /dev/sdc 8192" dmsetup create corig --table "0 262144 linear /dev/sdc 262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct dmsetup create cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" 2. Promote the first data block into the cache fio --filename=/dev/mapper/cache --name=populate --rw=write --bs=4k \ --direct=1 --size=64k 3. Reload the cache into passthrough mode dmsetup suspend cache dmsetup reload cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 passthrough smq 0" dmsetup resume cache 4. Write to the first data block, and check io ordering using ftrace echo 1 > /sys/kernel/debug/tracing/events/block/block_bio_queue/enable echo 1 > /sys/kernel/debug/tracing/events/block/block_bio_complete/enable echo 1 > /sys/kernel/debug/tracing/events/block/block_rq_complete/enable fio --filename=/dev/mapper/cache --name=test --rw=write --bs=64k \ --direct=1 --size 64k 5. ftrace logs show that write operations to the cache origin (252:2) and metadata operations (252:0) are unsynchronized: the origin write occurs before metadata commit. <snip> fio-146 [000] ..... 420.139562: block_bio_queue: 252,3 WS 0 + 128 [fio] fio-146 [000] ..... 420.149395: block_bio_queue: 252,2 WS 0 + 128 [fio] fio-146 [000] ..... 420.149763: block_bio_queue: 8,32 WS 262144 + 128 [fio] fio-146 [000] dNh1. 420.151446: block_rq_complete: 8,32 WS () 262144 + 128 be,0,4 [0] fio-146 [000] dNh1. 420.152731: block_bio_complete: 252,2 WS 0 + 128 [0] fio-146 [000] dNh1. 420.154229: block_bio_complete: 252,3 WS 0 + 128 [0] kworker/0:0-9 [000] ..... 420.160530: block_bio_queue: 252,0 W 408 + 8 [kworker/0:0] kworker/0:0-9 [000] ..... 420.161641: block_bio_queue: 8,32 W 408 + 8 [kworker/0:0] kworker/0:0-9 [000] ..... 420.162533: block_bio_queue: 252,0 W 416 + 8 [kworker/0:0] kworker/0:0-9 [000] ..... 420.162821: block_bio_queue: 8,32 W 416 + 8 [kworker/0:0] <snip> Fixes: b29d4986d0da ("dm cache: significant rework to leverage dm-bio-prison-v2") Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
8 daysdm cache: fix null-deref with concurrent writes in passthrough modeMing-Hung Tsai1-2/+4
[ Upstream commit 7d1f98d668ee34c1d15bdc0420fdd062f24a27c0 ] In passthrough mode, when dm-cache starts to invalidate a cache entry and bio prison cell lock fails due to concurrent write to the same cached block, mg->cell remains NULL. The error path in invalidate_complete() attempts to unlock and free the cell unconditionally, causing a NULL pointer dereference: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 UID: 0 PID: 134 Comm: fio Not tainted 6.19.0-rc7 #3 PREEMPT RIP: 0010:dm_cell_unlock_v2+0x3f/0x210 <snip> Call Trace: invalidate_complete+0xef/0x430 map_bio+0x130f/0x1a10 cache_map+0x320/0x6b0 __map_bio+0x458/0x510 dm_submit_bio+0x40e/0x16d0 __submit_bio+0x419/0x870 <snip> Reproduce steps: 1. Create a cache device dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 131072 linear /dev/sdc 8192" dmsetup create corig --table "0 262144 linear /dev/sdc 262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct dmsetup create cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" 2. Promote the first data block into cache fio --filename=/dev/mapper/cache --name=populate --rw=write --bs=4k \ --direct=1 --size=64k 3. Reload the cache into passthrough mode dmsetup suspend cache dmsetup reload cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 passthrough smq 0" dmsetup resume cache 4. Write to the first cached block concurrently fio --filename=/dev/mapper/cache --name test --rw=randwrite --bs=4k \ --randrepeat=0 --direct=1 --numjobs=2 --size 64k Fix by checking if mg->cell is valid before attempting to unlock it. Fixes: b29d4986d0da ("dm cache: significant rework to leverage dm-bio-prison-v2") Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
14 daysmd/raid10: fix divide-by-zero in setup_geo() with zero far_copiesJunrui Luo1-0/+2
commit 9aa6d860b0930e2f72795665c42c44252a558a0c upstream. setup_geo() extracts near_copies (nc) and far_copies (fc) from the user-provided layout parameter without checking for zero. When fc=0 with the "improved" far set layout selected, 'geo->far_set_size = disks / fc' triggers a divide-by-zero. Validate nc and fc immediately after extraction, returning -1 if either is zero. Fixes: 475901aff158 ("MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)") Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Link: https://lore.kernel.org/linux-raid/SYBPR01MB7881A5E2556806CC1D318582AF232@SYBPR01MB7881.ausprd01.prod.outlook.com Signed-off-by: Yu Kuai <yukuai@fnnas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
14 daysdm-verity-fec: correctly reject too-small hash devicesEric Biggers1-1/+2
commit 4355142245f7e55336dcc005ec03592df4d546f8 upstream. Fix verity_fec_ctr() to reject too-small hash devices by correctly taking hash_start into account. Note that this is necessary because dm-verity doesn't call dm_bufio_set_sector_offset() on the hash device's bufio client (v->bufio). Thus, dm_bufio_get_device_size(v->bufio) returns a size relative to 0 rather than hash_start. An alternative fix would be to call dm_bufio_set_sector_offset() on v->bufio, but then all the code that reads from the hash device would have to be adjusted accordingly. Fixes: a739ff3f543a ("dm verity: add support for forward error correction") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
14 daysdm-verity-fec: correctly reject too-small FEC devicesEric Biggers1-3/+2
commit 2b14e0bb63cc671120e7791658f5c494fc66d072 upstream. Fix verity_fec_ctr() to reject too-small FEC devices by correctly computing the number of parity blocks as 'f->rounds * f->roots'. Previously it incorrectly used 'div64_u64(f->rounds * f->roots, v->fec->roots << SECTOR_SHIFT)' which is a much smaller value. Note that the units of 'rounds' are blocks, not bytes. This matches the units of the value returned by dm_bufio_get_device_size(), which are also blocks. A later commit will give 'rounds' a clearer name. Fixes: a739ff3f543a ("dm verity: add support for forward error correction") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
14 daysdm: fix a buffer overflow in ioctl processingMikulas Patocka1-0/+4
commit 2fa49cc884f6496a915c35621ba4da35649bf159 upstream. Tony Asleson (using Claude) found a buffer overflow in dm-ioctl in the function retrieve_status: 1. The code in retrieve_status checks that the output string fits into the output buffer and writes the output string there 2. Then, the code aligns the "outptr" variable to the next 8-byte boundary: outptr = align_ptr(outptr); 3. The alignment doesn't check overflow, so outptr could point past the buffer end 4. The "for" loop is iterated again, it executes: remaining = len - (outptr - outbuf); 5. If "outptr" points past "outbuf + len", the arithmetics wraps around and the variable "remaining" contains unusually high number 6. With "remaining" being high, the code writes more data past the end of the buffer Luckily, this bug has no security implications because: 1. Only root can issue device mapper ioctls 2. The commonly used libraries that communicate with device mapper (libdevmapper and devicemapper-rs) use buffer size that is aligned to 8 bytes - thus, "outptr = align_ptr(outptr)" can't overshoot the input buffer and the bug can't happen accidentally Reported-by: Tony Asleson <tasleson@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Bryn M. Reeves <bmr@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
14 daysdm: don't report warning when doing deferred removeMikulas Patocka1-1/+1
commit b7cce3e2cca9cd78418f3c3784474b778e7996fe upstream. If dm_hash_remove_all was called from dm_deferred_remove, it would write a warning "remove_all left %d open device(s)" if there are some other devices active. The warning is bogus, so let's disable it in this case. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Cc: stable@vger.kernel.org Fixes: 2c140a246dc0 ("dm: allow remove to be deferred") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
14 daysdm-thin: fix metadata refcount underflowMikulas Patocka1-0/+8
commit 09a65adc7d8bbfce06392cb6d375468e2728ead5 upstream. There's a bug in dm-thin in the function rebalance_children. If the internal btree node has one entry, the code tries to copy all btree entries from the node's child to the node itself and then decrement the child's reference count. If the child node is shared (it has reference count > 1), we won't free it, so there would be two pointers to each of the grandchildren nodes. But the reference counts of the grandchildren is not increased, thus the reference count doesn't match the number of pointers that point to the grandchildren. This results in "device mapper: space map common: unable to decrement block" errors. Fix this bug by incrementing reference counts on the grandchildren if the btree node is shared. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 3241b1d3e0aa ("dm: add persistent data library") Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
14 daysdm mirror: fix integer overflow in create_dirty_log()Junrui Luo1-3/+3
commit 4c788c6f921b22f9b6c3f316c4a071c05683e7de upstream. The argument count calculation in create_dirty_log() performs `*args_used = 2 + param_count` before validating against argc. When a user provides a param_count close to UINT_MAX via the device mapper table string, this unsigned addition wraps around to a small value, causing the subsequent `argc < *args_used` check to be bypassed. The overflowed param_count is then passed as argc to dm_dirty_log_create(), where it can cause out-of-bounds reads on the argv array. Fix by comparing param_count against argc - 2 before performing the addition, following the same pattern used by parse_features() in the same file. Since argc >= 2 is already guaranteed, the subtraction is safe. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Reported-by: Yuhao Jiang <danisjiang@gmail.com> Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
14 daysmd/raid5: validate payload size before accessing journal metadataJunrui Luo1-15/+33
commit b0cc3ae97e893bf54bbce447f4e9fd2e0b88bff9 upstream. r5c_recovery_analyze_meta_block() and r5l_recovery_verify_data_checksum_for_mb() iterate over payloads in a journal metadata block using on-disk payload size fields without validating them against the remaining space in the metadata block. A corrupted journal contains payload sizes extending beyond the PAGE_SIZE boundary can cause out-of-bounds reads when accessing payload fields or computing offsets. Add bounds validation for each payload type to ensure the full payload fits within meta_size before processing. Fixes: b4c625c67362 ("md/r5cache: r5cache recovery: part 1") Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Link: https://lore.kernel.org/linux-raid/SYBPR01MB78815E78D829BB86CD7C8015AF5FA@SYBPR01MB7881.ausprd01.prod.outlook.com/ Signed-off-by: Yu Kuai <yukuai@fnnas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
14 daysmd/raid5: fix soft lockup in retry_aligned_read()Chia-Ming Chang1-1/+7
commit 7f9f7c697474268d9ef9479df3ddfe7cdcfbbffc upstream. When retry_aligned_read() encounters an overlapped stripe, it releases the stripe via raid5_release_stripe() which puts it on the lockless released_stripes llist. In the next raid5d loop iteration, release_stripe_list() drains the stripe onto handle_list (since STRIPE_HANDLE is set by the original IO), but retry_aligned_read() runs before handle_active_stripes() and removes the stripe from handle_list via find_get_stripe() -> list_del_init(). This prevents handle_stripe() from ever processing the stripe to resolve the overlap, causing an infinite loop and soft lockup. Fix this by using __release_stripe() with temp_inactive_list instead of raid5_release_stripe() in the failure path, so the stripe does not go through the released_stripes llist. This allows raid5d to break out of its loop, and the overlap will be resolved when the stripe is eventually processed by handle_stripe(). Fixes: 773ca82fa1ee ("raid5: make release_stripe lockless") Cc: stable@vger.kernel.org Signed-off-by: FengWei Shih <dannyshih@synology.com> Signed-off-by: Chia-Ming Chang <chiamingc@synology.com> Link: https://lore.kernel.org/linux-raid/20260402061406.455755-1-chiamingc@synology.com/ Signed-off-by: Yu Kuai <yukuai@fnnas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
14 daysmd/raid10: fix deadlock with check operation and nowait requestsJosh Hunt1-2/+2
commit 7d96f3120a7fb7210d21b520c5b6f495da6ba436 upstream. When an array check is running it will raise the barrier at which point normal requests will become blocked and increment the nr_pending value to signal there is work pending inside of wait_barrier(). NOWAIT requests do not block and so will return immediately with an error, and additionally do not increment nr_pending in wait_barrier(). Upstream change commit 43806c3d5b9b ("raid10: cleanup memleak at raid10_make_request") added a call to raid_end_bio_io() to fix a memory leak when NOWAIT requests hit this condition. raid_end_bio_io() eventually calls allow_barrier() and it will unconditionally do an atomic_dec_and_test(&conf->nr_pending) even though the corresponding increment on nr_pending didn't happen in the NOWAIT case. This can be easily seen by starting a check operation while an application is doing nowait IO on the same array. This results in a deadlocked state due to nr_pending value underflowing and so the md resync thread gets stuck waiting for nr_pending to == 0. Output of r10conf state of the array when we hit this condition: crash> struct r10conf barrier = 1, nr_pending = { counter = -41 }, nr_waiting = 15, nr_queued = 0, Example of md_sync thread stuck waiting on raise_barrier() and other requests stuck in wait_barrier(): md1_resync [<0>] raise_barrier+0xce/0x1c0 [<0>] raid10_sync_request+0x1ca/0x1ed0 [<0>] md_do_sync+0x779/0x1110 [<0>] md_thread+0x90/0x160 [<0>] kthread+0xbe/0xf0 [<0>] ret_from_fork+0x34/0x50 [<0>] ret_from_fork_asm+0x1a/0x30 kworker/u1040:2+flush-253:4 [<0>] wait_barrier+0x1de/0x220 [<0>] regular_request_wait+0x30/0x180 [<0>] raid10_make_request+0x261/0x1000 [<0>] md_handle_request+0x13b/0x230 [<0>] __submit_bio+0x107/0x1f0 [<0>] submit_bio_noacct_nocheck+0x16f/0x390 [<0>] ext4_io_submit+0x24/0x40 [<0>] ext4_do_writepages+0x254/0xc80 [<0>] ext4_writepages+0x84/0x120 [<0>] do_writepages+0x7a/0x260 [<0>] __writeback_single_inode+0x3d/0x300 [<0>] writeback_sb_inodes+0x1dd/0x470 [<0>] __writeback_inodes_wb+0x4c/0xe0 [<0>] wb_writeback+0x18b/0x2d0 [<0>] wb_workfn+0x2a1/0x400 [<0>] process_one_work+0x149/0x330 [<0>] worker_thread+0x2d2/0x410 [<0>] kthread+0xbe/0xf0 [<0>] ret_from_fork+0x34/0x50 [<0>] ret_from_fork_asm+0x1a/0x30 Fixes: 43806c3d5b9b ("raid10: cleanup memleak at raid10_make_request") Cc: stable@vger.kernel.org Signed-off-by: Josh Hunt <johunt@akamai.com> Link: https://lore.kernel.org/linux-raid/20260303005619.1352958-1-johunt@akamai.com Signed-off-by: Yu Kuai <yukuai@fnnas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-27md/raid1: fix data lost for writemostly rdevYu Kuai1-1/+1
commit 93dec51e716db88f32d770dc9ab268964fff320b upstream. If writemostly is enabled, alloc_behind_master_bio() will allocate a new bio for rdev, with bi_opf set to 0. Later, raid1_write_request() will clone from this bio, hence bi_opf is still 0 for the cloned bio. Submit this cloned bio will end up to be read, causing write data lost. Fix this problem by inheriting bi_opf from original bio for behind_mast_bio. Fixes: e879a0d9cb08 ("md/raid1,raid10: don't ignore IO flags") Reported-and-tested-by: Ian Dall <ian@beware.dropbear.id.au> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220507 Link: https://lore.kernel.org/linux-raid/20250903014140.3690499-1-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Li Nan <linan122@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-27md/raid1,raid10: don't ignore IO flagsYu Kuai2-11/+0
commit e879a0d9cb086c8e52ce6c04e5bfa63825a6213c upstream. If blk-wbt is enabled by default, it's found that raid write performance is quite bad because all IO are throttled by wbt of underlying disks, due to flag REQ_IDLE is ignored. And turns out this behaviour exist since blk-wbt is introduced. Other than REQ_IDLE, other flags should not be ignored as well, for example REQ_META can be set for filesystems, clearing it can cause priority reverse problems; And REQ_NOWAIT should not be cleared as well, because io will wait instead of failing directly in underlying disks. Fix those problems by keep IO flags from master bio. Fises: f51d46d0e7cb ("md: add support for REQ_NOWAIT") Fixes: e34cbd307477 ("blk-wbt: add general throttling mechanism") Fixes: 5404bc7a87b9 ("[PATCH] Allow file systems to differentiate between data and meta reads") Link: https://lore.kernel.org/linux-raid/20250227121657.832356-1-yukuai1@huaweicloud.com Signed-off-by: Yu Kuai <yukuai3@huawei.com> [ Harshit: Resolve conflicts due to missing commit: f2a38abf5f1c ("md/raid1: Atomic write support") and commit: a1d9b4fd42d9 ("md/raid10: Atomic write support") in 6.12.y, we don't have Atomic writes feature in 6.12.y ] Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [ Based on Harshit's backport for 6.12, fixed minor conflicts for 6.6. ] Signed-off-by: Charles Xu <charles_xu@189.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-27bcache: fix cached_dev.sb_bio use-after-free and crashMingzhe Zou1-0/+7
commit fec114a98b8735ee89c75216c45a78e28be0f128 upstream. In our production environment, we have received multiple crash reports regarding libceph, which have caught our attention: ``` [6888366.280350] Call Trace: [6888366.280452] blk_update_request+0x14e/0x370 [6888366.280561] blk_mq_end_request+0x1a/0x130 [6888366.280671] rbd_img_handle_request+0x1a0/0x1b0 [rbd] [6888366.280792] rbd_obj_handle_request+0x32/0x40 [rbd] [6888366.280903] __complete_request+0x22/0x70 [libceph] [6888366.281032] osd_dispatch+0x15e/0xb40 [libceph] [6888366.281164] ? inet_recvmsg+0x5b/0xd0 [6888366.281272] ? ceph_tcp_recvmsg+0x6f/0xa0 [libceph] [6888366.281405] ceph_con_process_message+0x79/0x140 [libceph] [6888366.281534] ceph_con_v1_try_read+0x5d7/0xf30 [libceph] [6888366.281661] ceph_con_workfn+0x329/0x680 [libceph] ``` After analyzing the coredump file, we found that the address of dc->sb_bio has been freed. We know that cached_dev is only freed when it is stopped. Since sb_bio is a part of struct cached_dev, rather than an alloc every time. If the device is stopped while writing to the superblock, the released address will be accessed at endio. This patch hopes to wait for sb_write to complete in cached_dev_free. It should be noted that we analyzed the cause of the problem, then tell all details to the QWEN and adopted the modifications it made. Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn> Fixes: cafe563591446 ("bcache: A block layer cache") Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Coly Li <colyli@fnnas.com> Link: https://patch.msgid.link/20260322134102.480107-1-colyli@fnnas.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25dm-verity: disable recursive forward error correctionMikulas Patocka2-6/+1
[ Upstream commit d9f3e47d3fae0c101d9094bc956ed24e7a0ee801 ] There are two problems with the recursive correction: 1. It may cause denial-of-service. In fec_read_bufs, there is a loop that has 253 iterations. For each iteration, we may call verity_hash_for_block recursively. There is a limit of 4 nested recursions - that means that there may be at most 253^4 (4 billion) iterations. Red Hat QE team actually created an image that pushes dm-verity to this limit - and this image just makes the udev-worker process get stuck in the 'D' state. 2. It doesn't work. In fec_read_bufs we store data into the variable "fio->bufs", but fio bufs is shared between recursive invocations, if "verity_hash_for_block" invoked correction recursively, it would overwrite partially filled fio->bufs. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Guangwu Zhang <guazhang@redhat.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> [ The context change is due to the commit bdf253d580d7 ("dm-verity: remove support for asynchronous hashes") in v6.18 and the commit 9356fcfe0ac4 ("dm verity: set DM_TARGET_SINGLETON feature flag") in v6.9 which are irrelevant to the logic of this patch. ] Signed-off-by: Rahul Sharma <black.hawk@163.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-04dm mpath: make pg_init_delay_msecs settableBenjamin Marzinski1-1/+1
[ Upstream commit 218b16992a37ea97b9e09b7659a25a864fb9976f ] "pg_init_delay_msecs X" can be passed as a feature in the multipath table and is used to set m->pg_init_delay_msecs in parse_features(). However, alloc_multipath_stage2(), which is called after parse_features(), resets m->pg_init_delay_msecs to its default value. Instead, set m->pg_init_delay_msecs in alloc_multipath(), which is called before parse_features(), to avoid overwriting a value passed in by the table. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04md/bitmap: fix GPF in write_page caused by resize raceJack Wang1-1/+2
[ Upstream commit 46ef85f854dfa9d5226b3c1c46493d79556c9589 ] A General Protection Fault occurs in write_page() during array resize: RIP: 0010:write_page+0x22b/0x3c0 [md_mod] This is a use-after-free race between bitmap_daemon_work() and __bitmap_resize(). The daemon iterates over `bitmap->storage.filemap` without locking, while the resize path frees that storage via md_bitmap_file_unmap(). `quiesce()` does not stop the md thread, allowing concurrent access to freed pages. Fix by holding `mddev->bitmap_info.mutex` during the bitmap update. Link: https://lore.kernel.org/linux-raid/20260120102456.25169-1-jinpu.wang@ionos.com Closes: https://lore.kernel.org/linux-raid/CAMGffE=Mbfp=7xD_hYxXk1PAaCZNSEAVeQGKGy7YF9f2S4=NEA@mail.gmail.com/T/#u Cc: stable@vger.kernel.org Fixes: d60b479d177a ("md/bitmap: add bitmap_resize function to allow bitmap resizing.") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04dm-unstripe: fix mapping bug when there are multiple targets in a tableMatt Whitlock1-1/+1
[ Upstream commit 83c10e8dd43628d0bf86486616556cd749a3c310 ] The "unstriped" device-mapper target incorrectly calculates the sector offset on the mapped device when the target's origin is not zero. Take for example this hypothetical concatenation of the members of a two-disk RAID0: linearized: 0 2097152 unstriped 2 128 0 /dev/md/raid0 0 linearized: 2097152 2097152 unstriped 2 128 1 /dev/md/raid0 0 The intent in this example is to create a single device named /dev/mapper/linearized that comprises all of the chunks of the first disk of the RAID0 set, followed by all of the chunks of the second disk of the RAID0 set. This fails because dm-unstripe.c's map_to_core function does its computations based on the sector number within the mapper device rather than the sector number within the target. The bug turns invisible when the target's origin is at sector zero of the mapper device, as is the common case. In the example above, however, what happens is that the first half of the mapper device gets mapped correctly to the first disk of the RAID0, but the second half of the mapper device gets mapped past the end of the RAID0 device, and accesses to any of those sectors return errors. Signed-off-by: Matt Whitlock <kernel@mattwhitlock.name> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: 18a5bf270532 ("dm: add unstriped target") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04dm-integrity: fix recalculation in bitmap modeMikulas Patocka1-0/+13
[ Upstream commit 118ba36e446c01e3cd34b3eedabf1d9436525e1d ] There's a logic quirk in the handling of suspend in the bitmap mode: This is the sequence of calls if we are reloading a dm-integrity table: * dm_integrity_ctr reads a superblock with the flag SB_FLAG_DIRTY_BITMAP set. * dm_integrity_postsuspend initializes a journal and clears the flag SB_FLAG_DIRTY_BITMAP. * dm_integrity_resume sees the superblock with SB_FLAG_DIRTY_BITMAP set - thus it interprets the journal as if it were a bitmap. This quirk causes recalculation problem if the user increases the size of the device in the bitmap mode. Fix this by reading a fresh copy on the superblock in dm_integrity_resume. This commit also fixes another logic quirk - the branch that sets bitmap bits if the device was extended should only be executed if the flag SB_FLAG_DIRTY_BITMAP is set. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Ondrej Kozina <okozina@redhat.com> Fixes: 468dfca38b1a ("dm integrity: add a bitmap mode") Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04dm: clear cloned request bio pointer when last clone bio completesMichael Liang1-3/+10
[ Upstream commit fb8a6c18fb9a6561f7a15b58b272442b77a242dd ] Stale rq->bio values have been observed to cause double-initialization of cloned bios in request-based device-mapper targets, leading to use-after-free and double-free scenarios. One such case occurs when using dm-multipath on top of a PCIe NVMe namespace, where cloned request bios are freed during blk_complete_request(), but rq->bio is left intact. Subsequent clone teardown then attempts to free the same bios again via blk_rq_unprep_clone(). The resulting double-free path looks like: nvme_pci_complete_batch() nvme_complete_batch() blk_mq_end_request_batch() blk_complete_request() // called on a DM clone request bio_endio() // first free of all clone bios ... rq->end_io() // end_clone_request() dm_complete_request(tio->orig) dm_softirq_done() dm_done() dm_end_request() blk_rq_unprep_clone() // second free of clone bios Fix this by clearing the clone request's bio pointer when the last cloned bio completes, ensuring that later teardown paths do not attempt to free already-released bios. Signed-off-by: Michael Liang <mliang@purestorage.com> Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04dm-integrity: fix a typo in the code for write/discard raceMikulas Patocka1-1/+1
[ Upstream commit c698b7f417801fcd79f0dc844250b3361d38e6b8 ] If we send a write followed by a discard, it may be possible that the discarded data end up being overwritten by the previous write from the journal. The code tries to prevent that, but there was a typo in this logic that made it not being activated as it should be. Note that if we end up here the second time (when discard_retried is true), it means that the write bio is actually racing with the discard bio, and in this situation it is not specified which of them should win. Cc: stable@vger.kernel.org Fixes: 31843edab7cb ("dm integrity: improve discard in journal mode") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04dm-verity: correctly handle dm_bufio_client_create() failureEric Biggers1-2/+2
[ Upstream commit 119f4f04186fa4f33ee6bd39af145cdaff1ff17f ] If either of the calls to dm_bufio_client_create() in verity_fec_ctr() fails, then dm_bufio_client_destroy() is later called with an ERR_PTR() argument. That causes a crash. Fix this. Fixes: a739ff3f543a ("dm verity: add support for forward error correction") Cc: stable@vger.kernel.org Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04dm: remove fake timeout to avoid leak requestDing Hui1-2/+1
[ Upstream commit f3a9c95a15d2f4466acad5c68faeff79ca5e9f47 ] Since commit 15f73f5b3e59 ("blk-mq: move failure injection out of blk_mq_complete_request"), drivers are responsible for calling blk_should_fake_timeout() at appropriate code paths and opportunities. However, the dm driver does not implement its own timeout handler and relies on the timeout handling of its slave devices. If an io-timeout-fail error is injected to a dm device, the request will be leaked and never completed, causing tasks to hang indefinitely. Reproduce: 1. prepare dm which has iscsi slave device 2. inject io-timeout-fail to dm echo 1 >/sys/class/block/dm-0/io-timeout-fail echo 100 >/sys/kernel/debug/fail_io_timeout/probability echo 10 >/sys/kernel/debug/fail_io_timeout/times 3. read/write dm 4. iscsiadm -m node -u Result: hang task like below [ 862.243768] INFO: task kworker/u514:2:151 blocked for more than 122 seconds. [ 862.244133] Tainted: G E 6.19.0-rc1+ #51 [ 862.244337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 862.244718] task:kworker/u514:2 state:D stack:0 pid:151 tgid:151 ppid:2 task_flags:0x4288060 flags:0x00080000 [ 862.245024] Workqueue: iscsi_ctrl_3:1 __iscsi_unbind_session [scsi_transport_iscsi] [ 862.245264] Call Trace: [ 862.245587] <TASK> [ 862.245814] __schedule+0x810/0x15c0 [ 862.246557] schedule+0x69/0x180 [ 862.246760] blk_mq_freeze_queue_wait+0xde/0x120 [ 862.247688] elevator_change+0x16d/0x460 [ 862.247893] elevator_set_none+0x87/0xf0 [ 862.248798] blk_unregister_queue+0x12e/0x2a0 [ 862.248995] __del_gendisk+0x231/0x7e0 [ 862.250143] del_gendisk+0x12f/0x1d0 [ 862.250339] sd_remove+0x85/0x130 [sd_mod] [ 862.250650] device_release_driver_internal+0x36d/0x530 [ 862.250849] bus_remove_device+0x1dd/0x3f0 [ 862.251042] device_del+0x38a/0x930 [ 862.252095] __scsi_remove_device+0x293/0x360 [ 862.252291] scsi_remove_target+0x486/0x760 [ 862.252654] __iscsi_unbind_session+0x18a/0x3e0 [scsi_transport_iscsi] [ 862.252886] process_one_work+0x633/0xe50 [ 862.253101] worker_thread+0x6df/0xf10 [ 862.253647] kthread+0x36d/0x720 [ 862.254533] ret_from_fork+0x2a6/0x470 [ 862.255852] ret_from_fork_asm+0x1a/0x30 [ 862.256037] </TASK> Remove the blk_should_fake_timeout() check from dm, as dm has no native timeout handling and should not attempt to fake timeouts. Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04dm: replace -EEXIST with -EBUSYDaniel Gomez4-4/+4
[ Upstream commit b13ef361d47f09b7aecd18e0383ecc83ff61057e ] The -EEXIST error code is reserved by the module loading infrastructure to indicate that a module is already loaded. When a module's init function returns -EEXIST, userspace tools like kmod interpret this as "module already loaded" and treat the operation as successful, returning 0 to the user even though the module initialization actually failed. This follows the precedent set by commit 54416fd76770 ("netfilter: conntrack: helper: Replace -EEXIST by -EBUSY") which fixed the same issue in nf_conntrack_helper_register(). Affected modules: * dm_cache dm_clone dm_integrity dm_mirror dm_multipath dm_pcache * dm_vdo dm-ps-round-robin dm_historical_service_time dm_io_affinity * dm_queue_length dm_service_time dm_snapshot Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04md-cluster: fix NULL pointer dereference in process_metadata_updateJiasheng Jiang1-1/+6
[ Upstream commit f150e753cb8dd756085f46e86f2c35ce472e0a3c ] The function process_metadata_update() blindly dereferences the 'thread' pointer (acquired via rcu_dereference_protected) within the wait_event() macro. While the code comment states "daemon thread must exist", there is a valid race condition window during the MD array startup sequence (md_run): 1. bitmap_load() is called, which invokes md_cluster_ops->join(). 2. join() starts the "cluster_recv" thread (recv_daemon). 3. At this point, recv_daemon is active and processing messages. 4. However, mddev->thread (the main MD thread) is not initialized until later in md_run(). If a METADATA_UPDATED message is received from a remote node during this specific window, process_metadata_update() will be called while mddev->thread is still NULL, leading to a kernel panic. To fix this, we must validate the 'thread' pointer. If it is NULL, we release the held lock (no_new_dev_lockres) and return early, safely ignoring the update request as the array is not yet fully ready to process it. Link: https://lore.kernel.org/linux-raid/20260117145903.28921-1-jiashengjiangcool@gmail.com Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04dm: use bio_clone_blkg_associationMikulas Patocka1-0/+2
[ Upstream commit 2df8b310bcfe76827fd71092f58a2493ee6590b0 ] The origin bio carries blk-cgroup information which could be set from foreground(task_css(css) - wbc->wb->blkcg_css), so the blkcg won't control buffer io since commit ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone"). The synchronous io is still under control by blkcg, because 'bio->bi_blkg' is set by io submitting task which has been added into 'cgroup.procs'. Fix it by using bio_clone_blkg_association when submitting a cloned bio. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220985 Fixes: ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone") Reported-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04md/raid10: fix any_working flag handling in raid10_sync_requestLi Nan1-1/+1
[ Upstream commit 99582edb3f62e8ee6c34512021368f53f9b091f2 ] In raid10_sync_request(), 'any_working' indicates if any IO will be submitted. When there's only one In_sync disk with badblocks, 'any_working' might be set to 1 but no IO is submitted. Fix it by setting 'any_working' after badblock checks. Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-11-linan666@huaweicloud.com Fixes: e875ecea266a ("md/raid10 record bad blocks as needed during recovery.") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Yu Kuai <yukuai@fnnas.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-17dm-snapshot: fix 'scheduling while atomic' on real-time kernelsMikulas Patocka2-40/+35
[ Upstream commit 8581b19eb2c5ccf06c195d3b5468c3c9d17a5020 ] There is reported 'scheduling while atomic' bug when using dm-snapshot on real-time kernels. The reason for the bug is that the hlist_bl code does preempt_disable() when taking the lock and the kernel attempts to take other spinlocks while holding the hlist_bl lock. Fix this by converting a hlist_bl spinlock into a regular spinlock. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Jiping Ma <jiping.ma2@windriver.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-11dm-bufio: align write boundary on physical block sizeMikulas Patocka1-4/+6
commit d0ac06ae53be0cdb61f5fe6b62d25d3317c51657 upstream. There may be devices with physical block size larger than 4k. If dm-bufio sends I/O that is not aligned on physical block size, performance is degraded. The 4k minimum alignment limit is there because some SSDs report logical and physical block size 512 despite having 4k internally - so dm-bufio shouldn't send I/Os not aligned on 4k boundary, because they perform badly (the SSD does read-modify-write for them). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-11dm-ebs: Mark full buffer dirty even on partial writeUladzislau Rezki (Sony)1-1/+1
commit 7fa3e7d114abc9cc71cc35d768e116641074ddb4 upstream. When performing a read-modify-write(RMW) operation, any modification to a buffered block must cause the entire buffer to be marked dirty. Marking only a subrange as dirty is incorrect because the underlying device block size(ubs) defines the minimum read/write granularity. A lower device can perform I/O only on regions which are fully aligned and sized to ubs. This change ensures that write-back operations always occur in full ubs-sized chunks, matching the intended emulation semantics of the EBS target. As for user space visible impact, submitting sub-ubs and misaligned I/O for devices which are tuned to ubs sizes only, will reject such requests, therefore it can lead to losing data. Example: 1) Create a 8K nvme device in qemu by adding -device nvme,drive=drv0,serial=foo,logical_block_size=8192,physical_block_size=8192 2) Setup dm-ebs to emulate 512B to 8K mapping urezki@pc638:~/bin$ cat dmsetup.sh lower=/dev/nvme0n1 len=$(blockdev --getsz "$lower") echo "0 $len ebs $lower 0 1 16" | dmsetup create nvme-8k urezki@pc638:~/bin$ offset 0, ebs=1 and ubs=16(in sectors). 3) Create an ext4 filesystem(default 4K block size) urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0 mke2fs 1.47.0 (5-Feb-2023) Discarding device blocks: done Creating filesystem with 2072576 4k blocks and 518144 inodes Filesystem UUID: bd0b6ca6-0506-4e31-86da-8d22c9d50b63 Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632 Allocating group tables: done Writing inode tables: done Creating journal (16384 blocks): done Writing superblocks and filesystem accounting information: mkfs.ext4: Input/output error while writing out and closing file system urezki@pc638:~/bin$ dmesg <snip> [ 1618.875449] buffer_io_error: 1028 callbacks suppressed [ 1618.875456] Buffer I/O error on dev dm-0, logical block 0, lost async page write [ 1618.875527] Buffer I/O error on dev dm-0, logical block 1, lost async page write [ 1618.875602] Buffer I/O error on dev dm-0, logical block 2, lost async page write [ 1618.875620] Buffer I/O error on dev dm-0, logical block 3, lost async page write [ 1618.875639] Buffer I/O error on dev dm-0, logical block 4, lost async page write [ 1618.894316] Buffer I/O error on dev dm-0, logical block 5, lost async page write [ 1618.894358] Buffer I/O error on dev dm-0, logical block 6, lost async page write [ 1618.894380] Buffer I/O error on dev dm-0, logical block 7, lost async page write [ 1618.894405] Buffer I/O error on dev dm-0, logical block 8, lost async page write [ 1618.894427] Buffer I/O error on dev dm-0, logical block 9, lost async page write <snip> Many I/O errors because the lower 8K device rejects sub-ubs/misaligned requests. with a patch: urezki@pc638:~/bin$ sudo mkfs.ext4 -F /dev/dm-0 mke2fs 1.47.0 (5-Feb-2023) Discarding device blocks: done Creating filesystem with 2072576 4k blocks and 518144 inodes Filesystem UUID: 9b54f44f-ef55-4bd4-9e40-c8b775a616ac Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632 Allocating group tables: done Writing inode tables: done Creating journal (16384 blocks): done Writing superblocks and filesystem accounting information: done urezki@pc638:~/bin$ sudo mount /dev/dm-0 /mnt/ urezki@pc638:~/bin$ ls -al /mnt/ total 24 drwxr-xr-x 3 root root 4096 Oct 17 15:13 . drwxr-xr-x 19 root root 4096 Jul 10 19:42 .. drwx------ 2 root root 16384 Oct 17 15:13 lost+found urezki@pc638:~/bin$ After this change: mkfs completes; mount succeeds. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-11dm log-writes: Add missing set_freezable() for freezable kthreadHaotian Zhang1-0/+1
[ Upstream commit ab08f9c8b363297cafaf45475b08f78bf19b88ef ] The log_writes_kthread() calls try_to_freeze() but lacks set_freezable(), rendering the freeze attempt ineffective since kernel threads are non-freezable by default. This prevents proper thread suspension during system suspend/hibernate. Add set_freezable() to explicitly mark the thread as freezable. Fixes: 0e9cebe72459 ("dm: add log writes target") Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-11dm-raid: fix possible NULL dereference with undefined raid typeAlexey Simakov1-0/+2
[ Upstream commit 2f6cfd6d7cb165a7af8877b838a9f6aab4159324 ] rs->raid_type is assigned from get_raid_type_by_ll(), which may return NULL. This NULL value could be dereferenced later in the condition 'if (!(rs_is_raid10(rs) && rt_is_raid0(rs->raid_type)))'. Add a fail-fast check to return early with an error if raid_type is NULL, similar to other uses of this function. Found by Linux Verification Center (linuxtesting.org) with Svace. Fixes: 33e53f06850f ("dm raid: introduce extended superblock and new raid types to support takeover/reshaping") Signed-off-by: Alexey Simakov <bigalex934@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-11md/raid5: fix IO hang when array is broken with IO inflightYu Kuai1-2/+4
[ Upstream commit a913d1f6a7f607c110aeef8b58c8988f47a4b24e ] Following test can cause IO hang: mdadm -CvR /dev/md0 -l10 -n4 /dev/sd[abcd] --assume-clean --chunk=64K --bitmap=none sleep 5 echo 1 > /sys/block/sda/device/delete echo 1 > /sys/block/sdb/device/delete echo 1 > /sys/block/sdc/device/delete echo 1 > /sys/block/sdd/device/delete dd if=/dev/md0 of=/dev/null bs=8k count=1 iflag=direct Root cause: 1) all disks removed, however all rdevs in the array is still in sync, IO will be issued normally. 2) IO failure from sda, and set badblocks failed, sda will be faulty and MD_SB_CHANGING_PENDING will be set. 3) error recovery try to recover this IO from other disks, IO will be issued to sdb, sdc, and sdd. 4) IO failure from sdb, and set badblocks failed again, now array is broken and will become read-only. 5) IO failure from sdc and sdd, however, stripe can't be handled anymore because MD_SB_CHANGING_PENDING is set: handle_stripe handle_stripe if (test_bit MD_SB_CHANGING_PENDING) set_bit STRIPE_HANDLE goto finish // skip handling failed stripe release_stripe if (test_bit STRIPE_HANDLE) list_add_tail conf->hand_list 6) later raid5d can't handle failed stripe as well: raid5d md_check_recovery md_update_sb if (!md_is_rdwr()) // can't clear pending bit return if (test_bit MD_SB_CHANGING_PENDING) break; // can't handle failed stripe Since MD_SB_CHANGING_PENDING can never be cleared for read-only array, fix this problem by skip this checking for read-only array. Link: https://lore.kernel.org/linux-raid/20251117085557.770572-3-yukuai@fnnas.com Fixes: d87f064f5874 ("md: never update metadata when array is read-only.") Signed-off-by: Yu Kuai <yukuai@fnnas.com> Reviewed-by: Li Nan <linan122@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-07dm-verity: fix unreliable memory allocationMikulas Patocka1-5/+1
commit fe680d8c747f4e676ac835c8c7fb0f287cd98758 upstream. GFP_NOWAIT allocation may fail anytime. It needs to be changed to GFP_NOIO. There's no need to handle an error because mempool_alloc with GFP_NOIO can't fail. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-15dm: fix NULL pointer dereference in __dm_suspend()Zheng Qixing1-3/+4
commit 8d33a030c566e1f105cd5bf27f37940b6367f3be upstream. There is a race condition between dm device suspend and table load that can lead to null pointer dereference. The issue occurs when suspend is invoked before table load completes: BUG: kernel NULL pointer dereference, address: 0000000000000054 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 6 PID: 6798 Comm: dmsetup Not tainted 6.6.0-g7e52f5f0ca9b #62 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:blk_mq_wait_quiesce_done+0x0/0x50 Call Trace: <TASK> blk_mq_quiesce_queue+0x2c/0x50 dm_stop_queue+0xd/0x20 __dm_suspend+0x130/0x330 dm_suspend+0x11a/0x180 dev_suspend+0x27e/0x560 ctl_ioctl+0x4cf/0x850 dm_ctl_ioctl+0xd/0x20 vfs_ioctl+0x1d/0x50 __se_sys_ioctl+0x9b/0xc0 __x64_sys_ioctl+0x19/0x30 x64_sys_call+0x2c4a/0x4620 do_syscall_64+0x9e/0x1b0 The issue can be triggered as below: T1 T2 dm_suspend table_load __dm_suspend dm_setup_md_queue dm_mq_init_request_queue blk_mq_init_allocated_queue => q->mq_ops = set->ops; (1) dm_stop_queue / dm_wait_for_completion => q->tag_set NULL pointer! (2) => q->tag_set = set; (3) Fix this by checking if a valid table (map) exists before performing request-based suspend and waiting for target I/O. When map is NULL, skip these table-dependent suspend steps. Even when map is NULL, no I/O can reach any target because there is no table loaded; I/O submitted in this state will fail early in the DM layer. Skipping the table-dependent suspend logic in this case is safe and avoids NULL pointer dereferences. Fixes: c4576aed8d85 ("dm: fix request-based dm's use of dm_wait_for_completion") Cc: stable@vger.kernel.org Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-15dm: fix queue start/stop imbalance under suspend/load/resume racesZheng Qixing2-3/+6
commit 7f597c2cdb9d3263a6fce07c4fc0a9eaa8e8fc43 upstream. When suspend and load run concurrently, before q->mq_ops is set in blk_mq_init_allocated_queue(), __dm_suspend() skip dm_stop_queue(). As a result, the queue's quiesce depth is not incremented. Later, once table load has finished and __dm_resume() runs, which triggers q->quiesce_depth ==0 warning in blk_mq_unquiesce_queue(): Call Trace: <TASK> dm_start_queue+0x16/0x20 [dm_mod] __dm_resume+0xac/0xb0 [dm_mod] dm_resume+0x12d/0x150 [dm_mod] do_resume+0x2c2/0x420 [dm_mod] dev_suspend+0x30/0x130 [dm_mod] ctl_ioctl+0x402/0x570 [dm_mod] dm_ctl_ioctl+0x23/0x30 [dm_mod] Fix this by explicitly tracking whether the request queue was stopped in __dm_suspend() via a new DMF_QUEUE_STOPPED flag. Only call dm_start_queue() in __dm_resume() if the queue was actually stopped. Fixes: e70feb8b3e68 ("blk-mq: support concurrent queue quiesce/unquiesce") Cc: stable@vger.kernel.org Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-12dm-integrity: limit MAX_TAG_SIZE to 255Mikulas Patocka1-1/+1
[ Upstream commit 77b8e6fbf9848d651f5cb7508f18ad0971f3ffdb ] MAX_TAG_SIZE was 0x1a8 and it may be truncated in the "bi->metadata_size = ic->tag_size" assignment. We need to limit it to 255. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-25minmax: add a few more MIN_T/MAX_T usersLinus Torvalds1-3/+3
[ Upstream commit 4477b39c32fdc03363affef4b11d48391e6dc9ff ] Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant expressions in VM code") added the simpler MIN_T/MAX_T macros in order to avoid some excessive expansion from the rather complicated regular min/max macros. The complexity of those macros stems from two issues: (a) trying to use them in situations that require a C constant expression (in static initializers and for array sizes) (b) the type sanity checking and MIN_T/MAX_T avoids both of these issues. Now, in the whole (long) discussion about all this, it was pointed out that the whole type sanity checking is entirely unnecessary for min_t/max_t which get a fixed type that the comparison is done in. But that still leaves min_t/max_t unnecessarily complicated due to worries about the C constant expression case. However, it turns out that there really aren't very many cases that use min_t/max_t for this, and we can just force-convert those. This does exactly that. Which in turn will then allow for much simpler implementations of min_t()/max_t(). All the usual "macros in all upper case will evaluate the arguments multiple times" rules apply. We should do all the same things for the regular min/max() vs MIN/MAX() cases, but that has the added complexity of various drivers defining their own local versions of MIN/MAX, so that needs another level of fixes first. Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/ Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28dm-table: fix checking for rq stackable devicesBenjamin Marzinski1-5/+5
[ Upstream commit 8ca719b81987be690f197e82fdb030580c0a07f3 ] Due to the semantics of iterate_devices(), the current code allows a request-based dm table as long as it includes one request-stackable device. It is supposed to only allow tables where there are no non-request-stackable devices. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28dm-mpath: don't print the "loaded" message if registering failsMikulas Patocka4-4/+12
[ Upstream commit 6e11952a6abc4641dc8ae63f01b318b31b44e8db ] If dm_register_path_selector, don't print the "version X loaded" message. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>