summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)AuthorFilesLines
2025-04-20dm-integrity: set ti->error on memory allocation failureMikulas Patocka1-0/+3
commit 00204ae3d6712ee053353920e3ce2b00c35ef75b upstream. The dm-integrity target didn't set the error string when memory allocation failed. This patch fixes it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20dm-ebs: fix prefetch-vs-suspend raceMikulas Patocka1-0/+7
commit 9c565428788fb9b49066f94ab7b10efc686a0a4c upstream. There's a possible race condition in dm-ebs - dm bufio prefetch may be in progress while the device is suspended. Fix this by calling dm_bufio_client_reset in the postsuspend hook. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-22dm-flakey: Fix memory corruption in optional corrupt_bio_byte featureKent Overstreet1-1/+1
commit 57e9417f69839cb10f7ffca684c38acd28ceb57b upstream. Fix memory corruption due to incorrect parameter being passed to bio_init Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v6.5+ Fixes: 1d9a94389853 ("dm flakey: clone pages on write bio before corrupting them") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07dm vdo: add missing spin_lock_initKen Raeburn1-0/+1
commit 36e1b81f599a093ec7477e4593e110104adcfb96 upstream. Signed-off-by: Ken Raeburn <raeburn@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07dm-integrity: Avoid divide by zero in table status in Inline modeMilan Broz1-4/+4
commit 7fb39882b20c98a9a393c244c86b56ef6933cff8 upstream. In Inline mode, the journal is unused, and journal_sectors is zero. Calculating the journal watermark requires dividing by journal_sectors, which should be done only if the journal is configured. Otherwise, a simple table query (dmsetup table) can cause OOPS. This bug did not show on some systems, perhaps only due to compiler optimization. On my 32-bit testing machine, this reliably crashes with the following: : Oops: divide error: 0000 [#1] PREEMPT SMP : CPU: 0 UID: 0 PID: 2450 Comm: dmsetup Not tainted 6.14.0-rc2+ #959 : EIP: dm_integrity_status+0x2f8/0xab0 [dm_integrity] ... Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: fb0987682c62 ("dm-integrity: introduce the Inline mode") Cc: stable@vger.kernel.org # 6.11+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-27md/raid*: Fix the set_queue_limits implementationsBart Van Assche3-9/+3
[ Upstream commit fbe8f2fa971c537571994a0df532c511c4fb5537 ] queue_limits_cancel_update() must only be called if queue_limits_start_update() is called first. Remove the queue_limits_cancel_update() calls from the raid*_set_limits() functions because there is no corresponding queue_limits_start_update() call. Cc: Christoph Hellwig <hch@lst.de> Fixes: c6e56cf6b2e7 ("block: move integrity information into queue_limits") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-raid/20250212171108.3483150-1-bvanassche@acm.org/ Signed-off-by: Yu Kuai <yukuai@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17md: Fix linear_set_limits()Bart Van Assche1-3/+1
commit a572593ac80e51eb69ecede7e614289fcccdbf8d upstream. queue_limits_cancel_update() must only be called if queue_limits_start_update() is called first. Remove the queue_limits_cancel_update() call from linear_set_limits() because there is no corresponding queue_limits_start_update() call. This bug was discovered by annotating all mutex operations with clang thread-safety attributes and by building the kernel with clang and -Wthread-safety. Cc: Yu Kuai <yukuai3@huawei.com> Cc: Coly Li <colyli@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Fixes: 127186cfb184 ("md: reintroduce md-linear") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250129225636.2667932-1-bvanassche@acm.org Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17md/md-linear: Fix a NULL vs IS_ERR() bug in linear_add()Dan Carpenter1-2/+2
commit 62c552070a980363d55a6082b432ebd1cade7a6e upstream. The linear_conf() returns error pointers, it doesn't return NULL. Update the error checking to match. Fixes: 127186cfb184 ("md: reintroduce md-linear") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/add654be-759f-4b2d-93ba-a3726dae380c@stanley.mountain Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17dm-crypt: track tag_offset in convert_contextHou Tao1-6/+7
commit 8b8f8037765757861f899ed3a2bfb34525b5c065 upstream. dm-crypt uses tag_offset to index the integrity metadata for each crypt sector. When the initial crypt_convert() returns BLK_STS_DEV_RESOURCE, dm-crypt will try to continue the crypt/decrypt procedure in a kworker. However, it resets tag_offset as zero instead of using the tag_offset related with current sector. It may return unexpected data when using random IV or return unexpected integrity related error. Fix the problem by tracking tag_offset in per-IO convert_context. Therefore, when the crypt/decrypt procedure continues in a kworker, it could use the next tag_offset saved in convert_context. Fixes: 8abec36d1274 ("dm crypt: do not wait for backlogged crypto request completion in softirq") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17dm-crypt: don't update io->sector after kcryptd_crypt_write_io_submit()Hou Tao1-11/+3
commit 9fdbbdbbc92b1474a87b89f8b964892a63734492 upstream. The updates of io->sector are the leftovers when dm-crypt allocated pages for partial write request. However, since commit cf2f1abfbd0db ("dm crypt: don't allocate pages for a partial request"), there is no partial request anymore. After the introduction of write request rb-tree, the updates of io->sectors may interfere the insertion procedure, because ->sectors of these write requests which have already been added in the rb-tree may be changed during the insertion of new write request. Fix it by removing these buggy updates of io->sectors. Considering these updates only effect the write request rb-tree, the commit which introduces the write request rb-tree is used as the fix tag. Fixes: b3c5fd305249 ("dm crypt: sort writes") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17md: reintroduce md-linearYu Kuai5-3/+376
commit 127186cfb184eaccdfe948e6da66940cfa03efc5 upstream. THe md-linear is removed by commit 849d18e27be9 ("md: Remove deprecated CONFIG_MD_LINEAR") because it has been marked as deprecated for a long time. However, md-linear is used widely for underlying disks with different size, sadly we didn't know this until now, and it's true useful to create partitions and assemble multiple raid and then append one to the other. People have to use dm-linear in this case now, however, they will prefer to minimize the number of involved modules. Fixes: 849d18e27be9 ("md: Remove deprecated CONFIG_MD_LINEAR") Cc: stable@vger.kernel.org Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Coly Li <colyli@kernel.org> Acked-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20250102112841.1227111-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetimeYu Kuai2-1/+9
commit 8d28d0ddb986f56920ac97ae704cc3340a699a30 upstream. After commit ec6bb299c7c3 ("md/md-bitmap: add 'sync_size' into struct md_bitmap_stats"), following panic is reported: Oops: general protection fault, probably for non-canonical address RIP: 0010:bitmap_get_stats+0x2b/0xa0 Call Trace: <TASK> md_seq_show+0x2d2/0x5b0 seq_read_iter+0x2b9/0x470 seq_read+0x12f/0x180 proc_reg_read+0x57/0xb0 vfs_read+0xf6/0x380 ksys_read+0x6c/0xf0 do_syscall_64+0x82/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Root cause is that bitmap_get_stats() can be called at anytime if mddev is still there, even if bitmap is destroyed, or not fully initialized. Deferenceing bitmap in this case can crash the kernel. Meanwhile, the above commit start to deferencing bitmap->storage, make the problem easier to trigger. Fix the problem by protecting bitmap_get_stats() with bitmap_info.mutex. Cc: stable@vger.kernel.org # v6.12+ Fixes: 32a7627cf3a3 ("[PATCH] md: optimised resync using Bitmap based intent logging") Reported-and-tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Closes: https://lore.kernel.org/linux-raid/ca3a91a2-50ae-4f68-b317-abd9889f3907@oracle.com/T/#m6e5086c95201135e4941fe38f9efa76daf9666c5 Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250124092055.4050195-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08md/md-bitmap: move bitmap_{start, end}write to md upper layerYu Kuai7-56/+37
commit cd5fc653381811f1e0ba65f5d169918cab61476f upstream. There are two BUG reports that raid5 will hang at bitmap_startwrite([1],[2]), root cause is that bitmap start write and end write is unbalanced, it's not quite clear where, and while reviewing raid5 code, it's found that bitmap operations can be optimized. For example, for a 4 disks raid5, with chunksize=8k, if user issue a IO (0 + 48k) to the array: ┌────────────────────────────────────────────────────────────┐ │chunk 0 │ │ ┌────────────┬─────────────┬─────────────┬────────────┼ │ sh0 │A0: 0 + 4k │A1: 8k + 4k │A2: 16k + 4k │A3: P │ │ ┼────────────┼─────────────┼─────────────┼────────────┼ │ sh1 │B0: 4k + 4k │B1: 12k + 4k │B2: 20k + 4k │B3: P │ ┼──────┴────────────┴─────────────┴─────────────┴────────────┼ │chunk 1 │ │ ┌────────────┬─────────────┬─────────────┬────────────┤ │ sh2 │C0: 24k + 4k│C1: 32k + 4k │C2: P │C3: 40k + 4k│ │ ┼────────────┼─────────────┼─────────────┼────────────┼ │ sh3 │D0: 28k + 4k│D1: 36k + 4k │D2: P │D3: 44k + 4k│ └──────┴────────────┴─────────────┴─────────────┴────────────┘ Before this patch, 4 stripe head will be used, and each sh will attach bio for 3 disks, and each attached bio will trigger bitmap_startwrite() once, which means total 12 times. - 3 times (0 + 4k), for (A0, A1 and A2) - 3 times (4 + 4k), for (B0, B1 and B2) - 3 times (8 + 4k), for (C0, C1 and C3) - 3 times (12 + 4k), for (D0, D1 and D3) After this patch, md upper layer will calculate that IO range (0 + 48k) is corresponding to the bitmap (0 + 16k), and call bitmap_startwrite() just once. Noted that this patch will align bitmap ranges to the chunks, for example, if user issue a IO (0 + 4k) to array: - Before this patch, 1 time (0 + 4k), for A0; - After this patch, 1 time (0 + 8k) for chunk 0; Usually, one bitmap bit will represent more than one disk chunk, and this doesn't have any difference. And even if user really created a array that one chunk contain multiple bits, the overhead is that more data will be recovered after power failure. Also remove STRIPE_BITMAP_PENDING since it's not used anymore. [1] https://lore.kernel.org/all/CAJpMwyjmHQLvm6zg1cmQErttNNQPDAAXPKM3xgTjMhbfts986Q@mail.gmail.com/ [2] https://lore.kernel.org/all/ADF7D720-5764-4AF3-B68E-1845988737AA@flyingcircus.io/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250109015145.158868-6-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Yu Kuai <yukuai1@huaweicloud.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08md/raid5: implement pers->bitmap_sector()Yu Kuai1-0/+51
commit 9c89f604476cf15c31fbbdb043cff7fbf1dbe0cb upstream. Bitmap is used for the whole array for raid1/raid10, hence IO for the array can be used directly for bitmap. However, bitmap is used for underlying disks for raid5, hence IO for the array can't be used directly for bitmap. Implement pers->bitmap_sector() for raid5 to convert IO ranges from the array to the underlying disks. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250109015145.158868-5-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Yu Kuai <yukuai1@huaweicloud.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08md: add a new callback pers->bitmap_sector()Yu Kuai1-0/+3
commit 0c984a283a3ea3f10bebecd6c57c1d41b2e4f518 upstream. This callback will be used in raid5 to convert io ranges from array to bitmap. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/r/20250109015145.158868-4-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Yu Kuai <yukuai1@huaweicloud.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08md/md-bitmap: remove the last parameter for bimtap_ops->endwrite()Yu Kuai9-70/+21
commit 4f0e7d0e03b7b80af84759a9e7cfb0f81ac4adae upstream. For the case that IO failed for one rdev, the bit will be mark as NEEDED in following cases: 1) If badblocks is set and rdev is not faulty; 2) If rdev is faulty; Case 1) is useless because synchronize data to badblocks make no sense. Case 2) can be replaced with mddev->degraded. Also remove R1BIO_Degraded, R10BIO_Degraded and STRIPE_DEGRADED since case 2) no longer use them. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250109015145.158868-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Yu Kuai <yukuai1@huaweicloud.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08md/md-bitmap: factor behind write counters out from bitmap_{start/end}write()Yu Kuai6-40/+56
commit 08c50142a128dcb2d7060aa3b4c5db8837f7a46a upstream. behind_write is only used in raid1, prepare to refactor bitmap_{start/end}write(), there are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/r/20250109015145.158868-2-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Yu Kuai <yukuai1@huaweicloud.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17dm-verity FEC: Fix RS FEC repair for roots unaligned to block size (take 2)Milan Broz1-14/+26
commit 6df90c02bae468a3a6110bafbc659884d0c4966c upstream. This patch fixes an issue that was fixed in the commit df7b59ba9245 ("dm verity: fix FEC for RS roots unaligned to block size") but later broken again in the commit 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO") If the Reed-Solomon roots setting spans multiple blocks, the code does not use proper parity bytes and randomly fails to repair even trivial errors. This bug cannot happen if the sector size is multiple of RS roots setting (Android case with roots 2). The previous solution was to find a dm-bufio block size that is multiple of the device sector size and roots size. Unfortunately, the optimization in commit 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO") is incorrect and uses data block size for some roots (for example, it uses 4096 block size for roots = 20). This patch uses a different approach: - It always uses a configured data block size for dm-bufio to avoid possible misaligned IOs. - and it caches the processed parity bytes, so it can join it if it spans two blocks. As the RS calculation is called only if an error is detected and the process is computationally intensive, copying a few more bytes should not introduce performance issues. The issue was reported to cryptsetup with trivial reproducer https://gitlab.com/cryptsetup/cryptsetup/-/issues/923 Reproducer (with roots=20): # create verity device with RS FEC dd if=/dev/urandom of=data.img bs=4096 count=8 status=none veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=20 | \ awk '/^Root hash/{ print $3 }' >roothash # create an erasure that should always be repairable with this roots setting dd if=/dev/zero of=data.img conv=notrunc bs=1 count=4 seek=4 status=none # try to read it through dm-verity veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=20 $(cat roothash) dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer Even now the log says it cannot repair it: : verity-fec: 7:1: FEC 0: failed to correct: -74 : device-mapper: verity: 7:1: data block 0 is corrupted ... With this fix, errors are properly repaired. : verity-fec: 7:1: FEC 0: corrected 4 errors Signed-off-by: Milan Broz <gmazyland@gmail.com> Fixes: 8ca7cab82bda ("dm verity fec: fix misaligned RS roots IO") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17dm-ebs: don't set the flag DM_TARGET_PASSES_INTEGRITYMikulas Patocka1-1/+1
commit 47f33c27fc9565fb0bc7dfb76be08d445cd3d236 upstream. dm-ebs uses dm-bufio to process requests that are not aligned on logical sector size. dm-bufio doesn't support passing integrity data (and it is unclear how should it do it), so we shouldn't set the DM_TARGET_PASSES_INTEGRITY flag. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: d3c7b35c20d6 ("dm: add emulated block size target") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17dm thin: make get_first_thin use rcu-safe list first functionKrister Johansen1-3/+2
commit 80f130bfad1dab93b95683fc39b87235682b8f72 upstream. The documentation in rculist.h explains the absence of list_empty_rcu() and cautions programmers against relying on a list_empty() -> list_first() sequence in RCU safe code. This is because each of these functions performs its own READ_ONCE() of the list head. This can lead to a situation where the list_empty() sees a valid list entry, but the subsequent list_first() sees a different view of list head state after a modification. In the case of dm-thin, this author had a production box crash from a GP fault in the process_deferred_bios path. This function saw a valid list head in get_first_thin() but when it subsequently dereferenced that and turned it into a thin_c, it got the inside of the struct pool, since the list was now empty and referring to itself. The kernel on which this occurred printed both a warning about a refcount_t being saturated, and a UBSAN error for an out-of-bounds cpuid access in the queued spinlock, prior to the fault itself. When the resulting kdump was examined, it was possible to see another thread patiently waiting in thin_dtr's synchronize_rcu. The thin_dtr call managed to pull the thin_c out of the active thins list (and have it be the last entry in the active_thins list) at just the wrong moment which lead to this crash. Fortunately, the fix here is straight forward. Switch get_first_thin() function to use list_first_or_null_rcu() which performs just a single READ_ONCE() and returns NULL if the list is already empty. This was run against the devicemapper test suite's thin-provisioning suites for delete and suspend and no regressions were observed. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Fixes: b10ebd34ccca ("dm thin: fix rcu_read_lock being held in code that can sleep") Cc: stable@vger.kernel.org Acked-by: Ming-Hung Tsai <mtsai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17dm array: fix cursor index when skipping across block boundariesMing-Hung Tsai1-0/+1
[ Upstream commit 0bb1968da2737ba68fd63857d1af2b301a18d3bf ] dm_array_cursor_skip() seeks to the target position by loading array blocks iteratively until the specified number of entries to skip is reached. When seeking across block boundaries, it uses dm_array_cursor_next() to step into the next block. dm_array_cursor_skip() must first move the cursor index to the end of the current block; otherwise, the cursor position could incorrectly remain in the same block, causing the actual number of skipped entries to be much smaller than expected. This bug affects cache resizing in v2 metadata and could lead to data loss if the fast device is shrunk during the first-time resume. For example: 1. create a cache metadata consists of 32768 blocks, with a dirty block assigned to the second bitmap block. cache_restore v1.0 is required. cat <<EOF >> cmeta.xml <superblock uuid="" block_size="64" nr_cache_blocks="32768" \ policy="smq" hint_width="4"> <mappings> <mapping cache_block="32767" origin_block="0" dirty="true"/> </mappings> </superblock> EOF dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2 2. bring up the cache while attempt to discard all the blocks belonging to the second bitmap block (block# 32576 to 32767). The last command is expected to fail, but it actually succeeds. dmsetup create cdata --table "0 2084864 linear /dev/sdc 8192" dmsetup create corig --table "0 65536 linear /dev/sdc 2105344" dmsetup create cache --table "0 65536 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 64 2 metadata2 writeback smq \ 2 migration_threshold 0" In addition to the reproducer described above, this fix can be verified using the "array_cursor/skip" tests in dm-unit: dm-unit run /pdata/array_cursor/skip/ --kernel-dir <KERNEL_DIR> Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Fixes: 9b696229aa7d ("dm persistent data: add cursor skip functions to the cursor APIs") Reviewed-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-17dm array: fix unreleased btree blocks on closing a faulty array cursorMing-Hung Tsai1-3/+3
[ Upstream commit 626f128ee9c4133b1cfce4be2b34a1508949370e ] The cached block pointer in dm_array_cursor might be NULL if it reaches an unreadable array block, or the array is empty. Therefore, dm_array_cursor_end() should call dm_btree_cursor_end() unconditionally, to prevent leaving unreleased btree blocks. This fix can be verified using the "array_cursor/iterate/empty" test in dm-unit: dm-unit run /pdata/array_cursor/iterate/empty --kernel-dir <KERNEL_DIR> Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Fixes: fdd1315aa5f0 ("dm array: introduce cursor api") Reviewed-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-17dm array: fix releasing a faulty array block twice in dm_array_cursor_endMing-Hung Tsai1-4/+8
[ Upstream commit f2893c0804d86230ffb8f1c8703fdbb18648abc8 ] When dm_bm_read_lock() fails due to locking or checksum errors, it releases the faulty block implicitly while leaving an invalid output pointer behind. The caller of dm_bm_read_lock() should not operate on this invalid dm_block pointer, or it will lead to undefined result. For example, the dm_array_cursor incorrectly caches the invalid pointer on reading a faulty array block, causing a double release in dm_array_cursor_end(), then hitting the BUG_ON in dm-bufio cache_put(). Reproduce steps: 1. initialize a cache device dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 65536 linear /dev/sdc 8192" dmsetup create corig --table "0 524288 linear /dev/sdc $262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" 2. wipe the second array block offline dmsteup remove cache cmeta cdata corig mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \ 2>/dev/null | hexdump -e '1/8 "%u\n"') ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \ 2>/dev/null | hexdump -e '1/8 "%u\n"') dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock 3. try reopen the cache device dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 65536 linear /dev/sdc 8192" dmsetup create corig --table "0 524288 linear /dev/sdc $262144" dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" Kernel logs: (snip) device-mapper: array: array_block_check failed: blocknr 0 != wanted 10 device-mapper: block manager: array validator check failed for block 10 device-mapper: array: get_ablock failed device-mapper: cache metadata: dm_array_cursor_next for mapping failed ------------[ cut here ]------------ kernel BUG at drivers/md/dm-bufio.c:638! Fix by setting the cached block pointer to NULL on errors. In addition to the reproducer described above, this fix can be verified using the "array_cursor/damaged" test in dm-unit: dm-unit run /pdata/array_cursor/damaged --kernel-dir <KERNEL_DIR> Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Fixes: fdd1315aa5f0 ("dm array: introduce cursor api") Reviewed-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-19dm: Fix dm-zoned-reclaim zone write pointer alignmentDamien Le Moal1-3/+3
commit b76b840fd93374240b59825f1ab8e2f5c9907acb upstream. The zone reclaim processing of the dm-zoned device mapper uses blkdev_issue_zeroout() to align the write pointer of a zone being used for reclaiming another zone, to write the valid data blocks from the zone being reclaimed at the same position relative to the zone start in the reclaim target zone. The first call to blkdev_issue_zeroout() will try to use hardware offload using a REQ_OP_WRITE_ZEROES operation if the device reports a non-zero max_write_zeroes_sectors queue limit. If this operation fails because of the lack of hardware support, blkdev_issue_zeroout() falls back to using a regular write operation with the zero-page as buffer. Currently, such REQ_OP_WRITE_ZEROES failure is automatically handled by the block layer zone write plugging code which will execute a report zones operation to ensure that the write pointer of the target zone of the failed operation has not changed and to "rewind" the zone write pointer offset of the target zone as it was advanced when the write zero operation was submitted. So the REQ_OP_WRITE_ZEROES failure does not cause any issue and blkdev_issue_zeroout() works as expected. However, since the automatic recovery of zone write pointers by the zone write plugging code can potentially cause deadlocks with queue freeze operations, a different recovery must be implemented in preparation for the removal of zone write plugging report zones based recovery. Do this by introducing the new function blk_zone_issue_zeroout(). This function first calls blkdev_issue_zeroout() with the flag BLKDEV_ZERO_NOFALLBACK to intercept failures on the first execution which attempt to use the device hardware offload with the REQ_OP_WRITE_ZEROES operation. If this attempt fails, a report zone operation is issued to restore the zone write pointer offset of the target zone to the correct position and blkdev_issue_zeroout() is called again without the BLKDEV_ZERO_NOFALLBACK flag. The report zones operation performing this recovery is implemented using the helper function disk_zone_sync_wp_offset() which calls the gendisk report_zones file operation with the callback disk_report_zones_cb(). This callback updates the target write pointer offset of the target zone using the new function disk_zone_wplug_sync_wp_offset(). dmz_reclaim_align_wp() is modified to change its call to blkdev_issue_zeroout() to a call to blk_zone_issue_zeroout() without any other change needed as the two functions are functionnally equivalent. Fixes: dd291d77cc90 ("block: Introduce zone write plugging") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20241209122357.47838-4-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14bcache: revert replacing IS_ERR_OR_NULL with IS_ERR againLiequan Che1-1/+1
commit b2e382ae12a63560fca35050498e19e760adf8c0 upstream. Commit 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations") leads a NULL pointer deference in cache_set_flush(). 1721 if (!IS_ERR_OR_NULL(c->root)) 1722 list_add(&c->root->list, &c->btree_cache); >From the above code in cache_set_flush(), if previous registration code fails before allocating c->root, it is possible c->root is NULL as what it is initialized. __bch_btree_node_alloc() never returns NULL but c->root is possible to be NULL at above line 1721. This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this. Fixes: 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations") Signed-off-by: Liequan Che <cheliequan@inspur.com> Cc: stable@vger.kernel.org Cc: Zheng Wang <zyytlz.wz@163.com> Reviewed-by: Mingzhe Zou <mingzhe.zou@easystack.cn> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20241202115638.28957-1-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09dm thin: Add missing destroy_work_on_stack()Yuan Can1-0/+1
commit e74fa2447bf9ed03d085b6d91f0256cc1b53f1a8 upstream. This commit add missed destroy_work_on_stack() operations for pw->worker in pool_work_wait(). Fixes: e7a3e871d895 ("dm thin: cleanup noflush_work to use a proper completion") Cc: stable@vger.kernel.org Signed-off-by: Yuan Can <yuancan@huawei.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09dm: Fix typo in error messageSsuhung Yeh1-1/+1
commit 2deb70d3e66d538404d9e71bff236e6d260da66e upstream. Remove the redundant "i" at the beginning of the error message. This "i" came from commit 1c1318866928 ("dm: prefer '"%s...", __func__'"), the "i" is accidentally left. Signed-off-by: Ssuhung Yeh <ssuhung@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 1c1318866928 ("dm: prefer '"%s...", __func__'") Cc: stable@vger.kernel.org # v6.3+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09md/md-bitmap: Add missing destroy_work_on_stack()Yuan Can1-0/+1
commit 6012169e8aae9c0eda38bbedcd7a1540a81220ae upstream. This commit add missed destroy_work_on_stack() operations for unplug_work.work in bitmap_unplug_async(). Fixes: a022325ab970 ("md/md-bitmap: add a new helper to unplug bitmap asynchrously") Cc: stable@vger.kernel.org Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20241105130105.127336-1-yuancan@huawei.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09md/raid5: Wait sync io to finish before changing group cntXiao Ni1-0/+4
commit fa1944bbe6220eb929e2c02e5e8706b908565711 upstream. One customer reports a bug: raid5 is hung when changing thread cnt while resync is running. The stripes are all in conf->handle_list and new threads can't handle them. Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes pers->quiesce from mddev_suspend/resume. Before this patch, mddev_suspend needs to wait for all ios including sync io to finish. Now it's used to only wait normal io. Fix this by calling raid5_quiesce from raid5_store_group_thread_cnt directly to wait all sync requests to finish before changing the group cnt. Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") Cc: stable@vger.kernel.org Signed-off-by: Xiao Ni <xni@redhat.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20241106095124.74577-1-xni@redhat.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-11dm-cache: fix warnings about duplicate slab cachesMikulas Patocka3-24/+34
The commit 4c39529663b9 adds a warning about duplicate cache names if CONFIG_DEBUG_VM is selected. These warnings are triggered by the dm-cache code. The dm-cache code allocates a slab cache for each device. This commit changes it to allocate just one slab cache in the module init function. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y")
2024-11-11dm-bufio: fix warnings about duplicate slab cachesMikulas Patocka1-4/+8
The commit 4c39529663b9 adds a warning about duplicate cache names if CONFIG_DEBUG_VM is selected. These warnings are triggered by the dm-bufio code. The dm-bufio code allocates a slab cache with each client. It is not possible to preallocate the caches in the module init function because the size of auxiliary per-buffer data is not known at this point. So, this commit changes dm-bufio so that it appends a unique atomic value to the cache name, to avoid the warnings. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y")
2024-11-06Merge tag 'for-6.12/dm-fixes-2' of ↵Linus Torvalds5-35/+42
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mikulas Patocka: - fix memory safety bugs in dm-cache - fix restart/panic logic in dm-verity - fix 32-bit unsigned integer overflow in dm-unstriped - fix a device mapper crash if blk_alloc_disk fails * tag 'for-6.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: fix potential out-of-bounds access on the first resume dm cache: optimize dirty bit checking with find_next_bit when resizing dm cache: fix out-of-bounds access to the dirty bitset when resizing dm cache: fix flushing uninitialized delayed_work on cache_ctr error dm cache: correct the number of origin blocks to match the target length dm-verity: don't crash if panic_on_corruption is not selected dm-unstriped: cast an operand to sector_t to prevent potential uint32_t overflow dm: fix a crash if blk_alloc_disk fails
2024-11-04dm cache: fix potential out-of-bounds access on the first resumeMing-Hung Tsai1-21/+16
Out-of-bounds access occurs if the fast device is expanded unexpectedly before the first-time resume of the cache table. This happens because expanding the fast device requires reloading the cache table for cache_create to allocate new in-core data structures that fit the new size, and the check in cache_preresume is not performed during the first resume, leading to the issue. Reproduce steps: 1. prepare component devices: dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 65536 linear /dev/sdc 8192" dmsetup create corig --table "0 524288 linear /dev/sdc 262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct 2. load a cache table of 512 cache blocks, and deliberately expand the fast device before resuming the cache, making the in-core data structures inadequate. dmsetup create cache --notable dmsetup reload cache --table "0 524288 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" dmsetup reload cdata --table "0 131072 linear /dev/sdc 8192" dmsetup resume cdata dmsetup resume cache 3. suspend the cache to write out the in-core dirty bitset and hint array, leading to out-of-bounds access to the dirty bitset at offset 0x40: dmsetup suspend cache KASAN reports: BUG: KASAN: vmalloc-out-of-bounds in is_dirty_callback+0x2b/0x80 Read of size 8 at addr ffffc90000085040 by task dmsetup/90 (...snip...) The buggy address belongs to the virtual mapping at [ffffc90000085000, ffffc90000087000) created by: cache_ctr+0x176a/0x35f0 (...snip...) Memory state around the buggy address: ffffc90000084f00: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ffffc90000084f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 >ffffc90000085000: 00 00 00 00 00 00 00 00 f8 f8 f8 f8 f8 f8 f8 f8 ^ ffffc90000085080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ffffc90000085100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 Fix by checking the size change on the first resume. Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Fixes: f494a9c6b1b6 ("dm cache: cache shrinking support") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com>
2024-11-04dm cache: optimize dirty bit checking with find_next_bit when resizingMing-Hung Tsai1-8/+8
When shrinking the fast device, dm-cache iteratively searches for a dirty bit among the cache blocks to be dropped, which is less efficient. Use find_next_bit instead, as it is twice as fast as the iterative approach with test_bit. Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Fixes: f494a9c6b1b6 ("dm cache: cache shrinking support") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com>
2024-11-04dm cache: fix out-of-bounds access to the dirty bitset when resizingMing-Hung Tsai1-1/+1
dm-cache checks the dirty bits of the cache blocks to be dropped when shrinking the fast device, but an index bug in bitset iteration causes out-of-bounds access. Reproduce steps: 1. create a cache device of 1024 cache blocks (128 bytes dirty bitset) dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 131072 linear /dev/sdc 8192" dmsetup create corig --table "0 524288 linear /dev/sdc 262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" 2. shrink the fast device to 512 cache blocks, triggering out-of-bounds access to the dirty bitset (offset 0x80) dmsetup suspend cache dmsetup reload cdata --table "0 65536 linear /dev/sdc 8192" dmsetup resume cdata dmsetup resume cache KASAN reports: BUG: KASAN: vmalloc-out-of-bounds in cache_preresume+0x269/0x7b0 Read of size 8 at addr ffffc900000f3080 by task dmsetup/131 (...snip...) The buggy address belongs to the virtual mapping at [ffffc900000f3000, ffffc900000f5000) created by: cache_ctr+0x176a/0x35f0 (...snip...) Memory state around the buggy address: ffffc900000f2f80: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ffffc900000f3000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffc900000f3080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ^ ffffc900000f3100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ffffc900000f3180: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 Fix by making the index post-incremented. Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Fixes: f494a9c6b1b6 ("dm cache: cache shrinking support") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com>
2024-11-04dm cache: fix flushing uninitialized delayed_work on cache_ctr errorMing-Hung Tsai1-9/+15
An unexpected WARN_ON from flush_work() may occur when cache creation fails, caused by destroying the uninitialized delayed_work waker in the error path of cache_create(). For example, the warning appears on the superblock checksum error. Reproduce steps: dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 65536 linear /dev/sdc 8192" dmsetup create corig --table "0 524288 linear /dev/sdc 262144" dd if=/dev/urandom of=/dev/mapper/cmeta bs=4k count=1 oflag=direct dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" Kernel logs: (snip) WARNING: CPU: 0 PID: 84 at kernel/workqueue.c:4178 __flush_work+0x5d4/0x890 Fix by pulling out the cancel_delayed_work_sync() from the constructor's error path. This patch doesn't affect the use-after-free fix for concurrent dm_resume and dm_destroy (commit 6a459d8edbdb ("dm cache: Fix UAF in destroy()")) as cache_dtr is not changed. Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Fixes: 6a459d8edbdb ("dm cache: Fix UAF in destroy()") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com>
2024-11-04dm cache: correct the number of origin blocks to match the target lengthMing-Hung Tsai1-4/+4
When creating a cache device, the actual size of the cache origin might be greater than the specified cache target length. In such case, the number of origin blocks should match the cache target length, not the full size of the origin device, since access beyond the cache target is not possible. This issue occurs when reducing the origin device size using lvm, as lvreduce preloads the new cache table before resuming the cache origin, which can result in incorrect sizes for the discard bitset and smq hotspot blocks. Reproduce steps: 1. create a cache device consists of 4096 origin blocks dmsetup create cmeta --table "0 8192 linear /dev/sdc 0" dmsetup create cdata --table "0 65536 linear /dev/sdc 8192" dmsetup create corig --table "0 524288 linear /dev/sdc 262144" dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1 oflag=direct dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" 2. reduce the cache origin to 2048 oblocks, in lvreduce's approach dmsetup reload corig --table "0 262144 linear /dev/sdc 262144" dmsetup reload cache --table "0 262144 cache /dev/mapper/cmeta \ /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0" dmsetup suspend cache dmsetup suspend corig dmsetup suspend cdata dmsetup suspend cmeta dmsetup resume corig dmsetup resume cdata dmsetup resume cmeta dmsetup resume cache 3. shutdown the cache, and check the number of discard blocks in superblock. The value is expected to be 2048, but actually is 4096. dmsetup remove cache corig cdata cmeta dd if=/dev/sdc bs=1c count=8 skip=224 2>/dev/null | hexdump -e '1/8 "%u\n"' Fix by correcting the origin_blocks initialization in cache_create and removing the unused origin_sectors from struct cache_args accordingly. Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com> Fixes: c6b4fcbad044 ("dm: add cache target") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com>
2024-11-04dm-verity: don't crash if panic_on_corruption is not selectedMikulas Patocka2-3/+7
If the user sets panic_on_error and doesn't set panic_on_corruption, dm-verity should not panic on data mismatch. But, currently it panics, because it treats data mismatch as I/O error. This commit fixes the logic so that if there is data mismatch and panic_on_corruption or restart_on_corruption is not selected, the system won't restart or panic. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Fixes: f811b83879fb ("dm-verity: introduce the options restart_on_error and panic_on_error")
2024-11-04dm-unstriped: cast an operand to sector_t to prevent potential uint32_t overflowZichen Xie1-2/+2
This was found by a static analyzer. There may be a potential integer overflow issue in unstripe_ctr(). uc->unstripe_offset and uc->unstripe_width are defined as "sector_t"(uint64_t), while uc->unstripe, uc->chunk_size and uc->stripes are all defined as "uint32_t". The result of the calculation will be limited to "uint32_t" without correct casting. So, we recommend adding an extra cast to prevent potential integer overflow. Fixes: 18a5bf270532 ("dm: add unstriped target") Signed-off-by: Zichen Xie <zichenxie0106@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org
2024-10-27Merge tag 'block-6.12-20241026' of git://git.kernel.dk/linuxLinus Torvalds2-3/+28
Pull block fixes from Jens Axboe: - Pull request for MD via Song fixing a few issues - Fix a wrong check in blk_rq_map_user_bvec(), causing IO errors on passthrough IO (Xinyu) * tag 'block-6.12-20241026' of git://git.kernel.dk/linux: block: fix sanity checks in blk_rq_map_user_bvec md/raid10: fix null ptr dereference in raid10_size() md: ensure child flush IO does not affect origin bio->bi_status
2024-10-17md/raid10: fix null ptr dereference in raid10_size()Yu Kuai1-2/+5
In raid10_run() if raid10_set_queue_limits() succeed, the return value is set to zero, and if following procedures failed raid10_run() will return zero while mddev->private is still NULL, causing null ptr dereference in raid10_size(). Fix the problem by only overwrite the return value if raid10_set_queue_limits() failed. Fixes: 3d8466ba68d4 ("md/raid10: use the atomic queue limit update APIs") Cc: stable@vger.kernel.org Reported-and-tested-by: ValdikSS <iam@valdikss.org.ru> Closes: https://lore.kernel.org/all/0dd96820-fe52-4841-bc58-dbf14d6bfcc8@valdikss.org.ru/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241009014914.1682037-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2024-10-17md: ensure child flush IO does not affect origin bio->bi_statusLi Nan1-1/+23
When a flush is issued to an RAID array, a child flush IO is created and issued for each member disk in the RAID array. Since commit b75197e86e6d ("md: Remove flush handling"), each child flush IO has been chained with the original bio. As a result, the failure of any child IO could modify the bi_status of the original bio, potentially impacting the upper-layer filesystem. Fix the issue by preventing child flush IO from altering the original bio->bi_status as before. However, this design introduces a known issue: in the event of a power failure, if a flush IO on a member disk fails, the upper layers may not be informed. This issue is not easy to fix and will not be addressed for the time being in this issue. Fixes: b75197e86e6d ("md: Remove flush handling") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240919063048.2887579-1-linan666@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2024-10-15dm: fix a crash if blk_alloc_disk failsMikulas Patocka1-1/+3
If blk_alloc_disk fails, the variable md->disk is set to an error value. cleanup_mapped_device will see that md->disk is non-NULL and it will attempt to access it, causing a crash on this statement "md->disk->private_data = NULL;". Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Chenyuan Yang <chenyuan0y@gmail.com> Closes: https://marc.info/?l=dm-devel&m=172824125004329&w=2 Cc: stable@vger.kernel.org Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
2024-10-03Merge tag 'pull-work.unaligned' of ↵Linus Torvalds3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull generic unaligned.h cleanups from Al Viro: "Get rid of architecture-specific <asm/unaligned.h> includes, replacing them with a single generic <linux/unaligned.h> header file. It's the second largest (after asm/io.h) class of asm/* includes, and all but two architectures actually end up using exact same file. Massage the remaining two (arc and parisc) to do the same and just move the thing to from asm-generic/unaligned.h to linux/unaligned.h" [ This is one of those things that we're better off doing outside the merge window, and would only cause extra conflict noise if it was in linux-next for the next release due to all the trivial #include line updates. Rip off the band-aid. - Linus ] * tag 'pull-work.unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: move asm/unaligned.h to linux/unaligned.h arc: get rid of private asm/unaligned.h parisc: get rid of private asm/unaligned.h
2024-10-03move asm/unaligned.h to linux/unaligned.hAl Viro3-3/+3
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-10-02dm-verity: introduce the options restart_on_error and panic_on_errorMikulas Patocka2-1/+83
This patch introduces the options restart_on_error and panic_on_error on dm-verity. Previously, restarting on error was handled by the patch e6a3531dd542cb127c8de32ab1e54a48ae19962b, but Google engineers wanted to have a special option for it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Suggested-by: Sami Tolvanen <samitolvanen@google.com> Suggested-by: Will Drewry <wad@chromium.org>
2024-10-02Revert: "dm-verity: restart or panic on an I/O error"Mikulas Patocka1-21/+2
This reverts commit e6a3531dd542cb127c8de32ab1e54a48ae19962b. The problem that the commit e6a3531dd542cb127c8de32ab1e54a48ae19962b fixes was reported as a security bug, but Google engineers working on Android and ChromeOS didn't want to change the default behavior, they want to get -EIO rather than restarting the system, so I am reverting that commit. Note also that calling machine_restart from the I/O handling code is potentially unsafe (the reboot notifiers may wait for the bio that triggered the restart), but Android uses the reboot notifiers to store the reboot reason into the PMU microcontroller, so machine_restart must be used. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: e6a3531dd542 ("dm-verity: restart or panic on an I/O error") Suggested-by: Sami Tolvanen <samitolvanen@google.com> Suggested-by: Will Drewry <wad@chromium.org>
2024-09-27Merge tag 'for-6.12/dm-changes' of ↵Linus Torvalds22-135/+445
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mikulas Patocka: - Misc VDO fixes - Remove unused declarations dm_get_rq_mapinfo() and dm_zone_map_bio() - Dm-delay: Improve kernel documentation - Dm-crypt: Allow to specify the integrity key size as an option - Dm-bufio: Remove pointless NULL check - Small code cleanups: Use ERR_CAST; remove unlikely() around IS_ERR; use __assign_bit - Dm-integrity: Fix gcc 5 warning; convert comma to semicolon; fix smatch warning - Dm-integrity: Support recalculation in the 'I' mode - Revert "dm: requeue IO if mapping table not yet available" - Dm-crypt: Small refactoring to make the code more readable - Dm-cache: Remove pointless error check - Dm: Fix spelling errors - Dm-verity: Restart or panic on an I/O error if restart or panic was requested - Dm-verity: Fallback to platform keyring also if key in trusted keyring is rejected * tag 'for-6.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits) dm verity: fallback to platform keyring also if key in trusted keyring is rejected dm-verity: restart or panic on an I/O error dm: fix spelling errors dm-cache: remove pointless error check dm vdo: handle unaligned discards correctly dm vdo indexer: Convert comma to semicolon dm-crypt: Use common error handling code in crypt_set_keyring_key() dm-crypt: Use up_read() together with key_put() only once in crypt_set_keyring_key() Revert "dm: requeue IO if mapping table not yet available" dm-integrity: check mac_size against HASH_MAX_DIGESTSIZE in sb_mac() dm-integrity: support recalculation in the 'I' mode dm integrity: Convert comma to semicolon dm integrity: fix gcc 5 warning dm: Make use of __assign_bit() API dm integrity: Remove extra unlikely helper dm: Convert to use ERR_CAST() dm bufio: Remove NULL check of list_entry() dm-crypt: Allow to specify the integrity key size as option dm: Remove unused declaration and empty definition "dm_zone_map_bio" dm delay: enhance kernel documentation ...
2024-09-26dm verity: fallback to platform keyring also if key in trusted keyring is ↵Luca Boccassi1-1/+1
rejected If enabled, we fallback to the platform keyring if the trusted keyring doesn't have the key used to sign the roothash. But if pkcs7_verify() rejects the key for other reasons, such as usage restrictions, we do not fallback. Do so. Follow-up for 6fce1f40e95182ebbfe1ee3096b8fc0b37903269 Suggested-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Luca Boccassi <bluca@debian.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-09-26dm-verity: restart or panic on an I/O errorMikulas Patocka1-2/+21
Maxim Suhanov reported that dm-verity doesn't crash if an I/O error happens. In theory, this could be used to subvert security, because an attacker can create sectors that return error with the Write Uncorrectable command. Some programs may misbehave if they have to deal with EIO. This commit fixes dm-verity, so that if "panic_on_corruption" or "restart_on_corruption" was specified and an I/O error happens, the machine will panic or restart. This commit also changes kernel_restart to emergency_restart - kernel_restart calls reboot notifiers and these reboot notifiers may wait for the bio that failed. emergency_restart doesn't call the notifiers. Reported-by: Maxim Suhanov <dfirblog@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org