kernel/linux.git/drivers/md/md.c, branch v6.19.11

md raid: fix hang when stopping arrays with metadata through dm-raid

2026-03-04T12:19:39+00:00

[ Upstream commit cefcb9297fbdb6d94b61787b4f8d84f55b741470 ] When using device-mapper's dm-raid target, stopping a RAID array can cause the system to hang under specific conditions. This occurs when: - A dm-raid managed device tree is suspended from top to bottom (the top-level RAID device is suspended first, followed by its underlying metadata and data devices) - The top-level RAID device is then removed Removing the top-level device triggers a hang in the following sequence: the dm-raid destructor calls md_stop(), which tries to flush the write-intent bitmap by writing to the metadata sub-devices. However, these devices are already suspended, making them unable to complete the write-intent operations and causing an indefinite block. Fix: - Prevent bitmap flushing when md_stop() is called from dm-raid destructor context and avoid a quiescing/unquescing cycle which could also cause I/O - Still allow write-intent bitmap flushing when called from dm-raid suspend context This ensures that RAID array teardown can complete successfully even when the underlying devices are in a suspended state. This second patch uses md_is_rdwr() to distinguish between suspend and destructor paths as elaborated on above. Link: https://lore.kernel.org/linux-raid/CAM23VxqYrwkhKEBeQrZeZwQudbiNey2_8B_SEOLqug=pXxaFrA@mail.gmail.com Signed-off-by: Heinz Mauelshagen Signed-off-by: Yu Kuai Signed-off-by: Sasha Levin

md: Fix forward incompatibility from configurable logical block size

2025-12-27T02:14:07+00:00

Commit 62ed1b582246 ("md: allow configuring logical block size") used reserved pad to add 'logical_block_size' to metadata. RAID rejects non-zero reserved pad, so arrays fail when rolling back to old kernels after booting new ones. Set 'logical_block_size' only for newly created arrays to support rollback to old kernels. Importantly new arrays still won't work on old kernels to prevent data loss issue from LBS changes. For arrays created on old kernels which confirmed not to rollback, configure LBS by echo current LBS (queue/logical_block_size) to md/logical_block_size. Fixes: 62ed1b582246 ("md: allow configuring logical block size") Reported-by: BugReports Closes: https://lore.kernel.org/linux-raid/825e532d-d1e1-44bb-5581-692b7c091796@huaweicloud.com/T/#t Signed-off-by: Li Nan Link: https://lore.kernel.org/linux-raid/20251226024221.724201-2-linan666@huaweicloud.com Signed-off-by: Yu Kuai

md: Fix logical_block_size configuration being overwritten

2025-12-27T02:13:40+00:00

In super_1_validate(), mddev->logical_block_size is directly overwritten with the value from metadata. This causes the previously configured lbs to be lost, making the configuration ineffective. Fix it. Fixes: 62ed1b582246 ("md: allow configuring logical block size") Signed-off-by: Li Nan Reviewed-by: Yu Kuai Reviewed-by: Xiao Ni Link: https://lore.kernel.org/linux-raid/20251226024221.724201-1-linan666@huaweicloud.com Signed-off-by: Yu Kuai

md: suspend array while updating raid_disks via sysfs

2025-12-27T01:54:50+00:00

In raid1_reshape(), freeze_array() is called before modifying the r1bio memory pool (conf->r1bio_pool) and conf->raid_disks, and unfreeze_array() is called after the update is completed. However, freeze_array() only waits until nr_sync_pending and (nr_pending - nr_queued) of all buckets reaches zero. When an I/O error occurs, nr_queued is increased and the corresponding r1bio is queued to either retry_list or bio_end_io_list. As a result, freeze_array() may unblock before these r1bios are released. This can lead to a situation where conf->raid_disks and the mempool have already been updated while queued r1bios, allocated with the old raid_disks value, are later released. Consequently, free_r1bio() may access memory out of bounds in put_all_bios() and release r1bios of the wrong size to the new mempool, potentially causing issues with the mempool as well. Since only normal I/O might increase nr_queued while an I/O error occurs, suspending the array avoids this issue. Note: Updating raid_disks via ioctl SET_ARRAY_INFO already suspends the array. Therefore, we suspend the array when updating raid_disks via sysfs to avoid this issue too. Signed-off-by: FengWei Shih Link: https://lore.kernel.org/linux-raid/20251226101816.4506-1-dannyshih@synology.com Signed-off-by: Yu Kuai

md: Fix static checker warning in analyze_sbs

2025-12-25T07:45:21+00:00

The following warn is reported: drivers/md/md.c:3912 analyze_sbs() warn: iterator 'i' not incremented Fixes: d8730f0cf4ef ("md: Remove deprecated CONFIG_MD_MULTIPATH") Reported-by: Dan Carpenter Closes: https://lore.kernel.org/linux-raid/7e2e95ce-3740-09d8-a561-af6bfb767f18@huaweicloud.com/T/#t Signed-off-by: Li Nan Link: https://lore.kernel.org/linux-raid/20251215124412.4015572-1-linan666@huaweicloud.com Signed-off-by: Yu Kuai

md: remove legacy 1s delay in md_notify_reboot

2025-11-30T01:42:28+00:00

During system shutdown, the md driver registered notifier function (md_notify_reboot) currently imposes a hardcoded one-second delay. This delay was introduced approximately 23 years ago and was likely necessary for the hardware generation of that time. Proposing this patch to make sure there are no known devices that need this delay. Link: https://lore.kernel.org/linux-raid/20251121191422.2758555-1-tarunsahu@google.com Signed-off-by: Tarun Sahu Reviewed-by: Yu Kuai Signed-off-by: Yu Kuai

md: warn about updating super block failure

2025-11-30T01:38:22+00:00

Many personalities will handle IO error from daemon thread(like raid1d, raid10d, raid5d), and sb will require to be clean before hanlding these failed IO. However update sb can fail, for example array is broken by IO failure, or user config sysfs api array_state. This patch adds warning if updating sb failed first, in case this will be related to IO hang. Link: https://lore.kernel.org/linux-raid/20251117085557.770572-2-yukuai@fnnas.com Signed-off-by: Yu Kuai Reviewed-by: Li Nan

md: allow configuring logical block size

2025-11-11T03:20:15+00:00

Previously, raid array used the maximum logical block size (LBS) of all member disks. Adding a larger LBS disk at runtime could unexpectedly increase RAID's LBS, risking corruption of existing partitions. This can be reproduced by: ``` # LBS of sd[de] is 512 bytes, sdf is 4096 bytes. mdadm -CRq /dev/md0 -l1 -n3 /dev/sd[de] missing --assume-clean # LBS is 512 cat /sys/block/md0/queue/logical_block_size # create partition md0p1 parted -s /dev/md0 mklabel gpt mkpart primary 1MiB 100% lsblk | grep md0p1 # LBS becomes 4096 after adding sdf mdadm --add -q /dev/md0 /dev/sdf cat /sys/block/md0/queue/logical_block_size # partition lost partprobe /dev/md0 lsblk | grep md0p1 ``` Simply restricting larger-LBS disks is inflexible. In some scenarios, only disks with 512 bytes LBS are available currently, but later, disks with 4KB LBS may be added to the array. Making LBS configurable is the best way to solve this scenario. After this patch, the raid will: - store LBS in disk metadata - add a read-write sysfs 'mdX/logical_block_size' Future mdadm should support setting LBS via metadata field during RAID creation and the new sysfs. Though the kernel allows runtime LBS changes, users should avoid modifying it after creating partitions or filesystems to prevent compatibility issues. Only 1.x metadata supports configurable LBS. 0.90 metadata inits all fields to default values at auto-detect. Supporting 0.90 would require more extensive changes and no such use case has been observed. Note that many RAID paths rely on PAGE_SIZE alignment, including for metadata I/O. A larger LBS than PAGE_SIZE will result in metadata read/write failures. So this config should be prevented. Link: https://lore.kernel.org/linux-raid/20251103125757.1405796-6-linan666@huaweicloud.com Signed-off-by: Li Nan Reviewed-by: Xiao Ni Signed-off-by: Yu Kuai

md: add check_new_feature module parameter

2025-11-11T03:19:54+00:00

Raid checks if pad3 is zero when loading superblock from disk. Arrays created with new features may fail to assemble on old kernels as pad3 is used. Add module parameter check_new_feature to bypass this check. Link: https://lore.kernel.org/linux-raid/20251103125757.1405796-5-linan666@huaweicloud.com Signed-off-by: Li Nan Reviewed-by: Xiao Ni Signed-off-by: Yu Kuai

md: init bioset in mddev_init

2025-11-11T03:19:10+00:00

IO operations may be needed before md_run(), such as updating metadata after writing sysfs. Without bioset, this triggers a NULL pointer dereference as below: BUG: kernel NULL pointer dereference, address: 0000000000000020 Call Trace: md_update_sb+0x658/0xe00 new_level_store+0xc5/0x120 md_attr_store+0xc9/0x1e0 sysfs_kf_write+0x6f/0xa0 kernfs_fop_write_iter+0x141/0x2a0 vfs_write+0x1fc/0x5a0 ksys_write+0x79/0x180 __x64_sys_write+0x1d/0x30 x64_sys_call+0x2818/0x2880 do_syscall_64+0xa9/0x580 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Reproducer ``` mdadm -CR /dev/md0 -l1 -n2 /dev/sd[cd] echo inactive > /sys/block/md0/md/array_state echo 10 > /sys/block/md0/md/new_level ``` mddev_init() can only be called once per mddev, no need to test if bioset has been initialized anymore. Link: https://lore.kernel.org/linux-raid/20251103125757.1405796-3-linan666@huaweicloud.com Fixes: d981ed841930 ("md: Add new_level sysfs interface") Signed-off-by: Li Nan Reviewed-by: Xiao Ni Signed-off-by: Yu Kuai