kernel/linux.git/drivers/md/md.h, branch v7.1-rc5

md/md-bitmap: add a none backend for bitmap grow

2026-04-28T12:44:38+00:00

Add a real none bitmap backend that exposes the common bitmap sysfs group and use it to keep bitmap/location available when an array has no bitmap. Then switch the bitmap location sysfs path to move only between none and the classic bitmap backend, using the no-sysfs bitmap helpers while merging or unmerging the internal bitmap sysfs group. This restores mdadm --grow bitmap addition through bitmap/location. Fixes: fb8cc3b0d9db ("md/md-bitmap: delay registration of bitmap_ops until creating bitmap") Reviewed-by: Su Yue Link: https://lore.kernel.org/r/20260425024615.1696892-4-yukuai@fnnas.com Signed-off-by: Yu Kuai

md: use mddev_lock_nointr() in mddev_suspend_and_lock_nointr()

2026-04-28T12:44:37+00:00

This keeps mddev locking consistent and ensures that any future changes to locking behavior are done through the wrapper. Signed-off-by: Abd-Alrhman Masalkhi Link: https://lore.kernel.org/r/20260415140319.376578-3-abd.masalkhi@gmail.com Signed-off-by: Yu Kuai

md/raid5: Fix UAF on IO across the reshape position

2026-04-28T12:44:37+00:00

If make_stripe_request() returns STRIPE_WAIT_RESHAPE, raid5_make_request() will free the cloned bio. But raid5_make_request() can call make_stripe_request() multiple times, writing to the various stripes. If that bio got added to the toread or towrite lists of a stripe disk in an earlier call to make_stripe_request(), then it's not safe to just free the bio if a later part of it is found to cross the reshape position. Doing so can lead to a UAF error, when bio_endio() is called on the bio for the earlier stripes. Instead, raid5_make_request() needs to wait until all parts of the bio have called bio_endio(). To do this, bios that cross the reshape position while the reshape can't make progress are flagged as needing to wait for all parts to complete. When raid5_make_request() has a bio that failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets bi->bi_private to a completion struct and waits for completion after ending the bio. When the bio_endio() is called for the last time on a clone bio with bi->bi_private set, it wakes up the waiter. This guarantees that raid5_make_request() doesn't return until the cloned bio needing a retry for io across the reshape boundary is safely cleaned up. There is a simple reproducer available at [1]. Compile the kernel with KASAN for more useful reporting when the error is triggered (this is not necessary to see the bug). [1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5 Signed-off-by: Benjamin Marzinski Reviewed-by: Xiao Ni Link: https://lore.kernel.org/r/20260408043548.1695157-1-bmarzins@redhat.com Signed-off-by: Yu Kuai

md/raid1: serialize overlap io for writemostly disk

2026-04-07T05:09:22+00:00

Previously, using wait_event() would wake up all waiters simultaneously, and they would compete for the tree lock. The bio which gets the lock first will be handled, so the write sequence cannot be guaranteed. For example: bio1(100,200) bio2(150,200) bio3(150,300) The write sequence of fast device is bio1,bio2,bio3. But the write sequence of slow device could be bio1,bio3,bio2 due to lock competition. This causes data corruption. Replace waitqueue with a fifo list to guarantee the write sequence. And it also needs to iterate the list when removing one entry. If not, it may miss the opportunity to wake up the waiting io. For example: bio1(1,3), bio2(2,4) bio3(5,7), bio4(6,8) These four bios are in the same bucket. bio1 and bio3 are inserted into the rbtree. bio2 and bio4 are added to the waiting list and bio2 is the first one. bio3 returns from slow disk and tries to wake up the waiting bios. bio2 is removed from the list and will be handled. But bio1 hasn't finished. So bio2 will be added into waiting list again. Then bio1 returns from slow disk and wakes up waiting bios. bio4 is removed from the list and will be handled. Now bio1, bio3 and bio4 all finish and bio2 is left on the waiting list. So it needs to iterate the waiting list to wake up the right bio. Signed-off-by: Xiao Ni Link: https://lore.kernel.org/linux-raid/20260324072501.59865-1-xni@redhat.com/ Signed-off-by: Yu Kuai

md: fix return value of mddev_trylock

2026-02-02T07:39:55+00:00

A return value of 0 is treaded as successful lock acquisition. In fact, a return value of 1 means getting the lock successfully. Link: https://lore.kernel.org/linux-raid/20260127073951.17248-1-xni@redhat.com Fixes: 9e59d609763f ("md: call del_gendisk in control path") Reported-by: Bart Van Assche Closes: https://lore.kernel.org/linux-raid/20250611073108.25463-1-xni@redhat.com/T/#mfa369ef5faa4aa58e13e6d9fdb88aecd862b8f2f Signed-off-by: Xiao Ni Reviewed-by: Bart Van Assche Reviewed-by: Li Nan Signed-off-by: Yu Kuai

md: remove recovery_disabled

2026-01-26T05:17:38+00:00

'recovery_disabled' logic is complex and confusing, originally intended to preserve raid in extreme scenarios. It was used in following cases: - When sync fails and setting badblocks also fails, kick out non-In_sync rdev and block spare rdev from joining to preserve raid [1] - When last backup is unavailable, prevent repeated add-remove of spares triggering recovery [2] The original issues are now resolved: - Error handlers in all raid types prevent last rdev from being kicked out - Disks with failed recovery are marked Faulty and can't re-join Therefore, remove 'recovery_disabled' as it's no longer needed. [1] 5389042ffa36 ("md: change managed of recovery_disabled.") [2] 4044ba58dd15 ("md: don't retry recovery of raid1 that fails due to error on source drive.") Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-13-linan666@huaweicloud.com Signed-off-by: Li Nan Signed-off-by: Yu Kuai

md: remove MD_RECOVERY_ERROR handling and simplify resync_offset update

2026-01-26T05:16:19+00:00

Following previous patch "md: update curr_resync_completed even when MD_RECOVERY_INTR is set", 'curr_resync_completed' always equals 'curr_resync' for resync, so MD_RECOVERY_ERROR can be removed. Also, simplify resync_offset update logic. Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-8-linan666@huaweicloud.com Signed-off-by: Li Nan Reviewed-by: Yu Kuai Signed-off-by: Yu Kuai

md: factor error handling out of md_done_sync into helper

2026-01-26T05:16:07+00:00

The 'ok' parameter in md_done_sync() is redundant for most callers that always pass 'true'. Factor error handling logic into a separate helper function md_sync_error() to eliminate unnecessary parameter passing and improve code clarity. No functional changes introduced. Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-3-linan666@huaweicloud.com Signed-off-by: Li Nan Reviewed-by: Yu Kuai Signed-off-by: Yu Kuai

md/raid5: use mempool to allocate stripe_request_ctx

2026-01-26T05:11:29+00:00

On the one hand, stripe_request_ctx is 72 bytes, and it's a bit huge for a stack variable. On the other hand, the bitmap sectors_to_do is a fixed size, result in max_hw_sector_kb of raid5 array is at most 256 * 4k = 1Mb, and this will make full stripe IO impossible for the array that chunk_size * data_disks is bigger. Allocate ctx during runtime will make it possible to get rid of this limit. Link: https://lore.kernel.org/linux-raid/20260114171241.3043364-6-yukuai@fnnas.com Signed-off-by: Yu Kuai Reviewed-by: Li Nan

md: merge mddev serialize_policy into mddev_flags

2026-01-26T05:10:51+00:00

There is not need to use a separate field in struct mddev, there are no functional changes. Link: https://lore.kernel.org/linux-raid/20260114171241.3043364-5-yukuai@fnnas.com Signed-off-by: Yu Kuai Reviewed-by: Li Nan