kernel/linux.git/drivers/md/raid1.c, branch v7.1-rc5

md/raid1: replace wait loop with wait_event_idle() in raid1_write_request()

2026-04-28T12:44:38+00:00

The wait loop is equivalent to wait_event_idle(); use it to improve readability. Signed-off-by: Abd-Alrhman Masalkhi Link: https://lore.kernel.org/linux-raid/20260423101303.48196-2-abd.masalkhi@gmail.com Signed-off-by: Yu Kuai

md/raid1: serialize overlap io for writemostly disk

2026-04-07T05:09:22+00:00

Previously, using wait_event() would wake up all waiters simultaneously, and they would compete for the tree lock. The bio which gets the lock first will be handled, so the write sequence cannot be guaranteed. For example: bio1(100,200) bio2(150,200) bio3(150,300) The write sequence of fast device is bio1,bio2,bio3. But the write sequence of slow device could be bio1,bio3,bio2 due to lock competition. This causes data corruption. Replace waitqueue with a fifo list to guarantee the write sequence. And it also needs to iterate the list when removing one entry. If not, it may miss the opportunity to wake up the waiting io. For example: bio1(1,3), bio2(2,4) bio3(5,7), bio4(6,8) These four bios are in the same bucket. bio1 and bio3 are inserted into the rbtree. bio2 and bio4 are added to the waiting list and bio2 is the first one. bio3 returns from slow disk and tries to wake up the waiting bios. bio2 is removed from the list and will be handled. But bio1 hasn't finished. So bio2 will be added into waiting list again. Then bio1 returns from slow disk and wakes up waiting bios. bio4 is removed from the list and will be handled. Now bio1, bio3 and bio4 all finish and bio2 is left on the waiting list. So it needs to iterate the waiting list to wake up the right bio. Signed-off-by: Xiao Ni Link: https://lore.kernel.org/linux-raid/20260324072501.59865-1-xni@redhat.com/ Signed-off-by: Yu Kuai

md/raid1: fix the comparing region of interval tree

2026-03-22T18:15:10+00:00

Interval tree uses [start, end] as a region which stores in the tree. In raid1, it uses the wrong end value. For example: bio(A,B) is too big and needs to be split to bio1(A,C-1), bio2(C,B). The region of bio1 is [A,C] and the region of bio2 is [C,B]. So bio1 and bio2 overlap which is not right. Fix this problem by using right end value of the region. Fixes: d0d2d8ba0494 ("md/raid1: introduce wait_for_serialization") Signed-off-by: Xiao Ni Link: https://lore.kernel.org/linux-raid/20260305011839.5118-2-xni@redhat.com/ Signed-off-by: Yu Kuai

block: remove bdev_nonrot()

2026-03-09T20:30:00+00:00

bdev_nonrot() is simply the negative return value of bdev_rot(). So replace all call sites of bdev_nonrot() with calls to bdev_rot() and remove bdev_nonrot(). Signed-off-by: Damien Le Moal Reviewed-by: Martin K. Petersen Reviewed-by: Paul Menzel Signed-off-by: Jens Axboe

Convert more 'alloc_obj' cases to default GFP_KERNEL arguments

2026-02-22T04:03:00+00:00

This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook

md/raid1: fix memory leak in raid1_run()

2026-02-02T07:35:03+00:00

raid1_run() calls setup_conf() which registers a thread via md_register_thread(). If raid1_set_limits() fails, the previously registered thread is not unregistered, resulting in a memory leak of the md_thread structure and the thread resource itself. Add md_unregister_thread() to the error path to properly cleanup the thread, which aligns with the error handling logic of other paths in this function. Compile tested only. Issue found using a prototype static analysis tool and code review. Link: https://lore.kernel.org/linux-raid/20260126071533.606263-1-zilin@seu.edu.cn Fixes: 97894f7d3c29 ("md/raid1: use the atomic queue limit update APIs") Signed-off-by: Zilin Guan Reviewed-by: Li Nan Signed-off-by: Yu Kuai

md: remove recovery_disabled

2026-01-26T05:17:38+00:00

'recovery_disabled' logic is complex and confusing, originally intended to preserve raid in extreme scenarios. It was used in following cases: - When sync fails and setting badblocks also fails, kick out non-In_sync rdev and block spare rdev from joining to preserve raid [1] - When last backup is unavailable, prevent repeated add-remove of spares triggering recovery [2] The original issues are now resolved: - Error handlers in all raid types prevent last rdev from being kicked out - Disks with failed recovery are marked Faulty and can't re-join Therefore, remove 'recovery_disabled' as it's no longer needed. [1] 5389042ffa36 ("md: change managed of recovery_disabled.") [2] 4044ba58dd15 ("md: don't retry recovery of raid1 that fails due to error on source drive.") Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-13-linan666@huaweicloud.com Signed-off-by: Li Nan Signed-off-by: Yu Kuai

md: mark rdev Faulty when badblocks setting fails

2026-01-26T05:16:15+00:00

Currently when sync read fails and badblocks set fails (exceeding 512 limit), rdev isn't immediately marked Faulty. Instead 'recovery_disabled' is set and non-In_sync rdevs are removed later. This preserves array availability if bad regions aren't read, but bad sectors might be read by users before rdev removal. This occurs due to incorrect resync/recovery_offset updates that include these bad sectors. When badblocks exceed 512, keeping the disk provides little benefit while adding complexity. Prompt disk replacement is more important. Therefore when badblocks set fails, directly call md_error to mark rdev Faulty immediately, preventing potential data access issues. After this change, cleanup of offset update logic and 'recovery_disabled' handling will follow. Link: https://lore.kernel.org/linux-raid/20260105110300.1442509-6-linan666@huaweicloud.com Fixes: 5e5702898e93 ("md/raid10: Handle read errors during recovery better.") Fixes: 3a9f28a5117e ("md/raid1: improve handling of read failure during recovery.") Signed-off-by: Li Nan Signed-off-by: Yu Kuai