kernel/linux.git/drivers/md/raid5.c, branch v7.1-rc5

md/raid5: Fix UAF on IO across the reshape position

2026-04-28T12:44:37+00:00

If make_stripe_request() returns STRIPE_WAIT_RESHAPE, raid5_make_request() will free the cloned bio. But raid5_make_request() can call make_stripe_request() multiple times, writing to the various stripes. If that bio got added to the toread or towrite lists of a stripe disk in an earlier call to make_stripe_request(), then it's not safe to just free the bio if a later part of it is found to cross the reshape position. Doing so can lead to a UAF error, when bio_endio() is called on the bio for the earlier stripes.

Instead, raid5_make_request() needs to wait until all parts of the bio have called bio_endio(). To do this, bios that cross the reshape position while the reshape can't make progress are flagged as needing to wait for all parts to complete. When raid5_make_request() has a bio that failed make_stripe_request() with STRIPE_WAIT_RESHAPE, it sets bi->bi_private to a completion struct and waits for completion after ending the bio. When the bio_endio() is called for the last time on a clone bio with bi->bi_private set, it wakes up the waiter. This guarantees that raid5_make_request() doesn't return until the cloned bio needing a retry for io across the reshape boundary is safely cleaned up.

There is a simple reproducer available at [1]. Compile the kernel with KASAN for more useful reporting when the error is triggered (this is not necessary to see the bug).

[1] https://gist.github.com/bmarzins/e48598824305cf2171289e47d7241fa5

Signed-off-by: Benjamin Marzinski
Reviewed-by: Xiao Ni
Link: https://lore.kernel.org/r/20260408043548.1695157-1-bmarzins@redhat.com
Signed-off-by: Yu Kuai
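The wait-for-last-endio idea in this commit can be illustrated with a small single-threaded user-space model. This is only a sketch under stated assumptions, not the kernel change itself: every name prefixed with model_ is hypothetical, the completion is reduced to a flag, and a real waiter would block in wait_for_completion() instead of polling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct completion: only a flag in this model. */
struct model_completion {
	bool done;
};

/* Hypothetical stand-in for a cloned bio. */
struct model_bio {
	int remaining;                     /* endio calls still owed */
	struct model_completion *private;  /* models bi->bi_private */
	bool freed;                        /* true once the bio may be freed */
};

/* A stripe takes the bio onto its toread/towrite list: one more endio is owed. */
static void model_attach_to_stripe(struct model_bio *bio)
{
	bio->remaining++;
}

/* One part of the bio finishes; only the last call releases the bio. */
static void model_bio_endio(struct model_bio *bio)
{
	assert(!bio->freed && bio->remaining > 0);
	if (--bio->remaining > 0)
		return;
	bio->freed = true;
	if (bio->private)
		bio->private->done = true;  /* models complete() waking the waiter */
}

/*
 * Models the STRIPE_WAIT_RESHAPE path: attach the completion, end the
 * submitter's own part, and report whether it is already safe to return.
 */
static bool model_retry_path_may_return(struct model_bio *bio,
					struct model_completion *c)
{
	bio->private = c;
	model_bio_endio(bio);
	return c->done;
}
```

In this model, a bio that was attached to one earlier stripe reports "not yet safe" from model_retry_path_may_return(), and becomes safe only after that stripe issues its own model_bio_endio() call.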

md/raid5: fix soft lockup in retry_aligned_read()

2026-04-07T07:13:52+00:00

When retry_aligned_read() encounters an overlapped stripe, it releases the stripe via raid5_release_stripe() which puts it on the lockless released_stripes llist. In the next raid5d loop iteration, release_stripe_list() drains the stripe onto handle_list (since STRIPE_HANDLE is set by the original IO), but retry_aligned_read() runs before handle_active_stripes() and removes the stripe from handle_list via find_get_stripe() -> list_del_init(). This prevents handle_stripe() from ever processing the stripe to resolve the overlap, causing an infinite loop and soft lockup.

Fix this by using __release_stripe() with temp_inactive_list instead of raid5_release_stripe() in the failure path, so the stripe does not go through the released_stripes llist. This allows raid5d to break out of its loop, and the overlap will be resolved when the stripe is eventually processed by handle_stripe().

Fixes: 773ca82fa1ee ("raid5: make release_stripe lockless")
Cc: stable@vger.kernel.org
Signed-off-by: FengWei Shih
Signed-off-by: Chia-Ming Chang
Link: https://lore.kernel.org/linux-raid/20260402061406.455755-1-chiamingc@synology.com/
Signed-off-by: Yu Kuai

md/raid5: move handle_stripe() comment to correct location

2026-03-22T18:15:11+00:00

Move the handle_stripe() documentation comment from above analyse_stripe() to directly above handle_stripe() where it belongs.

Signed-off-by: Chen Cheng
Reviewed-by: Yu Kuai
Link: https://lore.kernel.org/linux-raid/20260304111001.15767-1-chencheng@fnnas.com/
Signed-off-by: Yu Kuai

md/raid5: skip 2-failure compute when other disk is R5_LOCKED

2026-03-22T01:57:33+00:00

When skip_copy is enabled on a doubly-degraded RAID6, a device that is being written to will be in R5_LOCKED state with R5_UPTODATE cleared. If a new read triggers fetch_block() while the write is still in flight, the 2-failure compute path may select this locked device as a compute target because it is not R5_UPTODATE.

Because skip_copy makes the device page point directly to the bio page, reconstructing data into it might be risky. Also, since the compute marks the device R5_UPTODATE, it triggers WARN_ON in ops_run_io() which checks that R5_SkipCopy and R5_UPTODATE are not both set.

This can be reproduced by running small-range concurrent read/write on a doubly-degraded RAID6 with skip_copy enabled, for example:

  mdadm -C /dev/md0 -l6 -n6 -R -f /dev/loop[0-3] missing missing
  echo 1 > /sys/block/md0/md/skip_copy
  fio --filename=/dev/md0 --rw=randrw --bs=4k --numjobs=8 \
      --iodepth=32 --size=4M --runtime=30 --time_based --direct=1

Fix by checking R5_LOCKED before proceeding with the compute. The compute will be retried once the lock is cleared on IO completion.

Signed-off-by: FengWei Shih
Reviewed-by: Yu Kuai
Link: https://lore.kernel.org/linux-raid/20260319053351.3676794-1-dannyshih@synology.com/
Signed-off-by: Yu Kuai
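The R5_LOCKED check described in this commit can be sketched as a user-space model of picking the second compute target. This is an illustration under assumptions rather than the actual fetch_block() change: the MODEL_ flag values and the helper name are hypothetical, and the real code uses test_bit() on per-device flags.

```c
#include <assert.h>

/* Hypothetical flag bits standing in for R5_UPTODATE and R5_LOCKED. */
#define MODEL_UPTODATE 0x1u
#define MODEL_LOCKED   0x2u

/*
 * Given per-device flags and the index of the block being fetched, return
 * the index of the other block to compute, or -1 when the 2-failure
 * compute must be skipped for now.
 */
static int model_pick_other_target(const unsigned int *flags, int ndisks,
				   int disk_idx)
{
	int other;

	assert(ndisks > 0 && disk_idx >= 0 && disk_idx < ndisks);

	/* Scan from the last device down for another block that is not up to date. */
	for (other = ndisks - 1; other >= 0; other--) {
		if (other == disk_idx)
			continue;
		if (!(flags[other] & MODEL_UPTODATE))
			break;
	}
	if (other < 0)
		return -1;  /* no second missing block was found */

	/* A write is still in flight on this device: do not rebuild into it. */
	if (flags[other] & MODEL_LOCKED)
		return -1;

	return other;
}
```

Returning -1 for a locked device models "retry later": once IO completion clears the lock, the same call selects the device as a normal compute target.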

md/raid5: set chunk_sectors to enable full stripe I/O splitting

2026-03-15T17:24:59+00:00

Set chunk_sectors to the full stripe width (io_opt) so that the block layer splits I/O at full stripe boundaries. This ensures that large writes are aligned to full stripes, avoiding the read-modify-write overhead that occurs with partial stripe writes in RAID-5/6.

When chunk_sectors is set, the block layer's bio splitting logic in get_max_io_size() uses blk_boundary_sectors_left() to limit I/O size to the boundary. This naturally aligns split bios to full stripe boundaries, enabling more efficient full stripe writes.

Test results with 24-disk RAID5 (chunk_size=64k):

  dd if=/dev/zero of=/dev/md0 bs=10M oflag=direct
  Before: 461 MB/s
  After:  520 MB/s (+12.8%)

Link: https://lore.kernel.org/linux-raid/20260223035834.3132498-1-yukuai@fnnas.com
Suggested-by: Christoph Hellwig
Reviewed-by: Paul Menzel
Reviewed-by: Christoph Hellwig
Signed-off-by: Yu Kuai
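The boundary arithmetic behind this change can be worked through in a short user-space sketch. It assumes 512-byte sectors and that a RAID5 full stripe spans raid_disks - 1 data chunks; the model_ helpers are hypothetical stand-ins that mirror the idea of blk_boundary_sectors_left(), not the kernel functions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Full stripe width in 512-byte sectors for RAID5 (one parity chunk per stripe). */
static uint32_t model_raid5_stripe_sectors(uint32_t chunk_bytes,
					   uint32_t raid_disks)
{
	assert(raid_disks >= 2);
	return (chunk_bytes / 512) * (raid_disks - 1);
}

/* Sectors remaining from 'offset' up to the next multiple of 'boundary'. */
static uint32_t model_boundary_sectors_left(uint64_t offset, uint32_t boundary)
{
	assert(boundary > 0);
	return boundary - (uint32_t)(offset % boundary);
}

/* Largest I/O starting at 'offset' that does not cross a stripe boundary. */
static uint32_t model_max_io_sectors(uint64_t offset, uint32_t want,
				     uint32_t boundary)
{
	uint32_t left = model_boundary_sectors_left(offset, boundary);

	return want < left ? want : left;
}
```

For the 24-disk, 64k-chunk array from the test above, the full stripe is 128 * 23 = 2944 sectors, so a 10M write (20480 sectors) starting at sector 0 is first split at 2944 sectors, and one starting at sector 1000 is first split after 1944 sectors so that the remainder begins on a stripe boundary.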

block: remove bdev_nonrot()

2026-03-09T20:30:00+00:00

bdev_nonrot() simply returns the logical negation of bdev_rot(). So replace all call sites of bdev_nonrot() with calls to bdev_rot() and remove bdev_nonrot().

Signed-off-by: Damien Le Moal
Reviewed-by: Martin K. Petersen
Reviewed-by: Paul Menzel
Signed-off-by: Jens Axboe

Convert more 'alloc_obj' cases to default GFP_KERNEL arguments

2026-02-22T04:03:00+00:00

This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones where the resulting diff is easy to verify because just that final GFP_KERNEL argument sits on the next line.

Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial.

So after fighting that for a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts.

The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway.

Signed-off-by: Linus Torvalds

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances:

  Single allocations:      kmalloc(sizeof(TYPE), ...)
  are replaced with:       kmalloc_obj(TYPE, ...)

  Array allocations:       kmalloc_array(COUNT, sizeof(TYPE), ...)
  are replaced with:       kmalloc_objs(TYPE, COUNT, ...)

  Flex array allocations:  kmalloc(struct_size(PTR, FAM, COUNT), ...)
  are replaced with:       kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning "TYPE *".

Signed-off-by: Kees Cook
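The three conversion shapes can be imitated in user space to show why the result is typed. This is only a rough model built on malloc() and calloc(): the model_alloc_* macros and the structs are hypothetical, they are not the kernel's kmalloc_obj() family, they take no GFP argument, and they rely on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; T may be a type name or an expression such as *ptr. */
#define model_alloc_obj(T) \
	((__typeof__(T) *)malloc(sizeof(T)))

/* calloc() checks COUNT * size for overflow; zeroing is a side effect of this model. */
#define model_alloc_objs(T, COUNT) \
	((__typeof__(T) *)calloc((COUNT), sizeof(T)))

/* Header plus COUNT elements of flexible array member FAM (no overflow check here). */
#define model_alloc_flex(T, FAM, COUNT) \
	((__typeof__(T) *)malloc(offsetof(__typeof__(T), FAM) + \
		(COUNT) * sizeof(((__typeof__(T) *)0)->FAM[0])))

struct model_item {
	int id;
};

struct model_batch {
	size_t nr;
	struct model_item items[];  /* flexible array member */
};

/* Build a batch of 'nr' items numbered from 0; returns NULL on allocation failure. */
static struct model_batch *model_batch_create(size_t nr)
{
	/* The *VAR form: the type is taken from the pointer being initialized. */
	struct model_batch *b = model_alloc_flex(*b, items, nr);
	size_t i;

	if (!b)
		return NULL;
	b->nr = nr;
	for (i = 0; i < nr; i++)
		b->items[i].id = (int)i;
	return b;
}
```

Because each macro casts to a pointer to the named type, assigning the result to a pointer of a different struct type draws a compiler diagnostic, which a plain "void *" return would not.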

md/raid5: fix IO hang with degraded array with llbitmap

2026-01-26T05:18:59+00:00

When the llbitmap bit state is still unwritten, any new write should force rcw, as bitmap_ops->blocks_synced() is checked in handle_stripe_dirtying(). However, the same check is missing later in need_this_block(), causing the stripe to loop forever during handling: handle_stripe() decides to go to handle_stripe_fill(), while need_this_block() always returns 0, so nothing is handled.

Link: https://lore.kernel.org/linux-raid/20260123182623.3718551-2-yukuai@fnnas.com
Fixes: 5ab829f1971d ("md/md-llbitmap: introduce new lockless bitmap")
Signed-off-by: Yu Kuai
Reviewed-by: Li Nan