summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2026-03-04rnbd-srv: Zero the rsp buffer before using itMd Haris Iqbal1-0/+3
[ Upstream commit 69d26698e4fd44935510553809007151b2fe4db5 ] Before using the data buffer to send back the response message, zero it completely. This prevents any stray bytes to be picked up by the client side when there the message is exchanged between different protocol versions. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04drbd: always set BLK_FEAT_STABLE_WRITESChristoph Böhmwalder2-4/+19
[ Upstream commit 2ebc8d600fb907fa6b1e7095c0b6d84fc47e91ea ] DRBD requires stable pages because it may read the same bio data multiple times for local disk I/O and network transmission, and in some cases for calculating checksums. The BLK_FEAT_STABLE_WRITES flag is set when the device is first created, but blk_set_stacking_limits() clears it whenever a backing device is attached. In some cases the flag may be inherited from the backing device, but we want it to be enabled at all times. Unconditionally re-enable BLK_FEAT_STABLE_WRITES in drbd_reconsider_queue_parameters() after the queue parameter negotiations. Also, document why we want this flag enabled in the first place. Fixes: 1a02f3a73f8c ("block: move the stable_writes flag to queue_limits") Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04ublk: Validate SQE128 flag before accessing the cmdGovindarajulu Varadarajan1-3/+3
[ Upstream commit da7e4b75e50c087d2031a92f6646eb90f7045a67 ] ublk_ctrl_cmd_dump() accesses (header *)sqe->cmd before IO_URING_F_SQE128 flag check. This could cause out of boundary memory access. Move the SQE128 flag check earlier in ublk_ctrl_uring_cmd() to return -EINVAL immediately if the flag is not set. Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Govindarajulu Varadarajan <govind.varadar@gmail.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04rnbd-srv: Fix server side setting of bi_size for special IOsFlorian-Ewald Mueller1-10/+23
[ Upstream commit 4ac9690d4b9456ca1d5276d86547fa2e7cd47684 ] On rnbd-srv, the bi_size of the bio is set during the bio_add_page function, to which datalen is passed. But for special IOs like DISCARD and WRITE_ZEROES, datalen is 0, since there is no data to write. For these special IOs, use the bi_size of the rnbd_msg_io. Fixes: f6f84be089c9 ("block/rnbd-srv: Add sanity check and remove redundant assignment") Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Grzegorz Prajsner <grzegorz.prajsner@ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-04rnbd-srv: use bio_add_virt_nofailChristoph Hellwig1-6/+1
[ Upstream commit a216081323a1391991c9073fed2459265bfc7f5c ] Use the bio_add_virt_nofail to add a single kernel virtual address to a bio as that can't fail. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250507120451.4000627-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 4ac9690d4b94 ("rnbd-srv: Fix server side setting of bi_size for special IOs") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-02-11ublk: fix deadlock when reading partition tableMing Lei1-3/+27
commit c258f5c4502c9667bccf5d76fa731ab9c96687c1 upstream. When one process(such as udev) opens ublk block device (e.g., to read the partition table via bdev_open()), a deadlock[1] can occur: 1. bdev_open() grabs disk->open_mutex 2. The process issues read I/O to ublk backend to read partition table 3. In __ublk_complete_rq(), blk_update_request() or blk_mq_end_request() runs bio->bi_end_io() callbacks 4. If this triggers fput() on file descriptor of ublk block device, the work may be deferred to current task's task work (see fput() implementation) 5. This eventually calls blkdev_release() from the same context 6. blkdev_release() tries to grab disk->open_mutex again 7. Deadlock: same task waiting for a mutex it already holds The fix is to run blk_update_request() and blk_mq_end_request() with bottom halves disabled. This forces blkdev_release() to run in kernel work-queue context instead of current task work context, and allows ublk server to make forward progress, and avoids the deadlock. Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Link: https://github.com/ublk-org/ublksrv/issues/170 [1] Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> [axboe: rewrite comment in ublk] Signed-off-by: Jens Axboe <axboe@kernel.dk> [ The fix omits the change in __ublk_do_auto_buf_reg() since this function doesn't exist in Linux 6.12. ] Signed-off-by: Alva Lan <alvalan9@foxmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-11rbd: check for EOD after exclusive lock is ensured to be heldIlya Dryomov1-12/+21
commit bd3884a204c3b507e6baa9a4091aa927f9af5404 upstream. Similar to commit 870611e4877e ("rbd: get snapshot context after exclusive lock is ensured to be held"), move the "beyond EOD" check into the image request state machine so that it's performed after exclusive lock is ensured to be held. This avoids various race conditions which can arise when the image is shrunk under I/O (in practice, mostly readahead). In one such scenario rbd_assert(objno < rbd_dev->object_map_size); can be triggered if a close-to-EOD read gets queued right before the shrink is initiated and the EOD check is performed against an outdated mapping_size. After the resize is done on the server side and exclusive lock is (re)acquired bringing along the new (now shrunk) object map, the read starts going through the state machine and rbd_obj_may_exist() gets invoked on an object that is out of bounds of rbd_dev->object_map array. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-23null_blk: fix kmemleak by releasing references to fault configfs itemsNilay Shroff1-1/+11
commit 40b94ec7edbbb867c4e26a1a43d2b898f04b93c5 upstream. When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, the null-blk driver sets up fault injection support by creating the timeout_inject, requeue_inject, and init_hctx_fault_inject configfs items as children of the top-level nullbX configfs group. However, when the nullbX device is removed, the references taken to these fault-config configfs items are not released. As a result, kmemleak reports a memory leak, for example: unreferenced object 0xc00000021ff25c40 (size 32): comm "mkdir", pid 10665, jiffies 4322121578 hex dump (first 32 bytes): 69 6e 69 74 5f 68 63 74 78 5f 66 61 75 6c 74 5f init_hctx_fault_ 69 6e 6a 65 63 74 00 88 00 00 00 00 00 00 00 00 inject.......... backtrace (crc 1a018c86): __kmalloc_node_track_caller_noprof+0x494/0xbd8 kvasprintf+0x74/0xf4 config_item_set_name+0xf0/0x104 config_group_init_type_name+0x48/0xfc fault_config_init+0x48/0xf0 0xc0080000180559e4 configfs_mkdir+0x304/0x814 vfs_mkdir+0x49c/0x604 do_mkdirat+0x314/0x3d0 sys_mkdir+0xa0/0xd8 system_call_exception+0x1b0/0x4f0 system_call_vectored_common+0x15c/0x2ec Fix this by explicitly releasing the references to the fault-config configfs items when dropping the reference to the top-level nullbX configfs group. Cc: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Fixes: bb4c19e030f4 ("block: null_blk: make fault-injection dynamically configurable per device") Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08floppy: fix for PAGE_SIZE != 4KBRene Rebe1-1/+1
commit 82d20481024cbae2ea87fe8b86d12961bfda7169 upstream. For years I wondered why the floppy driver does not just work on sparc64, e.g: root@SUNW_375_0066:# disktype /dev/fd0 disktype: Can't open /dev/fd0: No such device or address [ 525.341906] disktype: attempt to access beyond end of device fd0: rw=0, sector=0, nr_sectors = 16 limit=8 [ 525.341991] floppy: error 10 while reading block 0 Turns out floppy.c __floppy_read_block_0 tries to read one page for the first test read to determine the disk size and thus fails if that is greater than 4k. Adjust minimum MAX_DISK_SIZE to PAGE_SIZE to fix floppy on sparc64 and likely all other PAGE_SIZE != 4KB configs. Cc: stable@vger.kernel.org Signed-off-by: René Rebe <rene@exactco.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-08block: rnbd-clt: Fix signedness bug in init_dev()Dan Carpenter1-1/+1
[ Upstream commit 1ddb815fdfd45613c32e9bd1f7137428f298e541 ] The "dev->clt_device_id" variable is set using ida_alloc_max() which returns an int and in particular it returns negative error codes. Change the type from u32 to int to fix the error checking. Fixes: c9b5645fd8ca ("block: rnbd-clt: Fix leaked ID in init_dev()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-08block: rnbd-clt: Fix leaked ID in init_dev()Thomas Fourier1-5/+8
[ Upstream commit c9b5645fd8ca10f310e41b07540f98e6a9720f40 ] If kstrdup() fails in init_dev(), then the newly allocated ID is lost. Fixes: 64e8a6ece1a5 ("block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name") Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18ublk: prevent invalid access with DEBUGKevin Brodsky1-2/+2
[ Upstream commit c6a45ee7607de3a350008630f4369b1b5ac80884 ] ublk_ch_uring_cmd_local() may jump to the out label before initialising the io pointer. This will cause trouble if DEBUG is defined, because the pr_devel() call dereferences io. Clang reports: drivers/block/ublk_drv.c:2403:6: error: variable 'io' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] 2403 | if (tag >= ub->dev_info.queue_depth) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/block/ublk_drv.c:2492:32: note: uninitialized use occurs here 2492 | __func__, cmd_op, tag, ret, io->flags); | Fix this by initialising io to NULL and checking it before dereferencing it. Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18ps3disk: use memcpy_{from,to}_bvec indexRene Rebe1-0/+4
[ Upstream commit 79bd8c9814a273fa7ba43399e1c07adec3fc95db ] With 6e0a48552b8c (ps3disk: use memcpy_{from,to}_bvec) converting ps3disk to new bvec helpers, incrementing the offset was accidently lost, corrupting consecutive buffers. Restore index for non-corrupted data transfers. Fixes: 6e0a48552b8c (ps3disk: use memcpy_{from,to}_bvec) Signed-off-by: René Rebe <rene@exactco.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18nbd: defer config unlock in nbd_genl_connectZheng Qixing1-1/+2
[ Upstream commit 1649714b930f9ea6233ce0810ba885999da3b5d4 ] There is one use-after-free warning when running NBD_CMD_CONNECT and NBD_CLEAR_SOCK: nbd_genl_connect nbd_alloc_and_init_config // config_refs=1 nbd_start_device // config_refs=2 set NBD_RT_HAS_CONFIG_REF open nbd // config_refs=3 recv_work done // config_refs=2 NBD_CLEAR_SOCK // config_refs=1 close nbd // config_refs=0 refcount_inc -> uaf ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 24 PID: 1014 at lib/refcount.c:25 refcount_warn_saturate+0x12e/0x290 nbd_genl_connect+0x16d0/0x1ab0 genl_family_rcv_msg_doit+0x1f3/0x310 genl_rcv_msg+0x44a/0x790 The issue can be easily reproduced by adding a small delay before refcount_inc(&nbd->config_refs) in nbd_genl_connect(): mutex_unlock(&nbd->config_lock); if (!ret) { set_bit(NBD_RT_HAS_CONFIG_REF, &config->runtime_flags); + printk("before sleep\n"); + mdelay(5 * 1000); + printk("after sleep\n"); refcount_inc(&nbd->config_refs); nbd_connect_reply(info, nbd->index); } Fixes: e46c7287b1c2 ("nbd: add a basic netlink interface") Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> Reviewed-by: Yu Kuai <yukuai@fnnas.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-12-18nbd: defer config put in recv_workZheng Qixing1-1/+1
[ Upstream commit 9517b82d8d422d426a988b213fdd45c6b417b86d ] There is one uaf issue in recv_work when running NBD_CLEAR_SOCK and NBD_CMD_RECONFIGURE: nbd_genl_connect // conf_ref=2 (connect and recv_work A) nbd_open // conf_ref=3 recv_work A done // conf_ref=2 NBD_CLEAR_SOCK // conf_ref=1 nbd_genl_reconfigure // conf_ref=2 (trigger recv_work B) close nbd // conf_ref=1 recv_work B config_put // conf_ref=0 atomic_dec(&config->recv_threads); -> UAF Or only running NBD_CLEAR_SOCK: nbd_genl_connect // conf_ref=2 nbd_open // conf_ref=3 NBD_CLEAR_SOCK // conf_ref=2 close nbd nbd_release config_put // conf_ref=1 recv_work config_put // conf_ref=0 atomic_dec(&config->recv_threads); -> UAF Commit 87aac3a80af5 ("nbd: call nbd_config_put() before notifying the waiter") moved nbd_config_put() to run before waking up the waiter in recv_work, in order to ensure that nbd_start_device_ioctl() would not be woken up while nbd->task_recv was still uncleared. However, in nbd_start_device_ioctl(), after being woken up it explicitly calls flush_workqueue() to make sure all current works are finished. Therefore, there is no need to move the config put ahead of the wakeup. Move nbd_config_put() to the end of recv_work, so that the reference is held for the whole lifetime of the worker thread. This makes sure the config cannot be freed while recv_work is still running, even if clear + reconfigure interleave. In addition, we don't need to worry about recv_work dropping the last nbd_put (which causes deadlock): path A (netlink with NBD_CFLAG_DESTROY_ON_DISCONNECT): connect // nbd_refs=1 (trigger recv_work) open nbd // nbd_refs=2 NBD_CLEAR_SOCK close nbd nbd_release nbd_disconnect_and_put flush_workqueue // recv_work done nbd_config_put nbd_put // nbd_refs=1 nbd_put // nbd_refs=0 queue_work path B (netlink without NBD_CFLAG_DESTROY_ON_DISCONNECT): connect // nbd_refs=2 (trigger recv_work) open nbd // nbd_refs=3 NBD_CLEAR_SOCK // conf_refs=2 close nbd nbd_release nbd_config_put // conf_refs=1 nbd_put // nbd_refs=2 recv_work done // conf_refs=0, nbd_refs=1 rmmod // nbd_refs=0 Reported-by: syzbot+56fbf4c7ddf65e95c7cc@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6907edce.a70a0220.37351b.0014.GAE@google.com/T/ Fixes: 87aac3a80af5 ("nbd: make the config put is called before the notifying the waiter") Depends-on: e2daec488c57 ("nbd: Fix hungtask when nbd_config_put") Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-29nbd: override creds to kernel when calling sock_{send,recv}msg()Ondrej Mosnacek1-0/+15
[ Upstream commit 81ccca31214e11ea2b537fd35d4f66d7cf46268e ] sock_{send,recv}msg() internally calls security_socket_{send,recv}msg(), which does security checks (e.g. SELinux) for socket access against the current task. However, _sock_xmit() in drivers/block/nbd.c may be called indirectly from a userspace syscall, where the NBD socket access would be incorrectly checked against the calling userspace task (which simply tries to read/write a file that happens to reside on an NBD device). To fix this, temporarily override creds to kernel ones before calling the sock_*() functions. This allows the security modules to recognize this as internal access by the kernel, which will normally be allowed. A way to trigger the issue is to do the following (on a system with SELinux set to enforcing): ### Create nbd device: truncate -s 256M /tmp/testfile nbd-server localhost:10809 /tmp/testfile ### Connect to the nbd server: nbd-client localhost ### Create mdraid array mdadm --create -l 1 -n 2 /dev/md/testarray /dev/nbd0 missing After these steps, assuming the SELinux policy doesn't allow the unexpected access pattern, errors will be visible on the kernel console: [ 142.204243] nbd0: detected capacity change from 0 to 524288 [ 165.189967] md: async del_gendisk mode will be removed in future, please upgrade to mdadm-4.5+ [ 165.252299] md/raid1:md127: active with 1 out of 2 mirrors [ 165.252725] md127: detected capacity change from 0 to 522240 [ 165.255434] block nbd0: Send control failed (result -13) [ 165.255718] block nbd0: Request send failed, requeueing [ 165.256006] block nbd0: Dead connection, failed to find a fallback [ 165.256041] block nbd0: Receive control failed (result -32) [ 165.256423] block nbd0: shutting down sockets [ 165.257196] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.257736] Buffer I/O error on dev md127, logical block 0, async page read [ 165.258263] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.259376] Buffer I/O error on dev md127, logical block 0, async page read [ 165.259920] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.260628] Buffer I/O error on dev md127, logical block 0, async page read [ 165.261661] ldm_validate_partition_table(): Disk read failed. [ 165.262108] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.262769] Buffer I/O error on dev md127, logical block 0, async page read [ 165.263697] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.264412] Buffer I/O error on dev md127, logical block 0, async page read [ 165.265412] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.265872] Buffer I/O error on dev md127, logical block 0, async page read [ 165.266378] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.267168] Buffer I/O error on dev md127, logical block 0, async page read [ 165.267564] md127: unable to read partition table [ 165.269581] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.269960] Buffer I/O error on dev nbd0, logical block 0, async page read [ 165.270316] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.270913] Buffer I/O error on dev nbd0, logical block 0, async page read [ 165.271253] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.271809] Buffer I/O error on dev nbd0, logical block 0, async page read [ 165.272074] ldm_validate_partition_table(): Disk read failed. [ 165.272360] nbd0: unable to read partition table [ 165.289004] ldm_validate_partition_table(): Disk read failed. [ 165.289614] nbd0: unable to read partition table The corresponding SELinux denial on Fedora/RHEL will look like this (assuming it's not silenced): type=AVC msg=audit(1758104872.510:116): avc: denied { write } for pid=1908 comm="mdadm" laddr=::1 lport=32772 faddr=::1 fport=10809 scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=tcp_socket permissive=0 The respective backtrace looks like this: @security[mdadm, -13, handshake_exit+221615650 handshake_exit+221615650 handshake_exit+221616465 security_socket_sendmsg+5 sock_sendmsg+106 handshake_exit+221616150 sock_sendmsg+5 __sock_xmit+162 nbd_send_cmd+597 nbd_handle_cmd+377 nbd_queue_rq+63 blk_mq_dispatch_rq_list+653 __blk_mq_do_dispatch_sched+184 __blk_mq_sched_dispatch_requests+333 blk_mq_sched_dispatch_requests+38 blk_mq_run_hw_queue+239 blk_mq_dispatch_plug_list+382 blk_mq_flush_plug_list.part.0+55 __blk_flush_plug+241 __submit_bio+353 submit_bio_noacct_nocheck+364 submit_bio_wait+84 __blkdev_direct_IO_simple+232 blkdev_read_iter+162 vfs_read+591 ksys_read+95 do_syscall_64+92 entry_SYSCALL_64_after_hwframe+120 ]: 1 The issue has started to appear since commit 060406c61c7c ("block: add plug while submitting IO"). Cc: Ming Lei <ming.lei@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2348878 Fixes: 060406c61c7c ("block: add plug while submitting IO") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19loop: fix backing file reference leak on validation errorLi Chen1-2/+6
commit 98b7bf54338b797e3a11e8178ce0e806060d8fa3 upstream. loop_change_fd() and loop_configure() call loop_check_backing_file() to validate the new backing file. If validation fails, the reference acquired by fget() was not dropped, leaking a file reference. Fix this by calling fput(file) before returning the error. Cc: stable@vger.kernel.org Cc: Markus Elfring <Markus.Elfring@web.de> CC: Yang Erkun <yangerkun@huawei.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Yu Kuai <yukuai1@huaweicloud.com> Fixes: f5c84eff634b ("loop: Add sanity check for read/write_iter") Signed-off-by: Li Chen <chenl311@chinatelecom.cn> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-15nbd: restrict sockets to TCP and UDPEric Dumazet1-0/+8
[ Upstream commit 9f7c02e031570e8291a63162c6c046dc15ff85b0 ] Recently, syzbot started to abuse NBD with all kinds of sockets. Commit cf1b2326b734 ("nbd: verify socket is supported during setup") made sure the socket supported a shutdown() method. Explicitely accept TCP and UNIX stream sockets. Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup") Reported-by: syzbot+e1cd6bd8493060bd701d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/CANn89iJ+76eE3A_8S_zTpSyW5hvPRn6V57458hCZGY5hbH_bFA@mail.gmail.com/T/#m081036e8747cd7e2626c1da5d78c8b9d1e55b154 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mike Christie <mchristi@redhat.com> Cc: Richard W.M. Jones <rjones@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Yu Kuai <yukuai1@huaweicloud.com> Cc: linux-block@vger.kernel.org Cc: nbd@other.debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15null_blk: Fix the description of the cache_size module argumentGenjian Zhang1-1/+1
[ Upstream commit 7942b226e6b84df13b46b76c01d3b6e07a1b349e ] When executing modinfo null_blk, there is an error in the description of module parameter mbps, and the output information of cache_size is incomplete.The output of modinfo before and after applying this patch is as follows: Before: [...] parm: cache_size:ulong [...] parm: mbps:Cache size in MiB for memory-backed device. Default: 0 (none) (uint) [...] After: [...] parm: cache_size:Cache size in MiB for memory-backed device. Default: 0 (none) (ulong) [...] parm: mbps:Limit maximum bandwidth (in MiB/s). Default: 0 (no limit) (uint) [...] Fixes: 058efe000b31 ("null_blk: add module parameters for 4 options") Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-09block: add a queue_limits_commit_update_frozen helperChristoph Hellwig1-3/+1
[ Upstream commit aa427d7b73b196f657d6d2cf0e94eff6b883fdef ] Add a helper that freezes the queue, updates the queue limits and unfreezes the queue and convert all open coded versions of that to the new helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250110054726.1499538-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 708e2371f77a ("scsi: sr: Reinstate rotational media flag") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20drbd: add missing kref_get in handle_write_conflictsSarah Newman1-1/+5
[ Upstream commit 00c9c9628b49e368d140cfa61d7df9b8922ec2a8 ] With `two-primaries` enabled, DRBD tries to detect "concurrent" writes and handle write conflicts, so that even if you write to the same sector simultaneously on both nodes, they end up with the identical data once the writes are completed. In handling "superseeded" writes, we forgot a kref_get, resulting in a premature drbd_destroy_device and use after free, and further to kernel crashes with symptoms. Relevance: No one should use DRBD as a random data generator, and apparently all users of "two-primaries" handle concurrent writes correctly on layer up. That is cluster file systems use some distributed lock manager, and live migration in virtualization environments stops writes on one node before starting writes on the other node. Which means that other than for "test cases", this code path is never taken in real life. FYI, in DRBD 9, things are handled differently nowadays. We still detect "write conflicts", but no longer try to be smart about them. We decided to disconnect hard instead: upper layers must not submit concurrent writes. If they do, that's their fault. Signed-off-by: Sarah Newman <srn@prgmr.com> Signed-off-by: Lars Ellenberg <lars@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20250627095728.800688-1-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20loop: Avoid updating block size under exclusive ownerJan Kara1-8/+30
[ Upstream commit 7e49538288e523427beedd26993d446afef1a6fb ] Syzbot came up with a reproducer where a loop device block size is changed underneath a mounted filesystem. This causes a mismatch between the block device block size and the block size stored in the superblock causing confusion in various places such as fs/buffer.c. The particular issue triggered by syzbot was a warning in __getblk_slow() due to requested buffer size not matching block device block size. Fix the problem by getting exclusive hold of the loop device to change its block size. This fails if somebody (such as filesystem) has already an exclusive ownership of the block device and thus prevents modifying the loop device under some exclusive owner which doesn't expect it. Reported-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: syzbot+01ef7a8da81a975e1ccd@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20250711163202.19623-2-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-20sunvdc: Balance device refcount in vdc_port_mpgroup_checkMa Ke1-1/+3
commit 63ce53724637e2e7ba51fe3a4f78351715049905 upstream. Using device_find_child() to locate a probed virtual-device-port node causes a device refcount imbalance, as device_find_child() internally calls get_device() to increment the device’s reference count before returning its pointer. vdc_port_mpgroup_check() directly returns true upon finding a matching device without releasing the reference via put_device(). We should call put_device() to decrement refcount. As comment of device_find_child() says, 'NOTE: you will need to drop the reference with put_device() after use'. Found by code review. Cc: stable@vger.kernel.org Fixes: 3ee70591d6c4 ("sunvdc: prevent sunvdc panic when mpgroup disk added to guest domain") Signed-off-by: Ma Ke <make24@iscas.ac.cn> Link: https://lore.kernel.org/r/20250719075856.3447953-1-make24@iscas.ac.cn Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-15ublk: use vmalloc for ublk_device's __queuesCaleb Sander Mateos1-2/+2
[ Upstream commit c2f48453b7806d41f5a3270f206a5cd5640ed207 ] struct ublk_device's __queues points to an allocation with up to UBLK_MAX_NR_QUEUES (4096) queues, each of which have: - struct ublk_queue (48 bytes) - Tail array of up to UBLK_MAX_QUEUE_DEPTH (4096) struct ublk_io's, 32 bytes each This means the full allocation can exceed 512 MB, which may well be impossible to service with contiguous physical pages. Switch to kvcalloc() and kvfree(), since there is no need for physically contiguous memory. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250620151008.3976463-2-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-24loop: use kiocb helpers to fix lockdep warningMing Lei1-3/+2
[ Upstream commit c4706c5058a7bd7d7c20f3b24a8f523ecad44e83 ] The lockdep tool can report a circular lock dependency warning in the loop driver's AIO read/write path: ``` [ 6540.587728] kworker/u96:5/72779 is trying to acquire lock: [ 6540.593856] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop] [ 6540.603786] [ 6540.603786] but task is already holding lock: [ 6540.610291] ff110001b5968440 (sb_writers#9){.+.+}-{0:0}, at: loop_process_work+0x11a/0xf70 [loop] [ 6540.620210] [ 6540.620210] other info that might help us debug this: [ 6540.627499] Possible unsafe locking scenario: [ 6540.627499] [ 6540.634110] CPU0 [ 6540.636841] ---- [ 6540.639574] lock(sb_writers#9); [ 6540.643281] lock(sb_writers#9); [ 6540.646988] [ 6540.646988] *** DEADLOCK *** ``` This patch fixes the issue by using the AIO-specific helpers `kiocb_start_write()` and `kiocb_end_write()`. These functions are designed to be used with a `kiocb` and manage write sequencing correctly for asynchronous I/O without introducing the problematic lock dependency. The `kiocb` is already part of the `loop_cmd` struct, so this change also simplifies the completion function `lo_rw_aio_do_completion()` by using the `iocb` from the `cmd` struct directly, instead of retrieving the loop device from the request queue. Fixes: 39d86db34e41 ("loop: add file_start_write() and file_end_write()") Cc: Changhui Zhong <czhong@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250716114808.3159657-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17ublk: sanity check add_dev input for underflowRonnie Sahlberg1-1/+2
[ Upstream commit 969127bf0783a4ac0c8a27e633a9e8ea1738583f ] Add additional checks that queue depth and number of queues are non-zero. Signed-off-by: Ronnie Sahlberg <rsahlberg@whamcloud.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250626022046.235018-1-ronniesahlberg@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17nbd: fix uaf in nbd_genl_connect() error pathZheng Qixing1-3/+3
[ Upstream commit aa9552438ebf015fc5f9f890dbfe39f0c53cf37e ] There is a use-after-free issue in nbd: block nbd6: Receive control failed (result -104) block nbd6: shutting down sockets ================================================================== BUG: KASAN: slab-use-after-free in recv_work+0x694/0xa80 drivers/block/nbd.c:1022 Write of size 4 at addr ffff8880295de478 by task kworker/u33:0/67 CPU: 2 UID: 0 PID: 67 Comm: kworker/u33:0 Not tainted 6.15.0-rc5-syzkaller-00123-g2c89c1b655c0 #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Workqueue: nbd6-recv recv_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:408 [inline] print_report+0xc3/0x670 mm/kasan/report.c:521 kasan_report+0xe0/0x110 mm/kasan/report.c:634 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0xef/0x1a0 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:96 [inline] atomic_dec include/linux/atomic/atomic-instrumented.h:592 [inline] recv_work+0x694/0xa80 drivers/block/nbd.c:1022 process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3238 process_scheduled_works kernel/workqueue.c:3319 [inline] worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400 kthread+0x3c2/0x780 kernel/kthread.c:464 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:153 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> nbd_genl_connect() does not properly stop the device on certain error paths after nbd_start_device() has been called. This causes the error path to put nbd->config while recv_work continue to use the config after putting it, leading to use-after-free in recv_work. This patch moves nbd_start_device() after the backend file creation. Reported-by: syzbot+48240bab47e705c53126@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68227a04.050a0220.f2294.00b5.GAE@google.com/T/ Fixes: 6497ef8df568 ("nbd: provide a way for userspace processes to identify device backends") Signed-off-by: Zheng Qixing <zhengqixing@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250612132405.364904-1-zhengqixing@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-10aoe: defer rexmit timer downdev work to workqueueJustin Sanders3-3/+11
[ Upstream commit cffc873d68ab09a0432b8212008c5613f8a70a2c ] When aoe's rexmit_timer() notices that an aoe target fails to respond to commands for more than aoe_deadsecs, it calls aoedev_downdev() which cleans the outstanding aoe and block queues. This can involve sleeping, such as in blk_mq_freeze_queue(), which should not occur in irq context. This patch defers that aoedev_downdev() call to the aoe device's workqueue. Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665 Signed-off-by: Justin Sanders <jsanders.devel@gmail.com> Link: https://lore.kernel.org/r/20250610170600.869-2-jsanders.devel@gmail.com Tested-By: Valentin Kleibel <valentin@vrvis.at> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27ublk: santizize the arguments from userspace when adding a deviceRonnie Sahlberg1-0/+3
[ Upstream commit 8c8472855884355caf3d8e0c50adf825f83454b2 ] Sanity check the values for queue depth and number of queues we get from userspace when adding a device. Signed-off-by: Ronnie Sahlberg <rsahlberg@whamcloud.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Fixes: 62fe99cef94a ("ublk: add read()/write() support for ublk char device") Link: https://lore.kernel.org/r/20250619021031.181340-1-ronniesahlberg@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27aoe: clean device rq_list in aoedev_downdev()Justin Sanders1-0/+8
[ Upstream commit 7f90d45e57cb2ef1f0adcaf925ddffdfc5e680ca ] An aoe device's rq_list contains accepted block requests that are waiting to be transmitted to the aoe target. This queue was added as part of the conversion to blk_mq. However, the queue was not cleaned out when an aoe device is downed which caused blk_mq_freeze_queue() to sleep indefinitely waiting for those requests to complete, causing a hang. This fix cleans out the queue before calling blk_mq_freeze_queue(). Link: https://bugzilla.kernel.org/show_bug.cgi?id=212665 Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq") Signed-off-by: Justin Sanders <jsanders.devel@gmail.com> Link: https://lore.kernel.org/r/20250610170600.869-1-jsanders.devel@gmail.com Tested-By: Valentin Kleibel <valentin@vrvis.at> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19loop: add file_start_write() and file_end_write()Ming Lei1-2/+6
[ Upstream commit 39d86db34e41b96bd86f1955cd0ce6cd9c5fca4c ] file_start_write() and file_end_write() should be added around ->write_iter(). Recently we switch to ->write_iter() from vfs_iter_write(), and the implied file_start_write() and file_end_write() are lost. Also we never add them for dio code path, so add them back for covering both. Cc: Jeff Moyer <jmoyer@redhat.com> Fixes: f2fed441c69b ("loop: stop using vfs_iter_{read,write} for buffered I/O") Fixes: bc07c10a3603 ("block: loop: support DIO & AIO") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250527153405.837216-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19brd: fix discard end sectorYu Kuai1-3/+6
[ Upstream commit a26a339a654b9403f0ee1004f1db4c2b2a355460 ] brd_do_discard() just aligned start sector to page, this can only work if the discard size if at least one page. For example: blkdiscard /dev/ram0 -o 5120 -l 1024 In this case, size = (1024 - (8192 - 5120)), which is a huge value. Fix the problem by round_down() the end sector. Fixes: 9ead7efc6f3f ("brd: implement discard support") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250506061756.2970934-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19brd: fix aligned_sector from brd_do_discard()Yu Kuai1-1/+1
[ Upstream commit d4099f8893b057ad7e8d61df76bdeaf807ebd679 ] The calculation is just wrong, fix it by round_up(). Fixes: 9ead7efc6f3f ("brd: implement discard support") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250506061756.2970934-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29loop: don't require ->write_iter for writable files in loop_configureChristoph Hellwig1-3/+0
[ Upstream commit 355341e4359b2d5edf0ed5e117f7e9e7a0a5dac0 ] Block devices can be opened read-write even if they can't be written to for historic reasons. Remove the check requiring file->f_op->write_iter when the block devices was opened in loop_configure. The call to loop_check_backing_file just below ensures the ->write_iter is present for backing files opened for writing, which is the only check that is actually needed. Fixes: f5c84eff634b ("loop: Add sanity check for read/write_iter") Reported-by: Christian Hesse <mail@eworm.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250520135420.1177312-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29loop: check in LO_FLAGS_DIRECT_IO in loop_default_blocksizeChristoph Hellwig1-1/+1
[ Upstream commit f6f9e32fe1e454ae8ac0190b2c2bd6074914beec ] We can't go below the minimum direct I/O size no matter if direct I/O is enabled by passing in an O_DIRECT file descriptor or due to the explicit flag. Now that LO_FLAGS_DIRECT_IO is set earlier after assigning a backing file, loop_default_blocksize can check it instead of the O_DIRECT flag to handle both conditions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250131120120.1315125-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29ublk: complete command synchronously on errorCaleb Sander Mateos1-5/+6
[ Upstream commit 603f9be21c1894e462416e3324962d6c9c2b95f8 ] In case of an error, ublk's ->uring_cmd() functions currently return -EIOCBQUEUED and immediately call io_uring_cmd_done(). -EIOCBQUEUED and io_uring_cmd_done() are intended for asynchronous completions. For synchronous completions, the ->uring_cmd() function can just return the negative return code directly. This skips io_uring_cmd_del_cancelable(), and deferring the completion to task work. So return the error code directly from __ublk_ch_uring_cmd() and ublk_ctrl_uring_cmd(). Update ublk_ch_uring_cmd_cb(), which currently ignores the return value from __ublk_ch_uring_cmd(), to call io_uring_cmd_done() for synchronous completions. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20250225212456.2902549-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-29ublk: enforce ublks_max only for unprivileged devicesUday Shankar1-15/+27
[ Upstream commit 80bdfbb3545b6f16680a72c825063d08a6b44c7a ] Commit 403ebc877832 ("ublk_drv: add module parameter of ublks_max for limiting max allowed ublk dev"), claimed ublks_max was added to prevent a DoS situation with an untrusted user creating too many ublk devices. If that's the case, ublks_max should only restrict the number of unprivileged ublk devices in the system. Enforce the limit only for unprivileged ublk devices, and rename variables accordingly. Leave the external-facing parameter name unchanged, since changing it may break systems which use it (but still update its documentation to reflect its new meaning). As a result of this change, in a system where there are only normal (non-unprivileged) devices, the maximum number of such devices is increased to 1 << MINORBITS, or 1048576. That ought to be enough for anyone, right? Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250228-ublks_max-v1-1-04b7379190c0@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18loop: Add sanity check for read/write_iterLizhi Xu1-0/+23
[ Upstream commit f5c84eff634ba003326aa034c414e2a9dcb7c6a7 ] Some file systems do not support read_iter/write_iter, such as selinuxfs in this issue. So before calling them, first confirm that the interface is supported and then call it. It is releavant in that vfs_iter_read/write have the check, and removal of their used caused szybot to be able to hit this issue. Fixes: f2fed441c69b ("loop: stop using vfs_iter__{read,write} for buffered I/O") Reported-by: syzbot+6af973a3b8dfd2faefdc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6af973a3b8dfd2faefdc Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250428143626.3318717-1-lizhi.xu@windriver.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18loop: factor out a loop_assign_backing_file helperChristoph Hellwig1-10/+10
[ Upstream commit d278164832618bf2775c6a89e6434e2633de1eed ] Split the code for setting up a backing file into a helper in preparation of adding more code to this path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20250131120120.1315125-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: f5c84eff634b ("loop: Add sanity check for read/write_iter") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18loop: refactor queue limits updatesChristoph Hellwig1-16/+20
[ Upstream commit b38c8be255e89ffcdeb817407222d2de0b573a41 ] Replace loop_reconfigure_limits with a slightly less encompassing loop_update_limits that expects the caller to acquire and commit the queue limits to prepare for sorting out the freeze vs limits lock ordering. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Link: https://lore.kernel.org/r/20250110054726.1499538-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: f5c84eff634b ("loop: Add sanity check for read/write_iter") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18loop: Fix ABBA locking raceOGAWA Hirofumi1-15/+15
[ Upstream commit b49125574cae26458d4aa02ce8f4523ba9a2a328 ] Current loop calls vfs_statfs() while holding the q->limits_lock. If FS takes some locking in vfs_statfs callback, this may lead to ABBA locking bug (at least, FAT fs has this issue actually). So this patch calls vfs_statfs() outside q->limits_locks instead, because looks like no reason to hold q->limits_locks while getting discord configs. Chain exists of: &sbi->fat_lock --> &q->q_usage_counter(io)#17 --> &q->limits_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&q->limits_lock); lock(&q->q_usage_counter(io)#17); lock(&q->limits_lock); lock(&sbi->fat_lock); *** DEADLOCK *** Reported-by: syzbot+a5d8c609c02f508672cc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a5d8c609c02f508672cc Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: f5c84eff634b ("loop: Add sanity check for read/write_iter") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18loop: Simplify discard granularity calcJohn Garry1-2/+1
[ Upstream commit d47de6ac8842327ae1c782670283450159c55d5b ] A bdev discard granularity is always at least SECTOR_SIZE, so don't check for a zero value. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241101092215.422428-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: f5c84eff634b ("loop: Add sanity check for read/write_iter") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-18loop: Use bdev limit helpers for configuring discardJohn Garry1-4/+4
[ Upstream commit 8d3fd059dd289e6c322e5741ad56794bcce699a2 ] Instead of directly looking at the request_queue limits, use the bdev limits helpers, which is preferable. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241030111900.3981223-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: f5c84eff634b ("loop: Add sanity check for read/write_iter") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25block: don't reorder requests in blk_add_rq_to_plugChristoph Hellwig1-1/+1
commit e70c301faece15b618e54b613b1fd6ece3dd05b4 upstream. Add requests to the tail of the list instead of the front so that they are queued up in submission order. Remove the re-reordering in blk_mq_dispatch_plug_list, virtio_queue_rqs and nvme_queue_rqs now that the list is ordered as expected. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25block: add a rq_list typeChristoph Hellwig2-12/+10
commit a3396b99990d8b4e5797e7b16fdeb64c15ae97bb upstream. Replace the semi-open coded request list helpers with a proper rq_list type that mirrors the bio_list and has head and tail pointers. Besides better type safety this actually allows to insert at the tail of the list, which will be useful soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241113152050.157179-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25loop: LOOP_SET_FD: send uevents for partitionsThomas Weißschuh1-1/+2
commit 0dba7a05b9e47d8b546399117b0ddf2426dc6042 upstream. Remove the suppression of the uevents before scanning for partitions. The partitions inherit their suppression settings from their parent device, which lead to the uevents being dropped. This is similar to the same changes for LOOP_CONFIGURE done in commit bb430b694226 ("loop: LOOP_CONFIGURE: send uevents for partitions"). Fixes: 498ef5c777d9 ("loop: suppress uevents while reconfiguring the device") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20250415-loop-uevent-changed-v3-1-60ff69ac6088@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25loop: properly send KOBJ_CHANGED uevent for disk deviceThomas Weißschuh1-2/+2
commit e7bc0010ceb403d025100698586c8e760921d471 upstream. The original commit message and the wording "uncork" in the code comment indicate that it is expected that the suppressed event instances are automatically sent after unsuppressing. This is not the case, instead they are discarded. In effect this means that no "changed" events are emitted on the device itself by default. While each discovered partition does trigger a changed event on the device, devices without partitions don't have any event emitted. This makes udev miss the device creation and prompted workarounds in userspace. See the linked util-linux/losetup bug. Explicitly emit the events and drop the confusingly worded comments. Link: https://github.com/util-linux/util-linux/issues/2434 Fixes: 498ef5c777d9 ("loop: suppress uevents while reconfiguring the device") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://lore.kernel.org/r/20250415-loop-uevent-changed-v2-1-0c4e6a923b2a@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25loop: stop using vfs_iter_{read,write} for buffered I/OChristoph Hellwig1-95/+17
[ Upstream commit f2fed441c69b9237760840a45a004730ff324faf ] vfs_iter_{read,write} always perform direct I/O when the file has the O_DIRECT flag set, which breaks disabling direct I/O using the LOOP_SET_STATUS / LOOP_SET_STATUS64 ioctls. This was recenly reported as a regression, but as far as I can tell was only uncovered by better checking for block sizes and has been around since the direct I/O support was added. Fix this by using the existing aio code that calls the raw read/write iter methods instead. Note that despite the comments there is no need for block drivers to ever call flush_dcache_page themselves, and the call is a left-over from prehistoric times. Fixes: ab1cb278bc70 ("block: loop: introduce ioctl command of LOOP_SET_DIRECT_IO") Reported-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20250409130940.3685677-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25loop: aio inherit the ioprio of original requestYunlong Xing1-1/+1
[ Upstream commit 1fdb8188c3d505452b40cdb365b1bb32be533a8e ] Set cmd->iocb.ki_ioprio to the ioprio of loop device's request. The purpose is to inherit the original request ioprio in the aio flow. Signed-off-by: Yunlong Xing <yunlong.xing@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250414030159.501180-1-yunlong.xing@unisoc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: f2fed441c69b ("loop: stop using vfs_iter_{read,write} for buffered I/O") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20ublk: fix handling recovery & reissue in ublk_abort_queue()Ming Lei1-4/+26
[ Upstream commit 6ee6bd5d4fce502a5b5a2ea805e9ff16e6aa890f ] Commit 8284066946e6 ("ublk: grab request reference when the request is handled by userspace") doesn't grab request reference in case of recovery reissue. Then the request can be requeued & re-dispatch & failed when canceling uring command. If it is one zc request, the request can be freed before io_uring returns the zc buffer back, then cause kernel panic: [ 126.773061] BUG: kernel NULL pointer dereference, address: 00000000000000c8 [ 126.773657] #PF: supervisor read access in kernel mode [ 126.774052] #PF: error_code(0x0000) - not-present page [ 126.774455] PGD 0 P4D 0 [ 126.774698] Oops: Oops: 0000 [#1] SMP NOPTI [ 126.775034] CPU: 13 UID: 0 PID: 1612 Comm: kworker/u64:55 Not tainted 6.14.0_blk+ #182 PREEMPT(full) [ 126.775676] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 [ 126.776275] Workqueue: iou_exit io_ring_exit_work [ 126.776651] RIP: 0010:ublk_io_release+0x14/0x130 [ublk_drv] Fixes it by always grabbing request reference for aborting the request. Reported-by: Caleb Sander Mateos <csander@purestorage.com> Closes: https://lore.kernel.org/linux-block/CADUfDZodKfOGUeWrnAxcZiLT+puaZX8jDHoj_sfHZCOZwhzz6A@mail.gmail.com/ Fixes: 8284066946e6 ("ublk: grab request reference when the request is handled by userspace") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250409011444.2142010-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>