summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2025-03-13nbd: don't allow reconnect after disconnectYu Kuai1-0/+1
[ Upstream commit 844b8cdc681612ff24df62cdefddeab5772fadf1 ] Following process can cause nbd_config UAF: 1) grab nbd_config temporarily; 2) nbd_genl_disconnect() flush all recv_work() and release the initial reference: nbd_genl_disconnect nbd_disconnect_and_put nbd_disconnect flush_workqueue(nbd->recv_workq) if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF, ...)) nbd_config_put -> due to step 1), reference is still not zero 3) nbd_genl_reconfigure() queue recv_work() again; nbd_genl_reconfigure config = nbd_get_config_unlocked(nbd) if (!config) -> succeed if (!test_bit(NBD_RT_BOUND, ...)) -> succeed nbd_reconnect_socket queue_work(nbd->recv_workq, &args->work) 4) step 1) release the reference; 5) Finially, recv_work() will trigger UAF: recv_work nbd_config_put(nbd) -> nbd_config is freed atomic_dec(&config->recv_threads) -> UAF Fix the problem by clearing NBD_RT_BOUND in nbd_genl_disconnect(), so that nbd_genl_reconfigure() will fail. Fixes: b7aa3d39385d ("nbd: add a reconfigure netlink command") Reported-by: syzbot+6b0df248918b92c33e6a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/675bfb65.050a0220.1a2d0d.0006.GAE@google.com/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250103092859.3574648-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09virtio-blk: don't keep queue frozen during system suspendMing Lei1-2/+5
[ Upstream commit 7678abee0867e6b7fb89aa40f6e9f575f755fb37 ] Commit 4ce6e2db00de ("virtio-blk: Ensure no requests in virtqueues before deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's PM callbacks. And the motivation is to drain inflight IOs before suspending. block layer's queue freeze looks very handy, but it is also easy to cause deadlock, such as, any attempt to call into bio_queue_enter() may run into deadlock if the queue is frozen in current context. There are all kinds of ->suspend() called in suspend context, so keeping queue frozen in the whole suspend context isn't one good idea. And Marek reported lockdep warning[1] caused by virtio-blk's freeze queue in virtblk_freeze(). [1] https://lore.kernel.org/linux-block/ca16370e-d646-4eee-b9cc-87277c89c43c@samsung.com/ Given the motivation is to drain in-flight IOs, it can be done by calling freeze & unfreeze, meantime restore to previous behavior by keeping queue quiesced during suspend. Cc: Yi Sun <yi.sun@unisoc.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: virtualization@lists.linux.dev Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20241112125821.1475793-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09zram: refuse to use zero sized block device as backing deviceKairui Song1-0/+6
commit be48c412f6ebf38849213c19547bc6d5b692b5e5 upstream. Patch series "zram: fix backing device setup issue", v2. This series fixes two bugs of backing device setting: - ZRAM should reject using a zero sized (or the uninitialized ZRAM device itself) as the backing device. - Fix backing device leaking when removing a uninitialized ZRAM device. This patch (of 2): Setting a zero sized block device as backing device is pointless, and one can easily create a recursive loop by setting the uninitialized ZRAM device itself as its own backing device by (zram0 is uninitialized): echo /dev/zram0 > /sys/block/zram0/backing_dev It's definitely a wrong config, and the module will pin itself, kernel should refuse doing so in the first place. By refusing to use zero sized device we avoided misuse cases including this one above. Link: https://lkml.kernel.org/r/20241209165717.94215-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20241209165717.94215-2-ryncsn@gmail.com Fixes: 013bf95a83ec ("zram: add interface to specif backing device") Signed-off-by: Kairui Song <kasong@tencent.com> Reported-by: Desheng Wu <deshengwu@tencent.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08aoe: fix the potential use-after-free problem in more placesChun-Yi Lee1-1/+12
commit 6d6e54fc71ad1ab0a87047fd9c211e75d86084a3 upstream. For fixing CVE-2023-6270, f98364e92662 ("aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts") makes tx() calling dev_put() instead of doing in aoecmd_cfg_pkts(). It avoids that the tx() runs into use-after-free. Then Nicolai Stange found more places in aoe have potential use-after-free problem with tx(). e.g. revalidate(), aoecmd_ata_rw(), resend(), probe() and aoecmd_cfg_rsp(). Those functions also use aoenet_xmit() to push packet to tx queue. So they should also use dev_hold() to increase the refcnt of skb->dev. On the other hand, moving dev_put() to tx() causes that the refcnt of skb->dev be reduced to a negative value, because corresponding dev_hold() are not called in revalidate(), aoecmd_ata_rw(), resend(), probe(), and aoecmd_cfg_rsp(). This patch fixed this issue. Cc: stable@vger.kernel.org Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270 Fixes: f98364e92662 ("aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts") Reported-by: Nicolai Stange <nstange@suse.com> Signed-off-by: Chun-Yi Lee <jlee@suse.com> Link: https://lore.kernel.org/stable/20240624064418.27043-1-jlee%40suse.com Link: https://lore.kernel.org/r/20241002035458.24401-1-jlee@suse.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08drbd: Add NULL check for net_conf to prevent dereference in state validationMikhail Lobanov1-1/+1
commit a5e61b50c9f44c5edb6e134ede6fee8806ffafa9 upstream. If the net_conf pointer is NULL and the code attempts to access its fields without a check, it will lead to a null pointer dereference. Add a NULL check before dereferencing the pointer. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 44ed167da748 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf") Cc: stable@vger.kernel.org Signed-off-by: Mikhail Lobanov <m.lobanov@rosalinux.ru> Link: https://lore.kernel.org/r/20240909133740.84297-1-m.lobanov@rosalinux.ru Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08drbd: Fix atomicity violation in drbd_uuid_set_bm()Qiu-ji Chen1-2/+4
commit 2f02b5af3a4482b216e6a466edecf6ba8450fa45 upstream. The violation of atomicity occurs when the drbd_uuid_set_bm function is executed simultaneously with modifying the value of device->ldev->md.uuid[UI_BITMAP]. Consider a scenario where, while device->ldev->md.uuid[UI_BITMAP] passes the validity check when its value is not zero, the value of device->ldev->md.uuid[UI_BITMAP] is written to zero. In this case, the check in drbd_uuid_set_bm might refer to the old value of device->ldev->md.uuid[UI_BITMAP] (before locking), which allows an invalid value to pass the validity check, resulting in inconsistency. To address this issue, it is recommended to include the data validity check within the locked section of the function. This modification ensures that the value of device->ldev->md.uuid[UI_BITMAP] does not change during the validation process, thereby maintaining its integrity. This possible bug is found by an experimental static analysis tool developed by our team. This tool analyzes the locking APIs to extract function pairs that can be concurrently executed, and then analyzes the instructions in the paired functions to identify possible concurrency bugs including data races and atomicity violations. Fixes: 9f2247bb9b75 ("drbd: Protect accesses to the uuid set with a spinlock") Cc: stable@vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com> Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com> Link: https://lore.kernel.org/r/20240913083504.10549-1-chenqiuji666@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-19rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappingsIlya Dryomov1-5/+5
commit 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c upstream. Every time a watch is reestablished after getting lost, we need to update the cookie which involves quiescing exclusive lock. For this, we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING roughly for the duration of rbd_reacquire_lock() call. If the mapping is exclusive and I/O happens to arrive in this time window, it's failed with EROFS (later translated to EIO) based on the wrong assumption in rbd_img_exclusive_lock() -- "lock got released?" check there stopped making sense with commit a2b1da09793d ("rbd: lock should be quiesced on reacquire"). To make it worse, any such I/O is added to the acquiring list before EROFS is returned and this sets up for violating rbd_lock_del_request() precondition that the request is either on the running list or not on any list at all -- see commit ded080c86b3f ("rbd: don't move requests to the running list on errors"). rbd_lock_del_request() ends up processing these requests as if they were on the running list which screws up quiescing_wait completion counter and ultimately leads to rbd_assert(!completion_done(&rbd_dev->quiescing_wait)); being triggered on the next watch error. Cc: stable@vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait Cc: stable@vger.kernel.org Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-19rbd: rename RBD_LOCK_STATE_RELEASING and releasing_waitIlya Dryomov1-10/+10
commit f5c466a0fdb2d9f3650d2e3911b0735f17ba00cf upstream. ... to RBD_LOCK_STATE_QUIESCING and quiescing_wait to recognize that this state and the associated completion are backing rbd_quiesce_lock(), which isn't specific to releasing the lock. While exclusive lock does get quiesced before it's released, it also gets quiesced before an attempt to update the cookie is made and there the lock is not released as long as ceph_cls_set_cookie() succeeds. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-19rbd: don't assume rbd_is_lock_owner() for exclusive mappingsIlya Dryomov1-5/+0
commit 3ceccb14f5576e02b81cc8b105ab81f224bd87f6 upstream. Expanding on the previous commit, assuming that rbd_is_lock_owner() always returns true (i.e. that we are either in RBD_LOCK_STATE_LOCKED or RBD_LOCK_STATE_QUIESCING) if the mapping is exclusive is wrong too. In case ceph_cls_set_cookie() fails, the lock would be temporarily released even if the mapping is exclusive, meaning that we can end up even in RBD_LOCK_STATE_UNLOCKED. IOW, exclusive mappings are really "just" about disabling automatic lock transitions (as documented in the man page), not about grabbing the lock and holding on to it whatever it takes. Cc: stable@vger.kernel.org Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION()Zhu Yanjun1-0/+1
[ Upstream commit 9e6727f824edcdb8fdd3e6e8a0862eb49546e1cd ] No functional changes intended. Fixes: f2298c0403b0 ("null_blk: multi queue aware block test driver") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20240506075538.6064-1-yanjun.zhu@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-16null_blk: Fix missing mutex_destroy() at module removalZhu Yanjun1-0/+2
[ Upstream commit 07d1b99825f40f9c0d93e6b99d79a08d0717bac1 ] When a mutex lock is not used any more, the function mutex_destroy should be called to mark the mutex lock uninitialized. Fixes: f2298c0403b0 ("null_blk: multi queue aware block test driver") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240425171635.4227-1-yanjun.zhu@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13loop: loop_set_status_from_info() check before assignmentZhong Jinghua1-4/+4
[ Upstream commit 9f6ad5d533d1c71e51bdd06a5712c4fbc8768dfa ] In loop_set_status_from_info(), lo->lo_offset and lo->lo_sizelimit should be checked before reassignment, because if an overflow error occurs, the original correct value will be changed to the wrong value, and it will not be changed back. More, the original patch did not solve the problem, the value was set and ioctl returned an error, but the subsequent io used the value in the loop driver, which still caused an alarm: loop_handle_cmd do_req_filebacked loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset; lo_rw_aio cmd->iocb.ki_pos = pos Fixes: c490a0b5a4f3 ("loop: Check for overflow while configuring loop") Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230221095027.3656193-1-zhongjinghua@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13loop: Check for overflow while configuring loopSiddh Raman Pant1-0/+5
[ Upstream commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 ] The userspace can configure a loop using an ioctl call, wherein a configuration of type loop_config is passed (see lo_ioctl()'s case on line 1550 of drivers/block/loop.c). This proceeds to call loop_configure() which in turn calls loop_set_status_from_info() (see line 1050 of loop.c), passing &config->info which is of type loop_info64*. This function then sets the appropriate values, like the offset. loop_device has lo_offset of type loff_t (see line 52 of loop.c), which is typdef-chained to long long, whereas loop_info64 has lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h). The function directly copies offset from info to the device as follows (See line 980 of loop.c): lo->lo_offset = info->lo_offset; This results in an overflow, which triggers a warning in iomap_iter() due to a call to iomap_iter_done() which has: WARN_ON_ONCE(iter->iomap.offset > iter->pos); Thus, check for negative value during loop_set_status_from_info(). Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Siddh Raman Pant <code@siddh.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13loop: Factor out configuring loop from statusMartijn Coenen1-50/+67
[ Upstream commit 0c3796c244598122a5d59d56f30d19390096817f ] Factor out this code into a separate function, so it can be reused by other code more easily. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13loop: Refactor loop_set_status() size calculationMartijn Coenen1-18/+19
[ Upstream commit b0bd158dd630bd47640e0e418c062cda1e0da5ad ] figure_loop_size() calculates the loop size based on the passed in parameters, but at the same time it updates the offset and sizelimit parameters in the loop device configuration. That is a somewhat unexpected side effect of a function with this name, and it is only only needed by one of the two callers of this function - loop_set_status(). Move the lo_offset and lo_sizelimit assignment back into loop_set_status(), and use the newly factored out functions to validate and apply the newly calculated size. This allows us to get rid of figure_loop_size() in a follow-up commit. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13loop: Factor out setting loop device sizeMartijn Coenen1-9/+21
[ Upstream commit 5795b6f5607f7e4db62ddea144727780cb351a9b ] This code is used repeatedly. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13loop: Remove sector_t truncation checksMartijn Coenen1-14/+7
[ Upstream commit 083a6a50783ef54256eec3499e6575237e0e3d53 ] sector_t is now always u64, so we don't need to check for truncation. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13loop: Call loop_config_discard() only after new config is appliedMartijn Coenen1-2/+2
[ Upstream commit 7c5014b0987a30e4989c90633c198aced454c0ec ] loop_set_status() calls loop_config_discard() to configure discard for the loop device; however, the discard configuration depends on whether the loop device uses encryption, and when we call it the encryption configuration has not been updated yet. Move the call down so we apply the correct discard configuration based on the new configuration. Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-13Revert "loop: Check for overflow while configuring loop"Genjian Zhang1-5/+0
This reverts commit 13b2856037a651ba3ab4a8b25ecab3e791926da3. This patch lost a unlock loop_ctl_mutex in loop_get_status(...), which caused syzbot to report a UAF issue.The upstream patch does not have this issue. Therefore, we revert this patch and directly apply the upstream patch later on. Risk use-after-free as reported by syzbot: [ 84.669496] ================================================================== [ 84.670021] BUG: KASAN: use-after-free in __mutex_lock.isra.9+0xc13/0xcb0 [ 84.670433] Read of size 4 at addr ffff88808dba43b8 by task syz-executor.22/14230 [ 84.670885] [ 84.670987] CPU: 1 PID: 14230 Comm: syz-executor.22 Not tainted 5.4.270 #4 [ 84.671397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1kylin1 04/01/2014 [ 84.671927] Call Trace: [ 84.672085] dump_stack+0x94/0xc7 [ 84.672293] ? __mutex_lock.isra.9+0xc13/0xcb0 [ 84.672569] print_address_description.constprop.6+0x16/0x220 [ 84.672915] ? __mutex_lock.isra.9+0xc13/0xcb0 [ 84.673187] ? __mutex_lock.isra.9+0xc13/0xcb0 [ 84.673462] __kasan_report.cold.9+0x1a/0x32 [ 84.673723] ? __mutex_lock.isra.9+0xc13/0xcb0 [ 84.673993] kasan_report+0x10/0x20 [ 84.674208] __mutex_lock.isra.9+0xc13/0xcb0 [ 84.674468] ? __mutex_lock.isra.9+0x617/0xcb0 [ 84.674739] ? ww_mutex_lock_interruptible+0x100/0x100 [ 84.675055] ? ww_mutex_lock_interruptible+0x100/0x100 [ 84.675369] ? kobject_get_unless_zero+0x144/0x190 [ 84.675668] ? kobject_del+0x60/0x60 [ 84.675893] ? __module_get+0x120/0x120 [ 84.676128] ? __mutex_lock_slowpath+0x10/0x10 [ 84.676399] mutex_lock_killable+0xde/0xf0 [ 84.676652] ? __mutex_lock_killable_slowpath+0x10/0x10 [ 84.676967] ? __mutex_lock_slowpath+0x10/0x10 [ 84.677243] ? disk_block_events+0x1d/0x120 [ 84.677509] lo_open+0x16/0xc0 [ 84.677701] ? lo_compat_ioctl+0x160/0x160 [ 84.677954] __blkdev_get+0xb0f/0x1160 [ 84.678185] ? bd_may_claim+0xd0/0xd0 [ 84.678410] ? bdev_disk_changed+0x190/0x190 [ 84.678674] ? _raw_spin_lock+0x7c/0xd0 [ 84.678915] ? _raw_write_lock_bh+0xd0/0xd0 [ 84.679172] blkdev_get+0x9b/0x290 [ 84.679381] ? ihold+0x1a/0x40 [ 84.679574] blkdev_open+0x1bd/0x240 [ 84.679794] do_dentry_open+0x439/0x1000 [ 84.680035] ? blkdev_get_by_dev+0x60/0x60 [ 84.680286] ? __x64_sys_fchdir+0x1a0/0x1a0 [ 84.680557] ? inode_permission+0x86/0x320 [ 84.680814] path_openat+0x998/0x4120 [ 84.681044] ? stack_trace_consume_entry+0x160/0x160 [ 84.681348] ? do_futex+0x136/0x1880 [ 84.681568] ? path_mountpoint+0xb50/0xb50 [ 84.681823] ? save_stack+0x4d/0x80 [ 84.682038] ? save_stack+0x19/0x80 [ 84.682253] ? __kasan_kmalloc.constprop.6+0xc1/0xd0 [ 84.682553] ? kmem_cache_alloc+0xc7/0x210 [ 84.682804] ? getname_flags+0xc4/0x560 [ 84.683045] ? do_sys_open+0x1ce/0x450 [ 84.683272] ? do_syscall_64+0x9a/0x330 [ 84.683509] ? entry_SYSCALL_64_after_hwframe+0x5c/0xc1 [ 84.683826] ? _raw_spin_lock+0x7c/0xd0 [ 84.684063] ? _raw_write_lock_bh+0xd0/0xd0 [ 84.684319] ? futex_exit_release+0x60/0x60 [ 84.684574] ? kasan_unpoison_shadow+0x30/0x40 [ 84.684844] ? __kasan_kmalloc.constprop.6+0xc1/0xd0 [ 84.685149] ? get_partial_node.isra.83.part.84+0x1e5/0x340 [ 84.685485] ? __fget_light+0x1d1/0x550 [ 84.685721] do_filp_open+0x197/0x270 [ 84.685946] ? may_open_dev+0xd0/0xd0 [ 84.686172] ? kasan_unpoison_shadow+0x30/0x40 [ 84.686443] ? __kasan_kmalloc.constprop.6+0xc1/0xd0 [ 84.686743] ? __alloc_fd+0x1a3/0x580 [ 84.686973] do_sys_open+0x2c7/0x450 [ 84.687195] ? filp_open+0x60/0x60 [ 84.687406] ? __x64_sys_timer_settime32+0x280/0x280 [ 84.687707] do_syscall_64+0x9a/0x330 [ 84.687931] ? syscall_return_slowpath+0x17a/0x230 [ 84.688221] entry_SYSCALL_64_after_hwframe+0x5c/0xc1 [ 84.688524] [ 84.688622] Allocated by task 14056: [ 84.688842] save_stack+0x19/0x80 [ 84.689044] __kasan_kmalloc.constprop.6+0xc1/0xd0 [ 84.689333] kmem_cache_alloc_node+0xe2/0x230 [ 84.689600] copy_process+0x165c/0x72d0 [ 84.689833] _do_fork+0xf9/0x9a0 [ 84.690032] __x64_sys_clone+0x17a/0x200 [ 84.690271] do_syscall_64+0x9a/0x330 [ 84.690496] entry_SYSCALL_64_after_hwframe+0x5c/0xc1 [ 84.690800] [ 84.690903] Freed by task 0: [ 84.691081] save_stack+0x19/0x80 [ 84.691287] __kasan_slab_free+0x125/0x170 [ 84.691535] kmem_cache_free+0x7a/0x2a0 [ 84.691774] __put_task_struct+0x1ec/0x4a0 [ 84.692023] delayed_put_task_struct+0x178/0x1d0 [ 84.692303] rcu_core+0x538/0x16c0 [ 84.692512] __do_softirq+0x175/0x63d [ 84.692741] [ 84.692840] The buggy address belongs to the object at ffff88808dba4380 [ 84.692840] which belongs to the cache task_struct of size 3328 [ 84.693584] The buggy address is located 56 bytes inside of [ 84.693584] 3328-byte region [ffff88808dba4380, ffff88808dba5080) [ 84.694272] The buggy address belongs to the page: [ 84.694563] page:ffffea000236e800 refcount:1 mapcount:0 mapping:ffff8881838acdc0 index:0x0 compound_mapcount: 0 [ 84.695166] flags: 0x100000000010200(slab|head) [ 84.695457] raw: 0100000000010200 dead000000000100 dead000000000122 ffff8881838acdc0 [ 84.695919] raw: 0000000000000000 0000000000090009 00000001ffffffff 0000000000000000 [ 84.696375] page dumped because: kasan: bad access detected [ 84.696705] [ 84.696801] Memory state around the buggy address: [ 84.697089] ffff88808dba4280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 84.697519] ffff88808dba4300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 84.697945] >ffff88808dba4380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 84.698371] ^ [ 84.698674] ffff88808dba4400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 84.699111] ffff88808dba4480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 84.699537] ================================================================== [ 84.699965] Disabling lock debugging due to kernel taint Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-27aoe: fix the potential use-after-free problem in aoecmd_cfg_pktsChun-Yi Lee2-6/+7
[ Upstream commit f98364e926626c678fb4b9004b75cacf92ff0662 ] This patch is against CVE-2023-6270. The description of cve is: A flaw was found in the ATA over Ethernet (AoE) driver in the Linux kernel. The aoecmd_cfg_pkts() function improperly updates the refcnt on `struct net_device`, and a use-after-free can be triggered by racing between the free on the struct and the access through the `skbtxq` global queue. This could lead to a denial of service condition or potential code execution. In aoecmd_cfg_pkts(), it always calls dev_put(ifp) when skb initial code is finished. But the net_device ifp will still be used in later tx()->dev_queue_xmit() in kthread. Which means that the dev_put(ifp) should NOT be called in the success path of skb initial code in aoecmd_cfg_pkts(). Otherwise tx() may run into use-after-free because the net_device is freed. This patch removed the dev_put(ifp) in the success path in aoecmd_cfg_pkts(), and added dev_put() after skb xmit in tx(). Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270 Fixes: 7562f876cd93 ("[NET]: Rework dev_base via list_head (v3)") Signed-off-by: Chun-Yi Lee <jlee@suse.com> Link: https://lore.kernel.org/r/20240305082048.25526-1-jlee@suse.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27nbd: null check for nla_nest_startNavid Emamdoost1-0/+6
[ Upstream commit 31edf4bbe0ba27fd03ac7d87eb2ee3d2a231af6d ] nla_nest_start() may fail and return NULL. Insert a check and set errno based on other call sites within the same source code. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Fixes: 47d902b90a32 ("nbd: add a status netlink command") Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240218042534.it.206-kees@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01virtio-blk: Ensure no requests in virtqueues before deleting vqs.Yi Sun1-3/+4
[ Upstream commit 4ce6e2db00de8103a0687fb0f65fd17124a51aaa ] Ensure no remaining requests in virtqueues before resetting vdev and deleting virtqueues. Otherwise these requests will never be completed. It may cause the system to become unresponsive. Function blk_mq_quiesce_queue() can ensure that requests have become in_flight status, but it cannot guarantee that requests have been processed by the device. Virtqueues should never be deleted before all requests become complete status. Function blk_mq_freeze_queue() ensure that all requests in virtqueues become complete status. And no requests can enter in virtqueues. Signed-off-by: Yi Sun <yi.sun@unisoc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20240129085250.1550594-1-yi.sun@unisoc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23rbd: don't move requests to the running list on errorsIlya Dryomov1-8/+14
commit ded080c86b3f99683774af0441a58fc2e3d60cae upstream. The running list is supposed to contain requests that are pinning the exclusive lock, i.e. those that must be flushed before exclusive lock is released. When wake_lock_waiters() is called to handle an error, requests on the acquiring list are failed with that error and no flushing takes place. Briefly moving them to the running list is not only pointless but also harmful: if exclusive lock gets acquired before all of their state machines are scheduled and go through rbd_lock_del_request(), we trigger rbd_assert(list_empty(&rbd_dev->running_list)); in rbd_try_acquire_lock(). Cc: stable@vger.kernel.org Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08remove the sx8 block driverChristoph Hellwig3-1597/+0
commit d13bc4d84a8e91060d3797fc95c1a0202bfd1499 upstream. This driver is for fairly obscure hardware, and has only seen random drive-by changes after the maintainer stopped working on it in 2005 (about a year and a half after it was introduced). It has some "interesting" block layer interactions, so let's just drop it unless anyone complains. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220721064102.1715460-1-hch@lst.de [axboe: fix date typo, it was in 2005, not 2015] Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10rbd: take header_rwsem in rbd_dev_refresh() only when updatingIlya Dryomov1-15/+16
commit 0b207d02bd9ab8dcc31b262ca9f60dbc1822500d upstream. rbd_dev_refresh() has been holding header_rwsem across header and parent info read-in unnecessarily for ages. With commit 870611e4877e ("rbd: get snapshot context after exclusive lock is ensured to be held"), the potential for deadlocks became much more real owning to a) header_rwsem now nesting inside lock_rwsem and b) rw_semaphores not allowing new readers after a writer is registered. For example, assuming that I/O request 1, I/O request 2 and header read-in request all target the same OSD: 1. I/O request 1 comes in and gets submitted 2. watch error occurs 3. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and releases lock_rwsem 4. after reestablishing the watch, rbd_reregister_watch() calls rbd_dev_refresh() which takes header_rwsem for write and submits a header read-in request 5. I/O request 2 comes in: after taking lock_rwsem for read in __rbd_img_handle_request(), it blocks trying to take header_rwsem for read in rbd_img_object_requests() 6. another watch error occurs 7. rbd_watch_errcb() blocks trying to take lock_rwsem for write 8. I/O request 1 completion is received by the messenger but can't be processed because lock_rwsem won't be granted anymore 9. header read-in request completion can't be received, let alone processed, because the messenger is stranded Change rbd_dev_refresh() to take header_rwsem only for actually updating rbd_dev->header. Header and parent info read-in don't need any locking. Cc: stable@vger.kernel.org # 0b035401c570: rbd: move rbd_dev_refresh() definition Cc: stable@vger.kernel.org # 510a7330c82a: rbd: decouple header read-in from updating rbd_dev->header Cc: stable@vger.kernel.org # c10311776f0a: rbd: decouple parent info read-in from updating rbd_dev Cc: stable@vger.kernel.org Fixes: 870611e4877e ("rbd: get snapshot context after exclusive lock is ensured to be held") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> [idryomov@gmail.com: backport to 5.4: open-code rbd_is_snap(), preserve rbd_exists_validate() call] Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10rbd: decouple parent info read-in from updating rbd_devIlya Dryomov1-62/+80
commit c10311776f0a8ddea2276df96e255625b07045a8 upstream. Unlike header read-in, parent info read-in is already decoupled in get_parent_info(), but it's buried in rbd_dev_v2_parent_info() along with the processing logic. Separate the initial read-in and update read-in logic into rbd_dev_setup_parent() and rbd_dev_update_parent() respectively and have rbd_dev_v2_parent_info() just populate struct parent_image_info (i.e. what get_parent_info() did). Some existing QoI issues, like flatten of a standalone clone being disregarded on refresh, remain. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> [idryomov@gmail.com: backport to 5.4: context] Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10rbd: decouple header read-in from updating rbd_dev->headerIlya Dryomov1-91/+114
commit 510a7330c82a7754d5df0117a8589e8a539067c7 upstream. Make rbd_dev_header_info() populate a passed struct rbd_image_header instead of rbd_dev->header and introduce rbd_dev_update_header() for updating mutable fields in rbd_dev->header upon refresh. The initial read-in of both mutable and immutable fields in rbd_dev_image_probe() passes in rbd_dev->header so no update step is required there. rbd_init_layout() is now called directly from rbd_dev_image_probe() instead of individually in format 1 and format 2 implementations. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> [idryomov@gmail.com: backport to 5.4: _rbd_dev_v2_snap_features() doesn't have read_only param] Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10rbd: move rbd_dev_refresh() definitionIlya Dryomov1-39/+37
commit 0b035401c57021fc6c300272cbb1c5a889d4fe45 upstream. Move rbd_dev_refresh() definition further down to avoid having to move struct parent_image_info definition in the next commit. This spares some forward declarations too. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> [idryomov@gmail.com: backport to 5.4: drop rbd_is_snap() assert, preserve rbd_exists_validate() call] Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11loop: Select I/O scheduler 'none' from inside add_disk()Bart Van Assche1-1/+1
commit 2112f5c1330a671fa852051d85cb9eadc05d7eb7 upstream. We noticed that the user interface of Android devices becomes very slow under memory pressure. This is because Android uses the zram driver on top of the loop driver for swapping, because under memory pressure the swap code alternates reads and writes quickly, because mq-deadline is the default scheduler for loop devices and because mq-deadline delays writes by five seconds for such a workload with default settings. Fix this by making the kernel select I/O scheduler 'none' from inside add_disk() for loop devices. This default can be overridden at any time from user space, e.g. via a udev rule. This approach has an advantage compared to changing the I/O scheduler from userspace from 'mq-deadline' into 'none', namely that synchronize_rcu() does not get called. This patch changes the default I/O scheduler for loop devices from 'mq-deadline' into 'none'. Additionally, this patch reduces the Android boot time on my test setup with 0.5 seconds compared to configuring the loop I/O scheduler from user space. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Martijn Coenen <maco@android.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210805174200.3250718-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27nbd: Add the maximum limit of allocated index in nbd_dev_addZhong Jinghua1-1/+2
[ Upstream commit f12bc113ce904777fd6ca003b473b427782b3dde ] If the index allocated by idr_alloc greater than MINORMASK >> part_shift, the device number will overflow, resulting in failure to create a block device. Fix it by imiting the size of the max allocation. Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230605122159.2134384-1-zhongjinghua@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21xen/blkfront: Only check REQ_FUA for writesRoss Lagerwall1-1/+2
[ Upstream commit b6ebaa8100090092aa602530d7e8316816d0c98d ] The existing code silently converts read operations with the REQ_FUA bit set into write-barrier operations. This results in data loss as the backend scribbles zeroes over the data instead of returning it. While the REQ_FUA bit doesn't make sense on a read operation, at least one well-known out-of-tree kernel module does set it and since it results in data loss, let's be safe here and only look at REQ_FUA for writes. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230426164005.2213139-1-ross.lagerwall@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-14rbd: get snapshot context after exclusive lock is ensured to be heldIlya Dryomov1-17/+24
commit 870611e4877eff1e8413c3fb92a585e45d5291f6 upstream. Move capturing the snapshot context into the image request state machine, after exclusive lock is ensured to be held for the duration of dealing with the image request. This is needed to ensure correctness of fast-diff states (OBJECT_EXISTS vs OBJECT_EXISTS_CLEAN) and object deltas computed based off of them. Otherwise the object map that is forked for the snapshot isn't guaranteed to accurately reflect the contents of the snapshot when the snapshot is taken under I/O. This breaks differential backup and snapshot-based mirroring use cases with fast-diff enabled: since some object deltas may be incomplete, the destination image may get corrupted. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/61472 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> [idryomov@gmail.com: backport to 5.4: no rbd_img_capture_header(), img_request not embedded in blk-mq pdu] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-14rbd: move RBD_OBJ_FLAG_COPYUP_ENABLED flag settingIlya Dryomov1-11/+21
commit 09fe05c57b5aaf23e2c35036c98ea9f282b19a77 upstream. Move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting into the object request state machine to allow for the snapshot context to be captured in the image request state machine rather than in rbd_queue_workfn(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-09treewide: Remove uninitialized_var() usageKees Cook2-4/+4
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream. Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-09nbd: Fix debugfs_create_dir error checkingIvan Orlov1-2/+2
[ Upstream commit 4913cfcf014c95f0437db2df1734472fd3e15098 ] The debugfs_create_dir function returns ERR_PTR in case of error, and the only correct way to check if an error occurred is 'IS_ERR' inline function. This patch will replace the null-comparison with IS_ERR. Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20230512130533.98709-1-ivan.orlov0322@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17drbd: correctly submit flush bio on barrierChristoph Böhmwalder1-1/+1
commit 3899d94e3831ee07ea6821c032dc297aec80586a upstream. When we receive a flush command (or "barrier" in DRBD), we currently use a REQ_OP_FLUSH with the REQ_PREFLUSH flag set. The correct way to submit a flush bio is by using a REQ_OP_WRITE without any data, and set the REQ_PREFLUSH flag. Since commit b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios"), this triggers a warning in the block layer, but this has been broken for quite some time before that. So use the correct set of flags to actually make the flush happen. Cc: Christoph Hellwig <hch@infradead.org> Cc: stable@vger.kernel.org Fixes: f9ff0da56437 ("drbd: allow parallel flushes for multi-volume resources") Reported-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230503121937.17232-1-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-22block: sunvdc: add check for mdesc_grab() returning NULLLiang He1-0/+2
[ Upstream commit 6030363199e3a6341afb467ddddbed56640cbf6a ] In vdc_port_probe(), we should check the return value of mdesc_grab() as it may return NULL, which can cause potential NPD bug. Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.") Signed-off-by: Liang He <windhl@126.com> Link: https://lore.kernel.org/r/20230315062032.1741692-1-windhl@126.com [axboe: style cleanup] Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() failsIlya Dryomov1-11/+9
commit f7c4d9b133c7a04ca619355574e96b6abf209fba upstream. If getting an ID or setting up a work queue in rbd_dev_create() fails, use-after-free on rbd_dev->rbd_client, rbd_dev->spec and rbd_dev->opts is triggered in do_rbd_add(). The root cause is that the ownership of these structures is transfered to rbd_dev prematurely and they all end up getting freed when rbd_dev_create() calls rbd_dev_free() prior to returning to do_rbd_add(). Found by Linux Verification Center (linuxtesting.org) with SVACE, an incomplete patch submitted by Natalia Petrova <n.petrova@fintech.ru>. Cc: stable@vger.kernel.org Fixes: 1643dfa4c2c8 ("rbd: introduce a per-device ordered workqueue") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-25drbd: use after free in drbd_create_device()Dan Carpenter1-2/+2
[ Upstream commit a7a1598189228b5007369a9622ccdf587be0730f ] The drbd_destroy_connection() frees the "connection" so use the _safe() iterator to prevent a use after free. Fixes: b6f85ef9538b ("drbd: Iterate over all connections") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/Y3Jd5iZRbNQ9w6gm@kili Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26nbd: Fix hung when signal interrupts nbd_start_device_ioctl()Shigeru Yoshida1-2/+4
[ Upstream commit 1de7c3cf48fc41cd95adb12bd1ea9033a917798a ] syzbot reported hung task [1]. The following program is a simplified version of the reproducer: int main(void) { int sv[2], fd; if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return 1; if ((fd = open("/dev/nbd0", 0)) < 0) return 1; if (ioctl(fd, NBD_SET_SIZE_BLOCKS, 0x81) < 0) return 1; if (ioctl(fd, NBD_SET_SOCK, sv[0]) < 0) return 1; if (ioctl(fd, NBD_DO_IT) < 0) return 1; return 0; } When signal interrupt nbd_start_device_ioctl() waiting the condition atomic_read(&config->recv_threads) == 0, the task can hung because it waits the completion of the inflight IOs. This patch fixes the issue by clearing queue, not just shutdown, when signal interrupt nbd_start_device_ioctl(). Link: https://syzkaller.appspot.com/bug?id=7d89a3ffacd2b83fdd39549bc4d8e0a89ef21239 [1] Reported-by: syzbot+38e6c55d4969a14c1534@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220907163502.577561-1-syoshida@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-05loop: Check for overflow while configuring loopSiddh Raman Pant1-0/+5
commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream. The userspace can configure a loop using an ioctl call, wherein a configuration of type loop_config is passed (see lo_ioctl()'s case on line 1550 of drivers/block/loop.c). This proceeds to call loop_configure() which in turn calls loop_set_status_from_info() (see line 1050 of loop.c), passing &config->info which is of type loop_info64*. This function then sets the appropriate values, like the offset. loop_device has lo_offset of type loff_t (see line 52 of loop.c), which is typdef-chained to long long, whereas loop_info64 has lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h). The function directly copies offset from info to the device as follows (See line 980 of loop.c): lo->lo_offset = info->lo_offset; This results in an overflow, which triggers a warning in iomap_iter() due to a call to iomap_iter_done() which has: WARN_ON_ONCE(iter->iomap.offset > iter->pos); Thus, check for negative value during loop_set_status_from_info(). Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Siddh Raman Pant <code@siddh.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25null_blk: fix ida error handling in null_add_dev()Dan Carpenter1-3/+11
[ Upstream commit ee452a8d984f94fa8e894f003a52e776e4572881 ] There needs to be some error checking if ida_simple_get() fails. Also call ida_free() if there are errors later. Fixes: 94bc02e30fb8 ("nullb: use ida to manage index") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YtEhXsr6vJeoiYhd@kili Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-07xen/blkfront: force data bouncing when backend is untrustedRoger Pau Monne1-15/+34
commit 2400617da7eebf9167d71a46122828bc479d64c9 upstream. Split the current bounce buffering logic used with persistent grants into it's own option, and allow enabling it independently of persistent grants. This allows to reuse the same code paths to perform the bounce buffering required to avoid leaking contiguous data in shared pages not part of the request fragments. Reporting whether the backend is to be trusted can be done using a module parameter, or from the xenstore frontend path as set by the toolstack when adding the device. This is CVE-2022-33742, part of XSA-403. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-07xen/blkfront: fix leaking data in shared pagesRoger Pau Monne1-3/+4
commit 2f446ffe9d737e9a844b97887919c4fda18246e7 upstream. When allocating pages to be used for shared communication with the backend always zero them, this avoids leaking unintended data present on the pages. This is CVE-2022-26365, part of XSA-403. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14nbd: fix io hung while disconnecting deviceYu Kuai1-1/+1
[ Upstream commit 09dadb5985023e27d4740ebd17e6fea4640110e5 ] In our tests, "qemu-nbd" triggers a io hung: INFO: task qemu-nbd:11445 blocked for more than 368 seconds. Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:qemu-nbd state:D stack: 0 pid:11445 ppid: 1 flags:0x00000000 Call Trace: <TASK> __schedule+0x480/0x1050 ? _raw_spin_lock_irqsave+0x3e/0xb0 schedule+0x9c/0x1b0 blk_mq_freeze_queue_wait+0x9d/0xf0 ? ipi_rseq+0x70/0x70 blk_mq_freeze_queue+0x2b/0x40 nbd_add_socket+0x6b/0x270 [nbd] nbd_ioctl+0x383/0x510 [nbd] blkdev_ioctl+0x18e/0x3e0 __x64_sys_ioctl+0xac/0x120 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fd8ff706577 RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577 RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0 R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0 "qemu-ndb -d" will call ioctl 'NBD_DISCONNECT' first, however, following message was found: block nbd0: Send disconnect failed -32 Which indicate that something is wrong with the server. Then, "qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK', however ioctl can't clear requests after commit 2516ab1543fd("nbd: only clear the queue on device teardown"). And in the meantime, request can't complete through timeout because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which means such request will never be completed in this situation. Now that the flag 'NBD_CMD_INFLIGHT' can make sure requests won't complete multiple times, switch back to call nbd_clear_sock() in nbd_clear_sock_ioctl(), so that inflight requests can be cleared. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220521073749.3146892-5-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14nbd: fix race between nbd_alloc_config() and module removalYu Kuai1-9/+19
[ Upstream commit c55b2b983b0fa012942c3eb16384b2b722caa810 ] When nbd module is being removing, nbd_alloc_config() may be called concurrently by nbd_genl_connect(), although try_module_get() will return false, but nbd_alloc_config() doesn't handle it. The race may lead to the leak of nbd_config and its related resources (e.g, recv_workq) and oops in nbd_read_stat() due to the unload of nbd module as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000040 Oops: 0000 [#1] SMP PTI CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: knbd16-recv recv_work [nbd] RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd] Call Trace: recv_work+0x3b/0xb0 [nbd] process_one_work+0x1ed/0x390 worker_thread+0x4a/0x3d0 kthread+0x12a/0x150 ret_from_fork+0x22/0x30 Fixing it by checking the return value of try_module_get() in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV), assign nbd->config only when nbd_alloc_config() succeeds to ensure the value of nbd->config is binary (valid or NULL). Also adding a debug message to check the reference counter of nbd_config during module removal. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220521073749.3146892-3-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14nbd: call genl_unregister_family() first in nbd_cleanup()Yu Kuai1-1/+6
[ Upstream commit 06c4da89c24e7023ea448cadf8e9daf06a0aae6e ] Otherwise there may be race between module removal and the handling of netlink command, which can lead to the oops as shown below: BUG: kernel NULL pointer dereference, address: 0000000000000098 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 31299 Comm: nbd-client Tainted: G E 5.14.0-rc4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:down_write+0x1a/0x50 Call Trace: start_creating+0x89/0x130 debugfs_create_dir+0x1b/0x130 nbd_start_device+0x13d/0x390 [nbd] nbd_genl_connect+0x42f/0x748 [nbd] genl_family_rcv_msg_doit.isra.0+0xec/0x150 genl_rcv_msg+0xe5/0x1e0 netlink_rcv_skb+0x55/0x100 genl_rcv+0x29/0x40 netlink_unicast+0x1a8/0x250 netlink_sendmsg+0x21b/0x430 ____sys_sendmsg+0x2a4/0x2d0 ___sys_sendmsg+0x81/0xc0 __sys_sendmsg+0x62/0xb0 __x64_sys_sendmsg+0x1f/0x30 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Modules linked in: nbd(E-) Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220521073749.3146892-2-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14virtio_blk: fix the discard_granularity and discard_alignment queue limitsChristoph Hellwig1-3/+4
[ Upstream commit 62952cc5bccd89b76d710de1d0b43244af0f2903 ] The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. On the other hand the discard_sector_alignment from the virtio 1.1 looks similar to what Linux uses as discard granularity (even if not very well described): "discard_sector_alignment can be used by OS when splitting a request based on alignment. " And at least qemu does set it to the discard granularity. So stop setting the discard_alignment and use the virtio discard_sector_alignment to set the discard granularity. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14drbd: fix duplicate array initializerArnd Bergmann1-5/+6
[ Upstream commit 33cb0917bbe241dd17a2b87ead63514c1b7e5615 ] There are two initializers for P_RETRY_WRITE: drivers/block/drbd/drbd_main.c:3676:22: warning: initialized field overwritten [-Woverride-init] Remove the first one since it was already ignored by the compiler and reorder the list to match the enum definition. As P_ZEROES had no entry, add that one instead. Fixes: 036b17eaab93 ("drbd: Receiving part for the PROTOCOL_UPDATE packet") Fixes: f31e583aa2c2 ("drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20220406190715.1938174-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14nbd: Fix hung on disconnect request if socket is closed beforeXie Yongji1-4/+9
[ Upstream commit 491bf8f236fdeec698fa6744993f1ecf3fafd1a5 ] When userspace closes the socket before sending a disconnect request, the following I/O requests will be blocked in wait_for_reconnect() until dead timeout. This will cause the following disconnect request also hung on blk_mq_quiesce_queue(). That means we have no way to disconnect a nbd device if there are some I/O requests waiting for reconnecting until dead timeout. It's not expected. So let's wake up the thread waiting for reconnecting directly when a disconnect request is sent. Reported-by: Xu Jianhai <zero.xu@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220322080639.142-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>