| Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit 844b8cdc681612ff24df62cdefddeab5772fadf1 ]
Following process can cause nbd_config UAF:
1) grab nbd_config temporarily;
2) nbd_genl_disconnect() flush all recv_work() and release the
initial reference:
nbd_genl_disconnect
nbd_disconnect_and_put
nbd_disconnect
flush_workqueue(nbd->recv_workq)
if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF, ...))
nbd_config_put
-> due to step 1), reference is still not zero
3) nbd_genl_reconfigure() queue recv_work() again;
nbd_genl_reconfigure
config = nbd_get_config_unlocked(nbd)
if (!config)
-> succeed
if (!test_bit(NBD_RT_BOUND, ...))
-> succeed
nbd_reconnect_socket
queue_work(nbd->recv_workq, &args->work)
4) step 1) release the reference;
5) Finially, recv_work() will trigger UAF:
recv_work
nbd_config_put(nbd)
-> nbd_config is freed
atomic_dec(&config->recv_threads)
-> UAF
Fix the problem by clearing NBD_RT_BOUND in nbd_genl_disconnect(), so
that nbd_genl_reconfigure() will fail.
Fixes: b7aa3d39385d ("nbd: add a reconfigure netlink command")
Reported-by: syzbot+6b0df248918b92c33e6a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/675bfb65.050a0220.1a2d0d.0006.GAE@google.com/
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250103092859.3574648-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 7678abee0867e6b7fb89aa40f6e9f575f755fb37 ]
Commit 4ce6e2db00de ("virtio-blk: Ensure no requests in virtqueues before
deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's
PM callbacks. And the motivation is to drain inflight IOs before suspending.
block layer's queue freeze looks very handy, but it is also easy to cause
deadlock, such as, any attempt to call into bio_queue_enter() may run into
deadlock if the queue is frozen in current context. There are all kinds
of ->suspend() called in suspend context, so keeping queue frozen in the
whole suspend context isn't one good idea. And Marek reported lockdep
warning[1] caused by virtio-blk's freeze queue in virtblk_freeze().
[1] https://lore.kernel.org/linux-block/ca16370e-d646-4eee-b9cc-87277c89c43c@samsung.com/
Given the motivation is to drain in-flight IOs, it can be done by calling
freeze & unfreeze, meantime restore to previous behavior by keeping queue
quiesced during suspend.
Cc: Yi Sun <yi.sun@unisoc.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtualization@lists.linux.dev
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20241112125821.1475793-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit be48c412f6ebf38849213c19547bc6d5b692b5e5 upstream.
Patch series "zram: fix backing device setup issue", v2.
This series fixes two bugs of backing device setting:
- ZRAM should reject using a zero sized (or the uninitialized ZRAM
device itself) as the backing device.
- Fix backing device leaking when removing a uninitialized ZRAM
device.
This patch (of 2):
Setting a zero sized block device as backing device is pointless, and one
can easily create a recursive loop by setting the uninitialized ZRAM
device itself as its own backing device by (zram0 is uninitialized):
echo /dev/zram0 > /sys/block/zram0/backing_dev
It's definitely a wrong config, and the module will pin itself, kernel
should refuse doing so in the first place.
By refusing to use zero sized device we avoided misuse cases including
this one above.
Link: https://lkml.kernel.org/r/20241209165717.94215-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20241209165717.94215-2-ryncsn@gmail.com
Fixes: 013bf95a83ec ("zram: add interface to specif backing device")
Signed-off-by: Kairui Song <kasong@tencent.com>
Reported-by: Desheng Wu <deshengwu@tencent.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6d6e54fc71ad1ab0a87047fd9c211e75d86084a3 upstream.
For fixing CVE-2023-6270, f98364e92662 ("aoe: fix the potential
use-after-free problem in aoecmd_cfg_pkts") makes tx() calling dev_put()
instead of doing in aoecmd_cfg_pkts(). It avoids that the tx() runs
into use-after-free.
Then Nicolai Stange found more places in aoe have potential use-after-free
problem with tx(). e.g. revalidate(), aoecmd_ata_rw(), resend(), probe()
and aoecmd_cfg_rsp(). Those functions also use aoenet_xmit() to push
packet to tx queue. So they should also use dev_hold() to increase the
refcnt of skb->dev.
On the other hand, moving dev_put() to tx() causes that the refcnt of
skb->dev be reduced to a negative value, because corresponding
dev_hold() are not called in revalidate(), aoecmd_ata_rw(), resend(),
probe(), and aoecmd_cfg_rsp(). This patch fixed this issue.
Cc: stable@vger.kernel.org
Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270
Fixes: f98364e92662 ("aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts")
Reported-by: Nicolai Stange <nstange@suse.com>
Signed-off-by: Chun-Yi Lee <jlee@suse.com>
Link: https://lore.kernel.org/stable/20240624064418.27043-1-jlee%40suse.com
Link: https://lore.kernel.org/r/20241002035458.24401-1-jlee@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a5e61b50c9f44c5edb6e134ede6fee8806ffafa9 upstream.
If the net_conf pointer is NULL and the code attempts to access its
fields without a check, it will lead to a null pointer dereference.
Add a NULL check before dereferencing the pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 44ed167da748 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf")
Cc: stable@vger.kernel.org
Signed-off-by: Mikhail Lobanov <m.lobanov@rosalinux.ru>
Link: https://lore.kernel.org/r/20240909133740.84297-1-m.lobanov@rosalinux.ru
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2f02b5af3a4482b216e6a466edecf6ba8450fa45 upstream.
The violation of atomicity occurs when the drbd_uuid_set_bm function is
executed simultaneously with modifying the value of
device->ldev->md.uuid[UI_BITMAP]. Consider a scenario where, while
device->ldev->md.uuid[UI_BITMAP] passes the validity check when its
value is not zero, the value of device->ldev->md.uuid[UI_BITMAP] is
written to zero. In this case, the check in drbd_uuid_set_bm might refer
to the old value of device->ldev->md.uuid[UI_BITMAP] (before locking),
which allows an invalid value to pass the validity check, resulting in
inconsistency.
To address this issue, it is recommended to include the data validity
check within the locked section of the function. This modification
ensures that the value of device->ldev->md.uuid[UI_BITMAP] does not
change during the validation process, thereby maintaining its integrity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs to extract
function pairs that can be concurrently executed, and then analyzes the
instructions in the paired functions to identify possible concurrency
bugs including data races and atomicity violations.
Fixes: 9f2247bb9b75 ("drbd: Protect accesses to the uuid set with a spinlock")
Cc: stable@vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com>
Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com>
Link: https://lore.kernel.org/r/20240913083504.10549-1-chenqiuji666@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c upstream.
Every time a watch is reestablished after getting lost, we need to
update the cookie which involves quiescing exclusive lock. For this,
we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
roughly for the duration of rbd_reacquire_lock() call. If the mapping
is exclusive and I/O happens to arrive in this time window, it's failed
with EROFS (later translated to EIO) based on the wrong assumption in
rbd_img_exclusive_lock() -- "lock got released?" check there stopped
making sense with commit a2b1da09793d ("rbd: lock should be quiesced on
reacquire").
To make it worse, any such I/O is added to the acquiring list before
EROFS is returned and this sets up for violating rbd_lock_del_request()
precondition that the request is either on the running list or not on
any list at all -- see commit ded080c86b3f ("rbd: don't move requests
to the running list on errors"). rbd_lock_del_request() ends up
processing these requests as if they were on the running list which
screws up quiescing_wait completion counter and ultimately leads to
rbd_assert(!completion_done(&rbd_dev->quiescing_wait));
being triggered on the next watch error.
Cc: stable@vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
Cc: stable@vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f5c466a0fdb2d9f3650d2e3911b0735f17ba00cf upstream.
... to RBD_LOCK_STATE_QUIESCING and quiescing_wait to recognize that
this state and the associated completion are backing rbd_quiesce_lock(),
which isn't specific to releasing the lock.
While exclusive lock does get quiesced before it's released, it also
gets quiesced before an attempt to update the cookie is made and there
the lock is not released as long as ceph_cls_set_cookie() succeeds.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3ceccb14f5576e02b81cc8b105ab81f224bd87f6 upstream.
Expanding on the previous commit, assuming that rbd_is_lock_owner()
always returns true (i.e. that we are either in RBD_LOCK_STATE_LOCKED
or RBD_LOCK_STATE_QUIESCING) if the mapping is exclusive is wrong too.
In case ceph_cls_set_cookie() fails, the lock would be temporarily
released even if the mapping is exclusive, meaning that we can end up
even in RBD_LOCK_STATE_UNLOCKED.
IOW, exclusive mappings are really "just" about disabling automatic
lock transitions (as documented in the man page), not about grabbing
the lock and holding on to it whatever it takes.
Cc: stable@vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9e6727f824edcdb8fdd3e6e8a0862eb49546e1cd ]
No functional changes intended.
Fixes: f2298c0403b0 ("null_blk: multi queue aware block test driver")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240506075538.6064-1-yanjun.zhu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 07d1b99825f40f9c0d93e6b99d79a08d0717bac1 ]
When a mutex lock is not used any more, the function mutex_destroy
should be called to mark the mutex lock uninitialized.
Fixes: f2298c0403b0 ("null_blk: multi queue aware block test driver")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://lore.kernel.org/r/20240425171635.4227-1-yanjun.zhu@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9f6ad5d533d1c71e51bdd06a5712c4fbc8768dfa ]
In loop_set_status_from_info(), lo->lo_offset and lo->lo_sizelimit should
be checked before reassignment, because if an overflow error occurs, the
original correct value will be changed to the wrong value, and it will not
be changed back.
More, the original patch did not solve the problem, the value was set and
ioctl returned an error, but the subsequent io used the value in the loop
driver, which still caused an alarm:
loop_handle_cmd
do_req_filebacked
loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
lo_rw_aio
cmd->iocb.ki_pos = pos
Fixes: c490a0b5a4f3 ("loop: Check for overflow while configuring loop")
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230221095027.3656193-1-zhongjinghua@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 ]
The userspace can configure a loop using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.
loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typdef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).
The function directly copies offset from info to the device as
follows (See line 980 of loop.c):
lo->lo_offset = info->lo_offset;
This results in an overflow, which triggers a warning in iomap_iter()
due to a call to iomap_iter_done() which has:
WARN_ON_ONCE(iter->iomap.offset > iter->pos);
Thus, check for negative value during loop_set_status_from_info().
Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e
Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0c3796c244598122a5d59d56f30d19390096817f ]
Factor out this code into a separate function, so it can be reused by
other code more easily.
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit b0bd158dd630bd47640e0e418c062cda1e0da5ad ]
figure_loop_size() calculates the loop size based on the passed in
parameters, but at the same time it updates the offset and sizelimit
parameters in the loop device configuration. That is a somewhat
unexpected side effect of a function with this name, and it is only only
needed by one of the two callers of this function - loop_set_status().
Move the lo_offset and lo_sizelimit assignment back into loop_set_status(),
and use the newly factored out functions to validate and apply the newly
calculated size. This allows us to get rid of figure_loop_size() in a
follow-up commit.
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5795b6f5607f7e4db62ddea144727780cb351a9b ]
This code is used repeatedly.
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 083a6a50783ef54256eec3499e6575237e0e3d53 ]
sector_t is now always u64, so we don't need to check for truncation.
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7c5014b0987a30e4989c90633c198aced454c0ec ]
loop_set_status() calls loop_config_discard() to configure discard for
the loop device; however, the discard configuration depends on whether
the loop device uses encryption, and when we call it the encryption
configuration has not been updated yet. Move the call down so we apply
the correct discard configuration based on the new configuration.
Signed-off-by: Martijn Coenen <maco@android.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
This reverts commit 13b2856037a651ba3ab4a8b25ecab3e791926da3.
This patch lost a unlock loop_ctl_mutex in loop_get_status(...),
which caused syzbot to report a UAF issue.The upstream patch does not
have this issue. Therefore, we revert this patch and directly apply
the upstream patch later on.
Risk use-after-free as reported by syzbot:
[ 84.669496] ==================================================================
[ 84.670021] BUG: KASAN: use-after-free in __mutex_lock.isra.9+0xc13/0xcb0
[ 84.670433] Read of size 4 at addr ffff88808dba43b8 by task syz-executor.22/14230
[ 84.670885]
[ 84.670987] CPU: 1 PID: 14230 Comm: syz-executor.22 Not tainted 5.4.270 #4
[ 84.671397] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1kylin1 04/01/2014
[ 84.671927] Call Trace:
[ 84.672085] dump_stack+0x94/0xc7
[ 84.672293] ? __mutex_lock.isra.9+0xc13/0xcb0
[ 84.672569] print_address_description.constprop.6+0x16/0x220
[ 84.672915] ? __mutex_lock.isra.9+0xc13/0xcb0
[ 84.673187] ? __mutex_lock.isra.9+0xc13/0xcb0
[ 84.673462] __kasan_report.cold.9+0x1a/0x32
[ 84.673723] ? __mutex_lock.isra.9+0xc13/0xcb0
[ 84.673993] kasan_report+0x10/0x20
[ 84.674208] __mutex_lock.isra.9+0xc13/0xcb0
[ 84.674468] ? __mutex_lock.isra.9+0x617/0xcb0
[ 84.674739] ? ww_mutex_lock_interruptible+0x100/0x100
[ 84.675055] ? ww_mutex_lock_interruptible+0x100/0x100
[ 84.675369] ? kobject_get_unless_zero+0x144/0x190
[ 84.675668] ? kobject_del+0x60/0x60
[ 84.675893] ? __module_get+0x120/0x120
[ 84.676128] ? __mutex_lock_slowpath+0x10/0x10
[ 84.676399] mutex_lock_killable+0xde/0xf0
[ 84.676652] ? __mutex_lock_killable_slowpath+0x10/0x10
[ 84.676967] ? __mutex_lock_slowpath+0x10/0x10
[ 84.677243] ? disk_block_events+0x1d/0x120
[ 84.677509] lo_open+0x16/0xc0
[ 84.677701] ? lo_compat_ioctl+0x160/0x160
[ 84.677954] __blkdev_get+0xb0f/0x1160
[ 84.678185] ? bd_may_claim+0xd0/0xd0
[ 84.678410] ? bdev_disk_changed+0x190/0x190
[ 84.678674] ? _raw_spin_lock+0x7c/0xd0
[ 84.678915] ? _raw_write_lock_bh+0xd0/0xd0
[ 84.679172] blkdev_get+0x9b/0x290
[ 84.679381] ? ihold+0x1a/0x40
[ 84.679574] blkdev_open+0x1bd/0x240
[ 84.679794] do_dentry_open+0x439/0x1000
[ 84.680035] ? blkdev_get_by_dev+0x60/0x60
[ 84.680286] ? __x64_sys_fchdir+0x1a0/0x1a0
[ 84.680557] ? inode_permission+0x86/0x320
[ 84.680814] path_openat+0x998/0x4120
[ 84.681044] ? stack_trace_consume_entry+0x160/0x160
[ 84.681348] ? do_futex+0x136/0x1880
[ 84.681568] ? path_mountpoint+0xb50/0xb50
[ 84.681823] ? save_stack+0x4d/0x80
[ 84.682038] ? save_stack+0x19/0x80
[ 84.682253] ? __kasan_kmalloc.constprop.6+0xc1/0xd0
[ 84.682553] ? kmem_cache_alloc+0xc7/0x210
[ 84.682804] ? getname_flags+0xc4/0x560
[ 84.683045] ? do_sys_open+0x1ce/0x450
[ 84.683272] ? do_syscall_64+0x9a/0x330
[ 84.683509] ? entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[ 84.683826] ? _raw_spin_lock+0x7c/0xd0
[ 84.684063] ? _raw_write_lock_bh+0xd0/0xd0
[ 84.684319] ? futex_exit_release+0x60/0x60
[ 84.684574] ? kasan_unpoison_shadow+0x30/0x40
[ 84.684844] ? __kasan_kmalloc.constprop.6+0xc1/0xd0
[ 84.685149] ? get_partial_node.isra.83.part.84+0x1e5/0x340
[ 84.685485] ? __fget_light+0x1d1/0x550
[ 84.685721] do_filp_open+0x197/0x270
[ 84.685946] ? may_open_dev+0xd0/0xd0
[ 84.686172] ? kasan_unpoison_shadow+0x30/0x40
[ 84.686443] ? __kasan_kmalloc.constprop.6+0xc1/0xd0
[ 84.686743] ? __alloc_fd+0x1a3/0x580
[ 84.686973] do_sys_open+0x2c7/0x450
[ 84.687195] ? filp_open+0x60/0x60
[ 84.687406] ? __x64_sys_timer_settime32+0x280/0x280
[ 84.687707] do_syscall_64+0x9a/0x330
[ 84.687931] ? syscall_return_slowpath+0x17a/0x230
[ 84.688221] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[ 84.688524]
[ 84.688622] Allocated by task 14056:
[ 84.688842] save_stack+0x19/0x80
[ 84.689044] __kasan_kmalloc.constprop.6+0xc1/0xd0
[ 84.689333] kmem_cache_alloc_node+0xe2/0x230
[ 84.689600] copy_process+0x165c/0x72d0
[ 84.689833] _do_fork+0xf9/0x9a0
[ 84.690032] __x64_sys_clone+0x17a/0x200
[ 84.690271] do_syscall_64+0x9a/0x330
[ 84.690496] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[ 84.690800]
[ 84.690903] Freed by task 0:
[ 84.691081] save_stack+0x19/0x80
[ 84.691287] __kasan_slab_free+0x125/0x170
[ 84.691535] kmem_cache_free+0x7a/0x2a0
[ 84.691774] __put_task_struct+0x1ec/0x4a0
[ 84.692023] delayed_put_task_struct+0x178/0x1d0
[ 84.692303] rcu_core+0x538/0x16c0
[ 84.692512] __do_softirq+0x175/0x63d
[ 84.692741]
[ 84.692840] The buggy address belongs to the object at ffff88808dba4380
[ 84.692840] which belongs to the cache task_struct of size 3328
[ 84.693584] The buggy address is located 56 bytes inside of
[ 84.693584] 3328-byte region [ffff88808dba4380, ffff88808dba5080)
[ 84.694272] The buggy address belongs to the page:
[ 84.694563] page:ffffea000236e800 refcount:1 mapcount:0 mapping:ffff8881838acdc0 index:0x0 compound_mapcount: 0
[ 84.695166] flags: 0x100000000010200(slab|head)
[ 84.695457] raw: 0100000000010200 dead000000000100 dead000000000122 ffff8881838acdc0
[ 84.695919] raw: 0000000000000000 0000000000090009 00000001ffffffff 0000000000000000
[ 84.696375] page dumped because: kasan: bad access detected
[ 84.696705]
[ 84.696801] Memory state around the buggy address:
[ 84.697089] ffff88808dba4280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 84.697519] ffff88808dba4300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 84.697945] >ffff88808dba4380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 84.698371] ^
[ 84.698674] ffff88808dba4400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 84.699111] ffff88808dba4480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 84.699537] ==================================================================
[ 84.699965] Disabling lock debugging due to kernel taint
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f98364e926626c678fb4b9004b75cacf92ff0662 ]
This patch is against CVE-2023-6270. The description of cve is:
A flaw was found in the ATA over Ethernet (AoE) driver in the Linux
kernel. The aoecmd_cfg_pkts() function improperly updates the refcnt on
`struct net_device`, and a use-after-free can be triggered by racing
between the free on the struct and the access through the `skbtxq`
global queue. This could lead to a denial of service condition or
potential code execution.
In aoecmd_cfg_pkts(), it always calls dev_put(ifp) when skb initial
code is finished. But the net_device ifp will still be used in
later tx()->dev_queue_xmit() in kthread. Which means that the
dev_put(ifp) should NOT be called in the success path of skb
initial code in aoecmd_cfg_pkts(). Otherwise tx() may run into
use-after-free because the net_device is freed.
This patch removed the dev_put(ifp) in the success path in
aoecmd_cfg_pkts(), and added dev_put() after skb xmit in tx().
Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270
Fixes: 7562f876cd93 ("[NET]: Rework dev_base via list_head (v3)")
Signed-off-by: Chun-Yi Lee <jlee@suse.com>
Link: https://lore.kernel.org/r/20240305082048.25526-1-jlee@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 31edf4bbe0ba27fd03ac7d87eb2ee3d2a231af6d ]
nla_nest_start() may fail and return NULL. Insert a check and set errno
based on other call sites within the same source code.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Fixes: 47d902b90a32 ("nbd: add a status netlink command")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240218042534.it.206-kees@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4ce6e2db00de8103a0687fb0f65fd17124a51aaa ]
Ensure no remaining requests in virtqueues before resetting vdev and
deleting virtqueues. Otherwise these requests will never be completed.
It may cause the system to become unresponsive.
Function blk_mq_quiesce_queue() can ensure that requests have become
in_flight status, but it cannot guarantee that requests have been
processed by the device. Virtqueues should never be deleted before
all requests become complete status.
Function blk_mq_freeze_queue() ensure that all requests in virtqueues
become complete status. And no requests can enter in virtqueues.
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20240129085250.1550594-1-yi.sun@unisoc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit ded080c86b3f99683774af0441a58fc2e3d60cae upstream.
The running list is supposed to contain requests that are pinning the
exclusive lock, i.e. those that must be flushed before exclusive lock
is released. When wake_lock_waiters() is called to handle an error,
requests on the acquiring list are failed with that error and no
flushing takes place. Briefly moving them to the running list is not
only pointless but also harmful: if exclusive lock gets acquired
before all of their state machines are scheduled and go through
rbd_lock_del_request(), we trigger
rbd_assert(list_empty(&rbd_dev->running_list));
in rbd_try_acquire_lock().
Cc: stable@vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d13bc4d84a8e91060d3797fc95c1a0202bfd1499 upstream.
This driver is for fairly obscure hardware, and has only seen random
drive-by changes after the maintainer stopped working on it in 2005
(about a year and a half after it was introduced). It has some
"interesting" block layer interactions, so let's just drop it unless
anyone complains.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220721064102.1715460-1-hch@lst.de
[axboe: fix date typo, it was in 2005, not 2015]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0b207d02bd9ab8dcc31b262ca9f60dbc1822500d upstream.
rbd_dev_refresh() has been holding header_rwsem across header and
parent info read-in unnecessarily for ages. With commit 870611e4877e
("rbd: get snapshot context after exclusive lock is ensured to be
held"), the potential for deadlocks became much more real owning to
a) header_rwsem now nesting inside lock_rwsem and b) rw_semaphores
not allowing new readers after a writer is registered.
For example, assuming that I/O request 1, I/O request 2 and header
read-in request all target the same OSD:
1. I/O request 1 comes in and gets submitted
2. watch error occurs
3. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and
releases lock_rwsem
4. after reestablishing the watch, rbd_reregister_watch() calls
rbd_dev_refresh() which takes header_rwsem for write and submits
a header read-in request
5. I/O request 2 comes in: after taking lock_rwsem for read in
__rbd_img_handle_request(), it blocks trying to take header_rwsem
for read in rbd_img_object_requests()
6. another watch error occurs
7. rbd_watch_errcb() blocks trying to take lock_rwsem for write
8. I/O request 1 completion is received by the messenger but can't be
processed because lock_rwsem won't be granted anymore
9. header read-in request completion can't be received, let alone
processed, because the messenger is stranded
Change rbd_dev_refresh() to take header_rwsem only for actually
updating rbd_dev->header. Header and parent info read-in don't need
any locking.
Cc: stable@vger.kernel.org # 0b035401c570: rbd: move rbd_dev_refresh() definition
Cc: stable@vger.kernel.org # 510a7330c82a: rbd: decouple header read-in from updating rbd_dev->header
Cc: stable@vger.kernel.org # c10311776f0a: rbd: decouple parent info read-in from updating rbd_dev
Cc: stable@vger.kernel.org
Fixes: 870611e4877e ("rbd: get snapshot context after exclusive lock is ensured to be held")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
[idryomov@gmail.com: backport to 5.4: open-code rbd_is_snap(), preserve
rbd_exists_validate() call]
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit c10311776f0a8ddea2276df96e255625b07045a8 upstream.
Unlike header read-in, parent info read-in is already decoupled in
get_parent_info(), but it's buried in rbd_dev_v2_parent_info() along
with the processing logic.
Separate the initial read-in and update read-in logic into
rbd_dev_setup_parent() and rbd_dev_update_parent() respectively and
have rbd_dev_v2_parent_info() just populate struct parent_image_info
(i.e. what get_parent_info() did). Some existing QoI issues, like
flatten of a standalone clone being disregarded on refresh, remain.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
[idryomov@gmail.com: backport to 5.4: context]
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 510a7330c82a7754d5df0117a8589e8a539067c7 upstream.
Make rbd_dev_header_info() populate a passed struct rbd_image_header
instead of rbd_dev->header and introduce rbd_dev_update_header() for
updating mutable fields in rbd_dev->header upon refresh. The initial
read-in of both mutable and immutable fields in rbd_dev_image_probe()
passes in rbd_dev->header so no update step is required there.
rbd_init_layout() is now called directly from rbd_dev_image_probe()
instead of individually in format 1 and format 2 implementations.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
[idryomov@gmail.com: backport to 5.4: _rbd_dev_v2_snap_features()
doesn't have read_only param]
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 0b035401c57021fc6c300272cbb1c5a889d4fe45 upstream.
Move rbd_dev_refresh() definition further down to avoid having to
move struct parent_image_info definition in the next commit. This
spares some forward declarations too.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
[idryomov@gmail.com: backport to 5.4: drop rbd_is_snap() assert,
preserve rbd_exists_validate() call]
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 2112f5c1330a671fa852051d85cb9eadc05d7eb7 upstream.
We noticed that the user interface of Android devices becomes very slow
under memory pressure. This is because Android uses the zram driver on top
of the loop driver for swapping, because under memory pressure the swap
code alternates reads and writes quickly, because mq-deadline is the
default scheduler for loop devices and because mq-deadline delays writes by
five seconds for such a workload with default settings. Fix this by making
the kernel select I/O scheduler 'none' from inside add_disk() for loop
devices. This default can be overridden at any time from user space,
e.g. via a udev rule. This approach has an advantage compared to changing
the I/O scheduler from userspace from 'mq-deadline' into 'none', namely
that synchronize_rcu() does not get called.
This patch changes the default I/O scheduler for loop devices from
'mq-deadline' into 'none'.
Additionally, this patch reduces the Android boot time on my test setup
with 0.5 seconds compared to configuring the loop I/O scheduler from user
space.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Martijn Coenen <maco@android.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210805174200.3250718-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f12bc113ce904777fd6ca003b473b427782b3dde ]
If the index allocated by idr_alloc greater than MINORMASK >> part_shift,
the device number will overflow, resulting in failure to create a block
device.
Fix it by imiting the size of the max allocation.
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230605122159.2134384-1-zhongjinghua@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b6ebaa8100090092aa602530d7e8316816d0c98d ]
The existing code silently converts read operations with the
REQ_FUA bit set into write-barrier operations. This results in data
loss as the backend scribbles zeroes over the data instead of returning
it.
While the REQ_FUA bit doesn't make sense on a read operation, at least
one well-known out-of-tree kernel module does set it and since it
results in data loss, let's be safe here and only look at REQ_FUA for
writes.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230426164005.2213139-1-ross.lagerwall@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 870611e4877eff1e8413c3fb92a585e45d5291f6 upstream.
Move capturing the snapshot context into the image request state
machine, after exclusive lock is ensured to be held for the duration of
dealing with the image request. This is needed to ensure correctness
of fast-diff states (OBJECT_EXISTS vs OBJECT_EXISTS_CLEAN) and object
deltas computed based off of them. Otherwise the object map that is
forked for the snapshot isn't guaranteed to accurately reflect the
contents of the snapshot when the snapshot is taken under I/O. This
breaks differential backup and snapshot-based mirroring use cases with
fast-diff enabled: since some object deltas may be incomplete, the
destination image may get corrupted.
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/61472
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
[idryomov@gmail.com: backport to 5.4: no rbd_img_capture_header(),
img_request not embedded in blk-mq pdu]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 09fe05c57b5aaf23e2c35036c98ea9f282b19a77 upstream.
Move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting into the object request
state machine to allow for the snapshot context to be captured in the
image request state machine rather than in rbd_queue_workfn().
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream.
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.
In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:
git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'
drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.
No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.
[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/
Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4913cfcf014c95f0437db2df1734472fd3e15098 ]
The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Link: https://lore.kernel.org/r/20230512130533.98709-1-ivan.orlov0322@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 3899d94e3831ee07ea6821c032dc297aec80586a upstream.
When we receive a flush command (or "barrier" in DRBD), we currently use
a REQ_OP_FLUSH with the REQ_PREFLUSH flag set.
The correct way to submit a flush bio is by using a REQ_OP_WRITE without
any data, and set the REQ_PREFLUSH flag.
Since commit b4a6bb3a67aa ("block: add a sanity check for non-write
flush/fua bios"), this triggers a warning in the block layer, but this
has been broken for quite some time before that.
So use the correct set of flags to actually make the flush happen.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: stable@vger.kernel.org
Fixes: f9ff0da56437 ("drbd: allow parallel flushes for multi-volume resources")
Reported-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230503121937.17232-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6030363199e3a6341afb467ddddbed56640cbf6a ]
In vdc_port_probe(), we should check the return value of mdesc_grab() as
it may return NULL, which can cause potential NPD bug.
Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.")
Signed-off-by: Liang He <windhl@126.com>
Link: https://lore.kernel.org/r/20230315062032.1741692-1-windhl@126.com
[axboe: style cleanup]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit f7c4d9b133c7a04ca619355574e96b6abf209fba upstream.
If getting an ID or setting up a work queue in rbd_dev_create() fails,
use-after-free on rbd_dev->rbd_client, rbd_dev->spec and rbd_dev->opts
is triggered in do_rbd_add(). The root cause is that the ownership of
these structures is transfered to rbd_dev prematurely and they all end
up getting freed when rbd_dev_create() calls rbd_dev_free() prior to
returning to do_rbd_add().
Found by Linux Verification Center (linuxtesting.org) with SVACE, an
incomplete patch submitted by Natalia Petrova <n.petrova@fintech.ru>.
Cc: stable@vger.kernel.org
Fixes: 1643dfa4c2c8 ("rbd: introduce a per-device ordered workqueue")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a7a1598189228b5007369a9622ccdf587be0730f ]
The drbd_destroy_connection() frees the "connection" so use the _safe()
iterator to prevent a use after free.
Fixes: b6f85ef9538b ("drbd: Iterate over all connections")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/Y3Jd5iZRbNQ9w6gm@kili
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 1de7c3cf48fc41cd95adb12bd1ea9033a917798a ]
syzbot reported hung task [1]. The following program is a simplified
version of the reproducer:
int main(void)
{
int sv[2], fd;
if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
return 1;
if ((fd = open("/dev/nbd0", 0)) < 0)
return 1;
if (ioctl(fd, NBD_SET_SIZE_BLOCKS, 0x81) < 0)
return 1;
if (ioctl(fd, NBD_SET_SOCK, sv[0]) < 0)
return 1;
if (ioctl(fd, NBD_DO_IT) < 0)
return 1;
return 0;
}
When signal interrupt nbd_start_device_ioctl() waiting the condition
atomic_read(&config->recv_threads) == 0, the task can hung because it
waits the completion of the inflight IOs.
This patch fixes the issue by clearing queue, not just shutdown, when
signal interrupt nbd_start_device_ioctl().
Link: https://syzkaller.appspot.com/bug?id=7d89a3ffacd2b83fdd39549bc4d8e0a89ef21239 [1]
Reported-by: syzbot+38e6c55d4969a14c1534@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220907163502.577561-1-syoshida@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream.
The userspace can configure a loop using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.
loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typdef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).
The function directly copies offset from info to the device as
follows (See line 980 of loop.c):
lo->lo_offset = info->lo_offset;
This results in an overflow, which triggers a warning in iomap_iter()
due to a call to iomap_iter_done() which has:
WARN_ON_ONCE(iter->iomap.offset > iter->pos);
Thus, check for negative value during loop_set_status_from_info().
Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e
Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ee452a8d984f94fa8e894f003a52e776e4572881 ]
There needs to be some error checking if ida_simple_get() fails.
Also call ida_free() if there are errors later.
Fixes: 94bc02e30fb8 ("nullb: use ida to manage index")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YtEhXsr6vJeoiYhd@kili
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 2400617da7eebf9167d71a46122828bc479d64c9 upstream.
Split the current bounce buffering logic used with persistent grants
into it's own option, and allow enabling it independently of
persistent grants. This allows to reuse the same code paths to
perform the bounce buffering required to avoid leaking contiguous data
in shared pages not part of the request fragments.
Reporting whether the backend is to be trusted can be done using a
module parameter, or from the xenstore frontend path as set by the
toolstack when adding the device.
This is CVE-2022-33742, part of XSA-403.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2f446ffe9d737e9a844b97887919c4fda18246e7 upstream.
When allocating pages to be used for shared communication with the
backend always zero them, this avoids leaking unintended data present
on the pages.
This is CVE-2022-26365, part of XSA-403.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 09dadb5985023e27d4740ebd17e6fea4640110e5 ]
In our tests, "qemu-nbd" triggers a io hung:
INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:qemu-nbd state:D stack: 0 pid:11445 ppid: 1 flags:0x00000000
Call Trace:
<TASK>
__schedule+0x480/0x1050
? _raw_spin_lock_irqsave+0x3e/0xb0
schedule+0x9c/0x1b0
blk_mq_freeze_queue_wait+0x9d/0xf0
? ipi_rseq+0x70/0x70
blk_mq_freeze_queue+0x2b/0x40
nbd_add_socket+0x6b/0x270 [nbd]
nbd_ioctl+0x383/0x510 [nbd]
blkdev_ioctl+0x18e/0x3e0
__x64_sys_ioctl+0xac/0x120
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd8ff706577
RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0
"qemu-ndb -d" will call ioctl 'NBD_DISCONNECT' first, however, following
message was found:
block nbd0: Send disconnect failed -32
Which indicate that something is wrong with the server. Then,
"qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK', however ioctl can't clear
requests after commit 2516ab1543fd("nbd: only clear the queue on device
teardown"). And in the meantime, request can't complete through timeout
because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which
means such request will never be completed in this situation.
Now that the flag 'NBD_CMD_INFLIGHT' can make sure requests won't
complete multiple times, switch back to call nbd_clear_sock() in
nbd_clear_sock_ioctl(), so that inflight requests can be cleared.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-5-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c55b2b983b0fa012942c3eb16384b2b722caa810 ]
When nbd module is being removing, nbd_alloc_config() may be
called concurrently by nbd_genl_connect(), although try_module_get()
will return false, but nbd_alloc_config() doesn't handle it.
The race may lead to the leak of nbd_config and its related
resources (e.g, recv_workq) and oops in nbd_read_stat() due
to the unload of nbd module as shown below:
BUG: kernel NULL pointer dereference, address: 0000000000000040
Oops: 0000 [#1] SMP PTI
CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: knbd16-recv recv_work [nbd]
RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
Call Trace:
recv_work+0x3b/0xb0 [nbd]
process_one_work+0x1ed/0x390
worker_thread+0x4a/0x3d0
kthread+0x12a/0x150
ret_from_fork+0x22/0x30
Fixing it by checking the return value of try_module_get()
in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
assign nbd->config only when nbd_alloc_config() succeeds to ensure
the value of nbd->config is binary (valid or NULL).
Also adding a debug message to check the reference counter
of nbd_config during module removal.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-3-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 06c4da89c24e7023ea448cadf8e9daf06a0aae6e ]
Otherwise there may be race between module removal and the handling of
netlink command, which can lead to the oops as shown below:
BUG: kernel NULL pointer dereference, address: 0000000000000098
Oops: 0002 [#1] SMP PTI
CPU: 1 PID: 31299 Comm: nbd-client Tainted: G E 5.14.0-rc4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:down_write+0x1a/0x50
Call Trace:
start_creating+0x89/0x130
debugfs_create_dir+0x1b/0x130
nbd_start_device+0x13d/0x390 [nbd]
nbd_genl_connect+0x42f/0x748 [nbd]
genl_family_rcv_msg_doit.isra.0+0xec/0x150
genl_rcv_msg+0xe5/0x1e0
netlink_rcv_skb+0x55/0x100
genl_rcv+0x29/0x40
netlink_unicast+0x1a8/0x250
netlink_sendmsg+0x21b/0x430
____sys_sendmsg+0x2a4/0x2d0
___sys_sendmsg+0x81/0xc0
__sys_sendmsg+0x62/0xb0
__x64_sys_sendmsg+0x1f/0x30
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Modules linked in: nbd(E-)
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-2-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 62952cc5bccd89b76d710de1d0b43244af0f2903 ]
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
On the other hand the discard_sector_alignment from the virtio 1.1 looks
similar to what Linux uses as discard granularity (even if not very well
described):
"discard_sector_alignment can be used by OS when splitting a request
based on alignment. "
And at least qemu does set it to the discard granularity.
So stop setting the discard_alignment and use the virtio
discard_sector_alignment to set the discard granularity.
Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 33cb0917bbe241dd17a2b87ead63514c1b7e5615 ]
There are two initializers for P_RETRY_WRITE:
drivers/block/drbd/drbd_main.c:3676:22: warning: initialized field overwritten [-Woverride-init]
Remove the first one since it was already ignored by the compiler
and reorder the list to match the enum definition. As P_ZEROES had
no entry, add that one instead.
Fixes: 036b17eaab93 ("drbd: Receiving part for the PROTOCOL_UPDATE packet")
Fixes: f31e583aa2c2 ("drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-2-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 491bf8f236fdeec698fa6744993f1ecf3fafd1a5 ]
When userspace closes the socket before sending a disconnect
request, the following I/O requests will be blocked in
wait_for_reconnect() until dead timeout. This will cause the
following disconnect request also hung on blk_mq_quiesce_queue().
That means we have no way to disconnect a nbd device if there
are some I/O requests waiting for reconnecting until dead timeout.
It's not expected. So let's wake up the thread waiting for
reconnecting directly when a disconnect request is sent.
Reported-by: Xu Jianhai <zero.xu@bytedance.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220322080639.142-1-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|