summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2022-01-27floppy: Add max size check for user space requestXiongwei Song1-1/+3
[ Upstream commit 545a32498c536ee152331cd2e7d2416aa0f20e01 ] We need to check the max request size that is from user space before allocating pages. If the request size exceeds the limit, return -EINVAL. This check can avoid the warning below from page allocator. WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 current_gfp_context include/linux/sched/mm.h:195 [inline] WARNING: CPU: 3 PID: 16525 at mm/page_alloc.c:5344 __alloc_pages+0x45d/0x500 mm/page_alloc.c:5356 Modules linked in: CPU: 3 PID: 16525 Comm: syz-executor.3 Not tainted 5.15.0-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 RIP: 0010:__alloc_pages+0x45d/0x500 mm/page_alloc.c:5344 Code: be c9 00 00 00 48 c7 c7 20 4a 97 89 c6 05 62 32 a7 0b 01 e8 74 9a 42 07 e9 6a ff ff ff 0f 0b e9 a0 fd ff ff 40 80 e5 3f eb 88 <0f> 0b e9 18 ff ff ff 4c 89 ef 44 89 e6 45 31 ed e8 1e 76 ff ff e9 RSP: 0018:ffffc90023b87850 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff92004770f0b RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000033 RDI: 0000000000010cc1 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: ffffffff81bb4686 R11: 0000000000000001 R12: ffffffff902c1960 R13: 0000000000000033 R14: 0000000000000000 R15: ffff88804cf64a30 FS: 0000000000000000(0000) GS:ffff88802cd00000(0063) knlGS:00000000f44b4b40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 000000002c921000 CR3: 000000004f507000 CR4: 0000000000150ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191 __get_free_pages+0x8/0x40 mm/page_alloc.c:5418 raw_cmd_copyin drivers/block/floppy.c:3113 [inline] raw_cmd_ioctl drivers/block/floppy.c:3160 [inline] fd_locked_ioctl+0x12e5/0x2820 drivers/block/floppy.c:3528 fd_ioctl drivers/block/floppy.c:3555 [inline] fd_compat_ioctl+0x891/0x1b60 drivers/block/floppy.c:3869 compat_blkdev_ioctl+0x3b8/0x810 block/ioctl.c:662 __do_compat_sys_ioctl+0x1c7/0x290 fs/ioctl.c:972 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178 do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:203 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c Reported-by: syzbot+23a02c7df2cf2bc93fa2@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20211116131033.27685-1-sxwjean@me.com Signed-off-by: Xiongwei Song <sxwjean@gmail.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27floppy: Fix hang in watchdog when disk is ejectedTasos Sahanidis1-1/+1
[ Upstream commit fb48febce7e30baed94dd791e19521abd2c3fd83 ] When the watchdog detects a disk change, it calls cancel_activity(), which in turn tries to cancel the fd_timer delayed work. In the above scenario, fd_timer_fn is set to fd_watchdog(), meaning it is trying to cancel its own work. This results in a hang as cancel_delayed_work_sync() is waiting for the watchdog (itself) to return, which never happens. This can be reproduced relatively consistently by attempting to read a broken floppy, and ejecting it while IO is being attempted and retried. To resolve this, this patch calls cancel_delayed_work() instead, which cancels the work without waiting for the watchdog to return and finish. Before this regression was introduced, the code in this section used del_timer(), and not del_timer_sync() to delete the watchdog timer. Link: https://lore.kernel.org/r/399e486c-6540-db27-76aa-7a271b061f76@tasossah.com Fixes: 070ad7e793dc ("floppy: convert to delayed work and single-thread wq") Signed-off-by: Tasos Sahanidis <tasos@tasossah.com> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-22xen/blkfront: harden blkfront against event channel stormsJuergen Gross1-3/+12
commit 0fd08a34e8e3b67ec9bd8287ac0facf8374b844a upstream. The Xen blkfront driver is still vulnerable for an attack via excessive number of events sent by the backend. Fix that by using lateeoi event channels. This is part of XSA-391 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01xen/blkfront: don't trust the backend response data blindlyJuergen Gross1-17/+53
commit b94e4b147fd1992ad450e1fea1fdaa3738753373 upstream. Today blkfront will trust the backend to send only sane response data. In order to avoid privilege escalations or crashes in case of malicious backends verify the data to be within expected limits. Especially make sure that the response always references an outstanding request. Introduce a new state of the ring BLKIF_STATE_ERROR which will be switched to in case an inconsistency is being detected. Recovering from this state is possible only via removing and adding the virtual device again (e.g. via a suspend/resume cycle). Make all warning messages issued due to valid error responses rate limited in order to avoid message floods being triggered by a malicious backend. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-4-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01xen/blkfront: don't take local copy of a request from the ring pageJuergen Gross1-10/+15
commit 8f5a695d99000fc3aa73934d7ced33cfc64dcdab upstream. In order to avoid a malicious backend being able to influence the local copy of a request build the request locally first and then copy it to the ring page instead of doing it the other way round as today. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-3-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01xen/blkfront: read response from backend only onceJuergen Gross1-17/+18
commit 71b66243f9898d0e54296b4e7035fb33cdcb0707 upstream. In order to avoid problems in case the backend is modifying a response on the ring page while the frontend has already seen it, just read the response into a local buffer in one go and then operate on that buffer only. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210730103854.12681-2-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-17zram: off by one in read_block_state()Dan Carpenter1-1/+1
[ Upstream commit a88e03cf3d190cf46bc4063a9b7efe87590de5f4 ] snprintf() returns the number of bytes it would have printed if there were space. But it does not count the NUL terminator. So that means that if "count == copied" then this has already overflowed by one character. This bug likely isn't super harmful in real life. Link: https://lkml.kernel.org/r/20210916130404.GA25094@kili Fixes: c0265342bff4 ("zram: introduce zram memory tracking") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17block: ataflop: fix breakage introduced at blk-mq refactoringMichael Schmitz1-15/+3
[ Upstream commit 86d46fdaa12ae5befc16b8d73fc85a3ca0399ea6 ] Refactoring of the Atari floppy driver when converting to blk-mq has broken the state machine in not-so-subtle ways: finish_fdc() must be called when operations on the floppy device have completed. This is crucial in order to relase the ST-DMA lock, which protects against concurrent access to the ST-DMA controller by other drivers (some DMA related, most just related to device register access - broken beyond compare, I know). When rewriting the driver's old do_request() function, the fact that finish_fdc() was called only when all queued requests had completed appears to have been overlooked. Instead, the new request function calls finish_fdc() immediately after the last request has been queued. finish_fdc() executes a dummy seek after most requests, and this overwrites the state machine's interrupt hander that was set up to wait for completion of the read/write request just prior. To make matters worse, finish_fdc() is called before device interrupts are re-enabled, making certain that the read/write interupt is missed. Shifting the finish_fdc() call into the read/write request completion handler ensures the driver waits for the request to actually complete. With a queue depth of 2, we won't see long request sequences, so calling finish_fdc() unconditionally just adds a little overhead for the dummy seeks, and keeps the code simple. While we're at it, kill ataflop_commit_rqs() which does nothing but run finish_fdc() unconditionally, again likely wiping out an in-flight request. Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Fixes: 6ec3938cff95 ("ataflop: convert to blk-mq") CC: linux-block@vger.kernel.org CC: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Link: https://lore.kernel.org/r/20211019061321.26425-1-schmitzmic@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-16Revert "block: nbd: add sanity check for first_minor"Greg Kroah-Hartman1-10/+0
This reverts commit b3fa499d72a0db612f12645265a36751955c0037 which is commit b1a811633f7321cf1ae2bb76a66805b7720e44c9 upstream. The backport of this is reported to be causing some problems, so revert this for now until they are worked out. Link: https://lore.kernel.org/r/CACPK8XfUWoOHr-0RwRoYoskia4fbAbZ7DYf5wWBnv6qUnGq18w@mail.gmail.com Reported-by: Joel Stanley <joel@jms.id.au> Cc: Christoph Hellwig <hch@lst.de> Cc: Pavel Skripkin <paskripkin@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-15block: nbd: add sanity check for first_minorPavel Skripkin1-0/+10
[ Upstream commit b1a811633f7321cf1ae2bb76a66805b7720e44c9 ] Syzbot hit WARNING in internal_create_group(). The problem was in too big disk->first_minor. disk->first_minor is initialized by value, which comes from userspace and there wasn't any sanity checks about value correctness. It can cause duplicate creation of sysfs files/links, because disk->first_minor will be passed to MKDEV() which causes truncation to byte. Since maximum minor value is 0xff, let's check if first_minor is correct minor number. NOTE: the root case of the reported warning was in wrong error handling in register_disk(), but we can avoid passing knowingly wrong values to sysfs API, because sysfs error messages can confuse users. For example: user passed 1048576 as index, but sysfs complains about duplicate creation of /dev/block/43:0. It's not obvious how 1048576 becomes 0. Log and reproducer for above example can be found on syzkaller bug report page. Link: https://syzkaller.appspot.com/bug?id=03c2ae9146416edf811958d5fd7acfab75b143d1 Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices") Reported-by: syzbot+9937dc42271cd87d4b98@syzkaller.appspotmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-12cryptoloop: add a deprecation warningChristoph Hellwig2-2/+4
[ Upstream commit 222013f9ac30b9cec44301daa8dbd0aae38abffb ] Support for cryptoloop has been officially marked broken and deprecated in favor of dm-crypt (which supports the same broken algorithms if needed) in Linux 2.6.4 (released in March 2004), and support for it has been entirely removed from losetup in util-linux 2.23 (released in April 2013). Add a warning and a deprecation schedule. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210827163250.255325-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-03Revert "floppy: reintroduce O_NDELAY fix"Denis Efremov1-14/+13
commit c7e9d0020361f4308a70cdfd6d5335e273eb8717 upstream. The patch breaks userspace implementations (e.g. fdutils) and introduces regressions in behaviour. Previously, it was possible to O_NDELAY open a floppy device with no media inserted or with write protected media without an error. Some userspace tools use this particular behavior for probing. It's not the first time when we revert this patch. Previous revert is in commit f2791e7eadf4 (Revert "floppy: refactor open() flags handling"). This reverts commit 8a0c014cd20516ade9654fc13b51345ec58e7be8. Link: https://lore.kernel.org/linux-block/de10cb47-34d1-5a88-7751-225ca380f735@compro.net/ Reported-by: Mark Hounschell <markh@compro.net> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Wim Osterholt <wim@djo.tudelft.nl> Cc: Kurt Garloff <kurt@garloff.de> Cc: <stable@vger.kernel.org> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-08-18nbd: Aovid double completion of a requestXie Yongji1-3/+11
[ Upstream commit cddce01160582a5f52ada3da9626c052d852ec42 ] There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: 96d97e17828f ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <jiangyadong@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-28rbd: always kick acquire on "acquired" and "released" notificationsIlya Dryomov1-13/+7
commit 8798d070d416d18a75770fc19787e96705073f43 upstream. Skipping the "lock has been released" notification if the lock owner is not what we expect based on owner_cid can lead to I/O hangs. One example is our own notifications: because owner_cid is cleared in rbd_unlock(), when we get our own notification it is processed as unexpected/duplicate and maybe_kick_acquire() isn't called. If a peer that requested the lock then doesn't go through with acquiring it, I/O requests that came in while the lock was being quiesced would be stalled until another I/O request is submitted and kicks acquire from rbd_img_exclusive_lock(). This makes the comment in rbd_release_lock() actually true: prior to this change the canceled work was being requeued in response to the "lock has been acquired" notification from rbd_handle_acquired_lock(). Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Robin Geuze <robin.geuze@nl.team.blue> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-28rbd: don't hold lock_rwsem while running_list is being drainedIlya Dryomov1-7/+5
commit ed9eb71085ecb7ded9a5118cec2ab70667cc7350 upstream. Currently rbd_quiesce_lock() holds lock_rwsem for read while blocking on releasing_wait completion. On the I/O completion side, each image request also needs to take lock_rwsem for read. Because rw_semaphore implementation doesn't allow new readers after a writer has indicated interest in the lock, this can result in a deadlock if something that needs to take lock_rwsem for write gets involved. For example: 1. watch error occurs 2. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and releases lock_rwsem 3. after reestablishing the watch, rbd_reregister_watch() takes lock_rwsem for write and calls rbd_reacquire_lock() 4. rbd_quiesce_lock() downgrades lock_rwsem to for read and blocks on releasing_wait until running_list becomes empty 5. another watch error occurs 6. rbd_watch_errcb() blocks trying to take lock_rwsem for write 7. no in-flight image request can complete and delete itself from running_list because lock_rwsem won't be granted anymore A similar scenario can occur with "lock has been acquired" and "lock has been released" notification handers which also take lock_rwsem for write to update owner_cid. We don't actually get anything useful from sitting on lock_rwsem in rbd_quiesce_lock() -- owner_cid updates certainly don't need to be synchronized with. In fact the whole owner_cid tracking logic could probably be removed from the kernel client because we don't support proxied maintenance operations. Cc: stable@vger.kernel.org # 5.3+ URL: https://tracker.ceph.com/issues/42757 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Robin Geuze <robin.geuze@nl.team.blue> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-20virtio-blk: Fix memory leak among suspend/resume procedureXie Yongji1-0/+2
[ Upstream commit b71ba22e7c6c6b279c66f53ee7818709774efa1f ] The vblk->vqs should be freed before we call init_vqs() in virtblk_restore(). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210517084332.280-1-xieyongji@bytedance.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-19nbd: Fix NULL pointer in flush_workqueueSun Ke1-1/+2
[ Upstream commit 79ebe9110fa458d58f1fceb078e2068d7ad37390 ] Open /dev/nbdX first, the config_refs will be 1 and the pointers in nbd_device are still null. Disconnect /dev/nbdX, then reference a null recv_workq. The protection by config_refs in nbd_genl_disconnect is useless. [ 656.366194] BUG: kernel NULL pointer dereference, address: 0000000000000020 [ 656.368943] #PF: supervisor write access in kernel mode [ 656.369844] #PF: error_code(0x0002) - not-present page [ 656.370717] PGD 10cc87067 P4D 10cc87067 PUD 1074b4067 PMD 0 [ 656.371693] Oops: 0002 [#1] SMP [ 656.372242] CPU: 5 PID: 7977 Comm: nbd-client Not tainted 5.11.0-rc5-00040-g76c057c84d28 #1 [ 656.373661] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 656.375904] RIP: 0010:mutex_lock+0x29/0x60 [ 656.376627] Code: 00 0f 1f 44 00 00 55 48 89 fd 48 83 05 6f d7 fe 08 01 e8 7a c3 ff ff 48 83 05 6a d7 fe 08 01 31 c0 65 48 8b 14 25 00 6d 01 00 <f0> 48 0f b1 55 d [ 656.378934] RSP: 0018:ffffc900005eb9b0 EFLAGS: 00010246 [ 656.379350] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 656.379915] RDX: ffff888104cf2600 RSI: ffffffffaae8f452 RDI: 0000000000000020 [ 656.380473] RBP: 0000000000000020 R08: 0000000000000000 R09: ffff88813bd6b318 [ 656.381039] R10: 00000000000000c7 R11: fefefefefefefeff R12: ffff888102710b40 [ 656.381599] R13: ffffc900005eb9e0 R14: ffffffffb2930680 R15: ffff88810770ef00 [ 656.382166] FS: 00007fdf117ebb40(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000 [ 656.382806] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 656.383261] CR2: 0000000000000020 CR3: 0000000100c84000 CR4: 00000000000006e0 [ 656.383819] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 656.384370] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 656.384927] Call Trace: [ 656.385111] flush_workqueue+0x92/0x6c0 [ 656.385395] nbd_disconnect_and_put+0x81/0xd0 [ 656.385716] nbd_genl_disconnect+0x125/0x2a0 [ 656.386034] genl_family_rcv_msg_doit.isra.0+0x102/0x1b0 [ 656.386422] genl_rcv_msg+0xfc/0x2b0 [ 656.386685] ? nbd_ioctl+0x490/0x490 [ 656.386954] ? genl_family_rcv_msg_doit.isra.0+0x1b0/0x1b0 [ 656.387354] netlink_rcv_skb+0x62/0x180 [ 656.387638] genl_rcv+0x34/0x60 [ 656.387874] netlink_unicast+0x26d/0x590 [ 656.388162] netlink_sendmsg+0x398/0x6c0 [ 656.388451] ? netlink_rcv_skb+0x180/0x180 [ 656.388750] ____sys_sendmsg+0x1da/0x320 [ 656.389038] ? ____sys_recvmsg+0x130/0x220 [ 656.389334] ___sys_sendmsg+0x8e/0xf0 [ 656.389605] ? ___sys_recvmsg+0xa2/0xf0 [ 656.389889] ? handle_mm_fault+0x1671/0x21d0 [ 656.390201] __sys_sendmsg+0x6d/0xe0 [ 656.390464] __x64_sys_sendmsg+0x23/0x30 [ 656.390751] do_syscall_64+0x45/0x70 [ 656.391017] entry_SYSCALL_64_after_hwframe+0x44/0xa9 To fix it, just add if (nbd->recv_workq) to nbd_disconnect_and_put(). Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Signed-off-by: Sun Ke <sunke32@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210512114331.1233964-2-sunke32@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14drivers/block/null_blk/main: Fix a double free in null_init.Lv Yunlong1-0/+1
[ Upstream commit 72ce11ddfa4e9e1879103581a60b7e34547eaa0a ] In null_init, null_add_dev(dev) is called. In null_add_dev, it calls null_free_zoned_dev(dev) to free dev->zones via kvfree(dev->zones) in out_cleanup_zone branch and returns err. Then null_init accept the err code and then calls null_free_dev(dev). But in null_free_dev(dev), dev->zones is freed again by null_free_zoned_dev(). My patch set dev->zones to NULL in null_free_zoned_dev() after kvfree(dev->zones) is called, to avoid the double free. Fixes: 2984c8684f962 ("nullb: factor disk parameters") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Link: https://lore.kernel.org/r/20210426143229.7374-1-lyl2019@mail.ustc.edu.cn Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-14xen-blkback: fix compatibility bug with single page ringsPaul Durrant2-22/+17
[ Upstream commit d75e7f63b7c95c527cde42efb5d410d7f961498f ] Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the behaviour of xen-blkback when connecting to a frontend was: - read 'ring-page-order' - if not present then expect a single page ring specified by 'ring-ref' - else expect a ring specified by 'ring-refX' where X is between 0 and 1 << ring-page-order This was correct behaviour, but was broken by the afforementioned commit to become: - read 'ring-page-order' - if not present then expect a single page ring (i.e. ring-page-order = 0) - expect a ring specified by 'ring-refX' where X is between 0 and 1 << ring-page-order - if that didn't work then see if there's a single page ring specified by 'ring-ref' This incorrect behaviour works most of the time but fails when a frontend that sets 'ring-page-order' is unloaded and replaced by one that does not because, instead of reading 'ring-ref', xen-blkback will read the stale 'ring-ref0' left around by the previous frontend will try to map the wrong grant reference. This patch restores the original behaviour. Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront") Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210202175659.18452-1-paul@xen.org Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-30xen-blkback: don't leak persistent grants from xen_blkbk_map()Jan Beulich1-1/+1
commit a846738f8c3788d846ed1f587270d2f2e3d32432 upstream. The fix for XSA-365 zapped too many of the ->persistent_gnt[] entries. Ones successfully obtained should not be overwritten, but instead left for xen_blkbk_unmap_prepare() to pick up and put. This is XSA-371. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17zram: fix return value on writeback_storeMinchan Kim1-3/+8
commit 57e0076e6575a7b7cef620a0bd2ee2549ef77818 upstream. writeback_store's return value is overwritten by submit_bio_wait's return value. Thus, writeback_store will return zero since there was no IO error. In the end, write syscall from userspace will see the zero as return value, which could make the process stall to keep trying the write until it will succeed. Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org Fixes: 3b82a051c101("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store") Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: John Dias <joaodias@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-17block: rsxx: fix error return code of rsxx_pci_probe()Jia-Ju Bai1-0/+1
[ Upstream commit df66617bfe87487190a60783d26175b65d2502ce ] When create_singlethread_workqueue returns NULL to card->event_wq, no error return code of rsxx_pci_probe() is assigned. To fix this bug, st is assigned with -ENOMEM in this case. Fixes: 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Link: https://lore.kernel.org/r/20210310033017.4023-1-baijiaju1990@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-09rsxx: Return -EFAULT if copy_to_user() failsDan Carpenter1-3/+5
[ Upstream commit 77516d25f54912a7baedeeac1b1b828b6f285152 ] The copy_to_user() function returns the number of bytes remaining but we want to return -EFAULT to the user if it can't complete the copy. The "st" variable only holds zero on success or negative error codes on failure so the type should be int. Fixes: 36f988e978f8 ("rsxx: Adding in debugfs entries.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-07zsmalloc: account the number of compacted pages correctlyRokudo Yan1-1/+1
commit 2395928158059b8f9858365fce7713ce7fef62e4 upstream. There exists multiple path may do zram compaction concurrently. 1. auto-compaction triggered during memory reclaim 2. userspace utils write zram<id>/compaction node So, multiple threads may call zs_shrinker_scan/zs_compact concurrently. But pages_compacted is a per zsmalloc pool variable and modification of the variable is not serialized(through under class->lock). There are two issues here: 1. the pages_compacted may not equal to total number of pages freed(due to concurrently add). 2. zs_shrinker_scan may not return the correct number of pages freed(issued by current shrinker). The fix is simple: 1. account the number of pages freed in zs_compact locally. 2. use actomic variable pages_compacted to accumulate total number. Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages") Signed-off-by: Rokudo Yan <wu-yan@tcl.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-07nbd: handle device refs for DESTROY_ON_DISCONNECT properlyJosef Bacik1-13/+19
commit c9a2f90f4d6b9d42b9912f7aaf68e8d748acfffd upstream. There exists a race where we can be attempting to create a new nbd configuration while a previous configuration is going down, both configured with DESTROY_ON_DISCONNECT. Normally devices all have a reference of 1, as they won't be cleaned up until the module is torn down. However with DESTROY_ON_DISCONNECT we'll make sure that there is only 1 reference (generally) on the device for the config itself, and then once the config is dropped, the device is torn down. The race that exists looks like this TASK1 TASK2 nbd_genl_connect() idr_find() refcount_inc_not_zero(nbd) * count is 2 here ^^ nbd_config_put() nbd_put(nbd) (count is 1) setup new config check DESTROY_ON_DISCONNECT put_dev = true if (put_dev) nbd_put(nbd) * free'd here ^^ In nbd_genl_connect() we assume that the nbd ref count will be 2, however clearly that won't be true if the nbd device had been setup as DESTROY_ON_DISCONNECT with its prior configuration. Fix this by getting rid of the runtime flag to check if we need to mess with the nbd device refcount, and use the device NBD_DESTROY_ON_DISCONNECT flag to check if we need to adjust the ref counts. This was reported by syzkaller with the following kasan dump BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline] BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] BUG: KASAN: use-after-free in refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76 Read of size 4 at addr ffff888143bf71a0 by task systemd-udevd/8451 CPU: 0 PID: 8451 Comm: systemd-udevd Not tainted 5.11.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230 __kasan_report mm/kasan/report.c:396 [inline] kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413 check_memory_region_inline mm/kasan/generic.c:179 [inline] check_memory_region+0x13d/0x180 mm/kasan/generic.c:185 instrument_atomic_read include/linux/instrumented.h:71 [inline] atomic_read include/asm-generic/atomic-instrumented.h:27 [inline] refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76 refcount_dec_and_mutex_lock+0x19/0x140 lib/refcount.c:115 nbd_put drivers/block/nbd.c:248 [inline] nbd_release+0x116/0x190 drivers/block/nbd.c:1508 __blkdev_put+0x548/0x800 fs/block_dev.c:1579 blkdev_put+0x92/0x570 fs/block_dev.c:1632 blkdev_close+0x8c/0xb0 fs/block_dev.c:1640 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fc1e92b5270 Code: 73 01 c3 48 8b 0d 38 7d 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c1 20 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 RSP: 002b:00007ffe8beb2d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00007fc1e92b5270 RDX: 000000000aba9500 RSI: 0000000000000000 RDI: 0000000000000007 RBP: 00007fc1ea16f710 R08: 000000000000004a R09: 0000000000000008 R10: 0000562f8cb0b2a8 R11: 0000000000000246 R12: 0000000000000000 R13: 0000562f8cb0afd0 R14: 0000000000000003 R15: 000000000000000e Allocated by task 1: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:401 [inline] ____kasan_kmalloc.constprop.0+0x82/0xa0 mm/kasan/common.c:429 kmalloc include/linux/slab.h:552 [inline] kzalloc include/linux/slab.h:682 [inline] nbd_dev_add+0x44/0x8e0 drivers/block/nbd.c:1673 nbd_init+0x250/0x271 drivers/block/nbd.c:2394 do_one_initcall+0x103/0x650 init/main.c:1223 do_initcall_level init/main.c:1296 [inline] do_initcalls init/main.c:1312 [inline] do_basic_setup init/main.c:1332 [inline] kernel_init_freeable+0x605/0x689 init/main.c:1533 kernel_init+0xd/0x1b8 init/main.c:1421 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 Freed by task 8451: kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:356 ____kasan_slab_free+0xe1/0x110 mm/kasan/common.c:362 kasan_slab_free include/linux/kasan.h:192 [inline] slab_free_hook mm/slub.c:1547 [inline] slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1580 slab_free mm/slub.c:3143 [inline] kfree+0xdb/0x3b0 mm/slub.c:4139 nbd_dev_remove drivers/block/nbd.c:243 [inline] nbd_put.part.0+0x180/0x1d0 drivers/block/nbd.c:251 nbd_put drivers/block/nbd.c:295 [inline] nbd_config_put+0x6dd/0x8c0 drivers/block/nbd.c:1242 nbd_release+0x103/0x190 drivers/block/nbd.c:1507 __blkdev_put+0x548/0x800 fs/block_dev.c:1579 blkdev_put+0x92/0x570 fs/block_dev.c:1632 blkdev_close+0x8c/0xb0 fs/block_dev.c:1640 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff888143bf7000 which belongs to the cache kmalloc-1k of size 1024 The buggy address is located 416 bytes inside of 1024-byte region [ffff888143bf7000, ffff888143bf7400) The buggy address belongs to the page: page:000000005238f4ce refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x143bf0 head:000000005238f4ce order:3 compound_mapcount:0 compound_pincount:0 flags: 0x57ff00000010200(slab|head) raw: 057ff00000010200 ffffea00004b1400 0000000300000003 ffff888010c41140 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888143bf7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888143bf7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888143bf7180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888143bf7200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Reported-and-tested-by: syzbot+429d3f82d757c211bff3@syzkaller.appspotmail.com Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-03-04floppy: reintroduce O_NDELAY fixJiri Kosina1-13/+14
commit 8a0c014cd20516ade9654fc13b51345ec58e7be8 upstream. This issue was originally fixed in 09954bad4 ("floppy: refactor open() flags handling"). The fix as a side-effect, however, introduce issue for open(O_ACCMODE) that is being used for ioctl-only open. I wrote a fix for that, but instead of it being merged, full revert of 09954bad4 was performed, re-introducing the O_NDELAY / O_NONBLOCK issue, and it strikes again. This is a forward-port of the original fix to current codebase; the original submission had the changelog below: ==== Commit 09954bad4 ("floppy: refactor open() flags handling"), as a side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that this is being used setfdprm userspace for ioctl-only open(). Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE) modes, while still keeping the original O_NDELAY bug fixed. Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2101221209060.5622@cbobk.fhfr.pm Cc: stable@vger.kernel.org Reported-by: Wim Osterholt <wim@djo.tudelft.nl> Tested-by: Wim Osterholt <wim@djo.tudelft.nl> Reported-and-tested-by: Kurt Garloff <kurt@garloff.de> Fixes: 09954bad4 ("floppy: refactor open() flags handling") Fixes: f2791e7ead ("Revert "floppy: refactor open() flags handling"") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23xen-blkback: fix error handling in xen_blkbk_map()Jan Beulich1-10/+14
commit 871997bc9e423f05c7da7c9178e62dde5df2a7f8 upstream. The function uses a goto-based loop, which may lead to an earlier error getting discarded by a later iteration. Exit this ad-hoc loop when an error was encountered. The out-of-memory error path additionally fails to fill a structure field looked at by xen_blkbk_unmap_prepare() before inspecting the handle which does get properly set (to BLKBACK_INVALID_HANDLE). Since the earlier exiting from the ad-hoc loop requires the same field filling (invalidation) as that on the out-of-memory path, fold both paths. While doing so, drop the pr_alert(), as extra log messages aren't going to help the situation (the kernel will log oom conditions already anyway). This is XSA-365. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23xen-blkback: don't "handle" error by BUG()Jan Beulich1-4/+2
commit 5a264285ed1cd32e26d9de4f3c8c6855e467fd63 upstream. In particular -ENOMEM may come back here, from set_foreign_p2m_mapping(). Don't make problems worse, the more that handling elsewhere (together with map's status fields now indicating whether a mapping wasn't even attempted, and hence has to be considered failed) doesn't require this odd way of dealing with errors. This is part of XSA-362. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04xen-blkfront: allow discard-* nodes to be optionalRoger Pau Monne1-13/+7
commit 0549cd67b01016b579047bce045b386202a8bcfc upstream. This is inline with the specification described in blkif.h: * discard-granularity: should be set to the physical block size if node is not present. * discard-alignment, discard-secure: should be set to 0 if node not present. This was detected as QEMU would only create the discard-granularity node but not discard-alignment, and thus the setup done in blkfront_setup_discard would fail. Fix blkfront_setup_discard to not fail on missing nodes, and also fix blkif_set_queue_limits to set the discard granularity to the physical block size if none is specified in xenbus. Fixes: ed30bf317c5ce ('xen-blkfront: Handle discard requests.') Reported-by: Arthur Borsboom <arthurborsboom@gmail.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-By: Arthur Borsboom <arthurborsboom@gmail.com> Link: https://lore.kernel.org/r/20210119105727.95173-1-roger.pau@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-04nbd: freeze the queue while we're adding connectionsJosef Bacik1-0/+8
commit b98e762e3d71e893b221f871825dc64694cfb258 upstream. When setting up a device, we can krealloc the config->socks array to add new sockets to the configuration. However if we happen to get a IO request in at this point even though we aren't setup we could hit a UAF, as we deref config->socks without any locking, assuming that the configuration was setup already and that ->socks is safe to access it as we have a reference on the configuration. But there's nothing really preventing IO from occurring at this point of the device setup, we don't want to incur the overhead of a lock to access ->socks when it will never change while the device is running. To fix this UAF scenario simply freeze the queue if we are adding sockets. This will protect us from this particular case without adding any additional overhead for the normal running case. Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17block: rsxx: select CONFIG_CRC32Arnd Bergmann1-0/+1
commit 36a106a4c1c100d55ba3d32a21ef748cfcd4fa99 upstream. Without crc32, the driver fails to link: arm-linux-gnueabi-ld: drivers/block/rsxx/config.o: in function `rsxx_load_config': config.c:(.text+0x124): undefined reference to `crc32_le' Fixes: 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-06null_blk: Fix zone size initializationDamien Le Moal1-7/+12
commit 0ebcdd702f49aeb0ad2e2d894f8c124a0acc6e23 upstream. For a null_blk device with zoned mode enabled is currently initialized with a number of zones equal to the device capacity divided by the zone size, without considering if the device capacity is a multiple of the zone size. If the zone size is not a divisor of the capacity, the zones end up not covering the entire capacity, potentially resulting is out of bounds accesses to the zone array. Fix this by adding one last smaller zone with a size equal to the remainder of the disk capacity divided by the zone size if the capacity is not a multiple of the zone size. For such smaller last zone, the zone capacity is also checked so that it does not exceed the smaller zone size. Reported-by: Naohiro Aota <naohiro.aota@wdc.com> Fixes: ca4b2a011948 ("null_blk: add zone support") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()SeongJae Park1-1/+2
commit 2e85d32b1c865bec703ce0c962221a5e955c52c2 upstream. Some code does not directly make 'xenbus_watch' object and call 'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This commit adds support of 'will_handle' callback in the 'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'. This is part of XSA-349 Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park <sjpark@amazon.de> Reported-by: Michael Kurth <mku@amazon.de> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30xen-blkback: set ring->xenblkd to NULL after kthread_stop()Pawel Wieczorkiewicz1-0/+1
commit 1c728719a4da6e654afb9cc047164755072ed7c9 upstream. When xen_blkif_disconnect() is called, the kernel thread behind the block interface is stopped by calling kthread_stop(ring->xenblkd). The ring->xenblkd thread pointer being non-NULL determines if the thread has been already stopped. Normally, the thread's function xen_blkif_schedule() sets the ring->xenblkd to NULL, when the thread's main loop ends. However, when the thread has not been started yet (i.e. wake_up_process() has not been called on it), the xen_blkif_schedule() function would not be called yet. In such case the kthread_stop() call returns -EINTR and the ring->xenblkd remains dangling. When this happens, any consecutive call to xen_blkif_disconnect (for example in frontend_changed() callback) leads to a kernel crash in kthread_stop() (e.g. NULL pointer dereference in exit_creds()). This is XSA-350. Cc: <stable@vger.kernel.org> # 4.12 Fixes: a24fa22ce22a ("xen/blkback: don't use xen_blkif_get() in xen-blkback kthread") Reported-by: Olivier Benjamin <oliben@amazon.com> Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18nbd: fix a block_device refcount leak in nbd_releaseChristoph Hellwig1-0/+1
[ Upstream commit 2bd645b2d3f0bacadaa6037f067538e1cd4e42ef ] bdget_disk needs to be paired with bdput to not leak a reference on the block device inode. Fixes: 08ba91ee6e2c ("nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-18nbd: don't update block size after device is startedMing Lei1-4/+5
[ Upstream commit b40813ddcd6bf9f01d020804e4cb8febc480b9e4 ] Mounted NBD device can be resized, one use case is rbd-nbd. Fix the issue by setting up default block size, then not touch it in nbd_size_update() any more. This kind of usage is aligned with loop which has same use case too. Cc: stable@vger.kernel.org Fixes: c8a83a6b54d0 ("nbd: Use set_blocksize() to set device blocksize") Reported-by: lining <lining2020x@163.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jan Kara <jack@suse.cz> Tested-by: lining <lining2020x@163.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05nbd: make the config put is called before the notifying the waiterXiubo Li1-1/+1
[ Upstream commit 87aac3a80af5cbad93e63250e8a1e19095ba0d30 ] There has one race case for ceph's rbd-nbd tool. When do mapping it may fail with EBUSY from ioctl(nbd, NBD_DO_IT), but actually the nbd device has already unmaped. It dues to if just after the wake_up(), the recv_work() is scheduled out and defers calling the nbd_config_put(), though the map process has exited the "nbd->recv_task" is not cleared. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-11-05xen/blkback: use lateeoi irq bindingJuergen Gross2-8/+19
commit 01263a1fabe30b4d542f34c7e2364a22587ddaf2 upstream. In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving blkfront use the lateeoi irq binding for blkback and unmask the event channel only after processing all pending requests. As the thread processing requests is used to do purging work in regular intervals an EOI may be sent only after having received an event. If there was no pending I/O request flag the EOI as spurious. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-17rbd: require global CAP_SYS_ADMIN for mapping and unmappingIlya Dryomov1-0/+12
commit f44d04e696feaf13d192d942c4f14ad2e117065a upstream. It turns out that currently we rely only on sysfs attribute permissions: $ ll /sys/bus/rbd/{add*,remove*} --w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add --w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/add_single_major --w------- 1 root root 4096 Sep 3 20:37 /sys/bus/rbd/remove --w------- 1 root root 4096 Sep 3 20:38 /sys/bus/rbd/remove_single_major This means that images can be mapped and unmapped (i.e. block devices can be created and deleted) by a UID 0 process even after it drops all privileges or by any process with CAP_DAC_OVERRIDE in its user namespace as long as UID 0 is mapped into that user namespace. Be consistent with other virtual block devices (loop, nbd, dm, md, etc) and require CAP_SYS_ADMIN in the initial user namespace for mapping and unmapping, and also for dumping the configuration string and refreshing the image header. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-09nbd: restore default timeout when setting it to zeroHou Pu1-0/+2
[ Upstream commit acb19e17c5134dd78668c429ecba5b481f038e6a ] If we configured io timeout of nbd0 to 100s. Later after we finished using it, we configured nbd0 again and set the io timeout to 0. We expect it would timeout after 30 seconds and keep retry. But in fact we could not change the timeout when we set it to 0. the timeout is still the original 100s. So change the timeout to default 30s when we set it to zero. It also behaves same as commit 2da22da57348 ("nbd: fix zero cmd timeout handling v2"). It becomes more important if we were reconfigure a nbd device and the io timeout it set to zero. Because it could take 30s to detect the new socket and thus io could be completed more quickly compared to 100s. Signed-off-by: Hou Pu <houpu@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03block: loop: set discard granularity and alignment for block device backed loopMing Lei1-15/+18
commit bcb21c8cc9947286211327d663ace69f07d37a76 upstream. In case of block device backend, if the backend supports write zeros, the loop device will set queue flag of QUEUE_FLAG_DISCARD. However, limits.discard_granularity isn't setup, and this way is wrong, see the following description in Documentation/ABI/testing/sysfs-block: A discard_granularity of 0 means that the device does not support discard functionality. Especially 9b15d109a6b2 ("block: improve discard bio alignment in __blkdev_issue_discard()") starts to take q->limits.discard_granularity for computing max discard sectors. And zero discard granularity may cause kernel oops, or fail discard request even though the loop queue claims discard support via QUEUE_FLAG_DISCARD. Fix the issue by setup discard granularity and alignment. Fixes: c52abf563049 ("loop: Better discard support for block devices") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Xiao Ni <xni@redhat.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Evan Green <evgreen@chromium.org> Cc: Gwendal Grignou <gwendal@chromium.org> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-03null_blk: fix passing of REQ_FUA flag in null_handle_rqHou Pu1-1/+1
[ Upstream commit 2d62e6b038e729c3e4bfbfcfbd44800ef0883680 ] REQ_FUA should be checked using rq->cmd_flags instead of req_op(). Fixes: deb78b419dfda ("nullb: emulate cache") Signed-off-by: Hou Pu <houpu@bytedance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03block: virtio_blk: fix handling single range discard requestMing Lei1-8/+23
[ Upstream commit af822aa68fbdf0a480a17462ed70232998127453 ] 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") starts to support multi-range discard for virtio-blk. However, the virtio-blk disk may report max discard segment as 1, at least that is exactly what qemu is doing. So far, block layer switches to normal request merge if max discard segment limit is 1, and multiple bios can be merged to single segment. This way may cause memory corruption in virtblk_setup_discard_write_zeroes(). Fix the issue by handling single max discard segment in straightforward way. Fixes: 1f23816b8eb8 ("virtio_blk: add discard and write zeroes support") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Changpeng Liu <changpeng.liu@intel.com> Cc: Daniel Verkamp <dverkamp@chromium.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19loop: be paranoid on exit and prevent new additions / removalsLuis Chamberlain1-0/+4
[ Upstream commit 200f93377220504c5e56754823e7adfea6037f1a ] Be pedantic on removal as well and hold the mutex. This should prevent uses of addition while we exit. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22Revert "zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()"Wade Mealing1-1/+2
commit 853eab68afc80f59f36bbdeb715e5c88c501e680 upstream. Turns out that the permissions for 0400 really are what we want here, otherwise any user can read from this file. [fixed formatting, added changelog, and made attribute static - gregkh] Reported-by: Wade Mealing <wmealing@redhat.com> Cc: stable <stable@vger.kernel.org> Fixes: f40609d1591f ("zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()") Link: https://bugzilla.redhat.com/show_bug.cgi?id=1847832 Reviewed-by: Steffen Maier <maier@linux.ibm.com> Acked-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20200617114946.GA2131650@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-16nbd: Fix memory leak in nbd_add_socketZheng Bin1-10/+15
[ Upstream commit 579dd91ab3a5446b148e7f179b6596b270dace46 ] When adding first socket to nbd, if nsock's allocation failed, the data structure member "config->socks" was reallocated, but the data structure member "config->num_connections" was not updated. A memory leak will occur then because the function "nbd_config_put" will free "config->socks" only when "config->num_connections" is not zero. Fixes: 03bf73c315ed ("nbd: prevent memory leak") Reported-by: syzbot+934037347002901b8d2a@syzkaller.appspotmail.com Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-09virtio-blk: free vblk-vqs in error path of virtblk_probe()Hou Tao1-0/+1
[ Upstream commit e7eea44eefbdd5f0345a0a8b80a3ca1c21030d06 ] Else there will be memory leak if alloc_disk() fails. Fixes: 6a27b656fc02 ("block: virtio-blk: support multi virt queues per virtio-blk device") Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30loop: replace kill_bdev with invalidate_bdevZheng Bin1-3/+3
commit f4bd34b139a3fa2808c4205f12714c65e1548c6c upstream. When a filesystem is mounted on a loop device and on a loop ioctl LOOP_SET_STATUS64, because of kill_bdev, buffer_head mappings are getting destroyed. kill_bdev truncate_inode_pages truncate_inode_pages_range do_invalidatepage block_invalidatepage discard_buffer -->clear BH_Mapped flag sb_bread __bread_gfp bh = __getblk_gfp -->discard_buffer clear BH_Mapped flag __bread_slow submit_bh submit_bh_wbc BUG_ON(!buffer_mapped(bh)) --> hit this BUG_ON Fixes: 5db470e229e2 ("loop: drop caches if offset or block_size are changed") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24ps3disk: use the default segment boundaryEmmanuel Nicolet1-1/+0
[ Upstream commit 720bc316690bd27dea9d71510b50f0cd698ffc32 ] Since commit dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count"), the kernel will bug_on on the PS3 because bio_split() is called with sectors == 0: kernel BUG at block/bio.c:1853! Oops: Exception in kernel mode, sig: 5 [#1] BE PAGE_SIZE=4K MMU=Hash PREEMPT SMP NR_CPUS=8 NUMA PS3 Modules linked in: firewire_sbp2 rtc_ps3(+) soundcore ps3_gelic(+) \ ps3rom(+) firewire_core ps3vram(+) usb_common crc_itu_t CPU: 0 PID: 97 Comm: blkid Not tainted 5.3.0-rc4 #1 NIP: c00000000027d0d0 LR: c00000000027d0b0 CTR: 0000000000000000 REGS: c00000000135ae90 TRAP: 0700 Not tainted (5.3.0-rc4) MSR: 8000000000028032 <SF,EE,IR,DR,RI> CR: 44008240 XER: 20000000 IRQMASK: 0 GPR00: c000000000289368 c00000000135b120 c00000000084a500 c000000004ff8300 GPR04: 0000000000000c00 c000000004c905e0 c000000004c905e0 000000000000ffff GPR08: 0000000000000000 0000000000000001 0000000000000000 000000000000ffff GPR12: 0000000000000000 c0000000008ef000 000000000000003e 0000000000080001 GPR16: 0000000000000100 000000000000ffff 0000000000000000 0000000000000004 GPR20: c00000000062fd7e 0000000000000001 000000000000ffff 0000000000000080 GPR24: c000000000781788 c00000000135b350 0000000000000080 c000000004c905e0 GPR28: c00000000135b348 c000000004ff8300 0000000000000000 c000000004c90000 NIP [c00000000027d0d0] .bio_split+0x28/0xac LR [c00000000027d0b0] .bio_split+0x8/0xac Call Trace: [c00000000135b120] [c00000000027d130] .bio_split+0x88/0xac (unreliable) [c00000000135b1b0] [c000000000289368] .__blk_queue_split+0x11c/0x53c [c00000000135b2d0] [c00000000028f614] .blk_mq_make_request+0x80/0x7d4 [c00000000135b3d0] [c000000000283a8c] .generic_make_request+0x118/0x294 [c00000000135b4b0] [c000000000283d34] .submit_bio+0x12c/0x174 [c00000000135b580] [c000000000205a44] .mpage_bio_submit+0x3c/0x4c [c00000000135b600] [c000000000206184] .mpage_readpages+0xa4/0x184 [c00000000135b750] [c0000000001ff8fc] .blkdev_readpages+0x24/0x38 [c00000000135b7c0] [c0000000001589f0] .read_pages+0x6c/0x1a8 [c00000000135b8b0] [c000000000158c74] .__do_page_cache_readahead+0x118/0x184 [c00000000135b9b0] [c0000000001591a8] .force_page_cache_readahead+0xe4/0xe8 [c00000000135ba50] [c00000000014fc24] .generic_file_read_iter+0x1d8/0x830 [c00000000135bb50] [c0000000001ffadc] .blkdev_read_iter+0x40/0x5c [c00000000135bbc0] [c0000000001b9e00] .new_sync_read+0x144/0x1a0 [c00000000135bcd0] [c0000000001bc454] .vfs_read+0xa0/0x124 [c00000000135bd70] [c0000000001bc7a4] .ksys_read+0x70/0xd8 [c00000000135be20] [c00000000000a524] system_call+0x5c/0x70 Instruction dump: 7fe3fb78 482e30dc 7c0802a6 482e3085 7c9e2378 f821ff71 7ca42b78 7d3e00d0 7c7d1b78 79290fe0 7cc53378 69290001 <0b090000> 81230028 7bca0020 7929ba62 [ end trace 313fec760f30aa1f ]--- The problem originates from setting the segment boundary of the request queue to -1UL. This makes get_max_segment_size() return zero when offset is zero, whatever the max segment size. The test with BLK_SEG_BOUNDARY_MASK fails and 'mask - (mask & offset) + 1' overflows to zero in the return statement. Not setting the segment boundary and using the default value (BLK_SEG_BOUNDARY_MASK) fixes the problem. Signed-off-by: Emmanuel Nicolet <emmanuel.nicolet@gmail.com> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/060a416c43138f45105c0540eff1a45539f7e2fc.1589049250.git.geoff@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-17block/floppy: fix contended case in floppy_queue_rq()Jiri Kosina1-5/+5
commit 263c61581a38d0a5ad1f5f4a9143b27d68caeffd upstream. Since the switch of floppy driver to blk-mq, the contended (fdc_busy) case in floppy_queue_rq() is not handled correctly. In case we reach floppy_queue_rq() with fdc_busy set (i.e. with the floppy locked due to another request still being in-flight), we put the request on the list of requests and return BLK_STS_OK to the block core, without actually scheduling delayed work / doing further processing of the request. This means that processing of this request is postponed until another request comes and passess uncontended. Which in some cases might actually never happen and we keep waiting indefinitely. The simple testcase is for i in `seq 1 2000`; do echo -en $i '\r'; blkid --info /dev/fd0 2> /dev/null; done run in quemu. That reliably causes blkid eventually indefinitely hanging in __floppy_read_block_0() waiting for completion, as the BIO callback never happens, and no further IO is ever submitted on the (non-existent) floppy device. This was observed reliably on qemu-emulated device. Fix that by not queuing the request in the contended case, and return BLK_STS_RESOURCE instead, so that blk core handles the request rescheduling and let it pass properly non-contended later. Fixes: a9f38e1dec107a ("floppy: convert to blk-mq") Cc: stable@vger.kernel.org Tested-by: Libor Pechacek <lpechacek@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>