| Age | Commit message (Collapse) | Author | Files | Lines |
|
commit a846738f8c3788d846ed1f587270d2f2e3d32432 upstream.
The fix for XSA-365 zapped too many of the ->persistent_gnt[] entries.
Ones successfully obtained should not be overwritten, but instead left
for xen_blkbk_unmap_prepare() to pick up and put.
This is XSA-371.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit eeb05595d22c19c8f814ff893dcf88ec277a2365 ]
Fix to return negative error code -ENOMEM from the blk_alloc_queue()
and dma_alloc_coherent() error handling cases instead of 0, as done
elsewhere in this function.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20210308123501.2573816-1-weiyongjun1@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 2766f1821600cc7562bae2128ad0b163f744c5d9 upstream.
commit 0d8359620d9b ("zram: support page writeback") introduced two
problems. It overwrites writeback_store's return value as kstrtol's
return value, which makes return value zero so user could see zero as
return value of write syscall even though it wrote data successfully.
It also breaks index value in the loop in that it doesn't increase the
index any longer. It means it can write only first starting block index
so user couldn't write all idle pages in the zram so lose memory saving
chance.
This patch fixes those issues.
Link: https://lkml.kernel.org/r/20210312173949.2197662-2-minchan@kernel.org
Fixes: 0d8359620d9b("zram: support page writeback")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Amos Bianchi <amosbianchi@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: John Dias <joaodias@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 57e0076e6575a7b7cef620a0bd2ee2549ef77818 upstream.
writeback_store's return value is overwritten by submit_bio_wait's return
value. Thus, writeback_store will return zero since there was no IO
error. In the end, write syscall from userspace will see the zero as
return value, which could make the process stall to keep trying the write
until it will succeed.
Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org
Fixes: 3b82a051c101("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: John Dias <joaodias@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit df66617bfe87487190a60783d26175b65d2502ce ]
When create_singlethread_workqueue returns NULL to card->event_wq, no
error return code of rsxx_pci_probe() is assigned.
To fix this bug, st is assigned with -ENOMEM in this case.
Fixes: 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Link: https://lore.kernel.org/r/20210310033017.4023-1-baijiaju1990@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 77516d25f54912a7baedeeac1b1b828b6f285152 ]
The copy_to_user() function returns the number of bytes remaining but
we want to return -EFAULT to the user if it can't complete the copy.
The "st" variable only holds zero on success or negative error codes on
failure so the type should be int.
Fixes: 36f988e978f8 ("rsxx: Adding in debugfs entries.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit c9a2f90f4d6b9d42b9912f7aaf68e8d748acfffd upstream.
There exists a race where we can be attempting to create a new nbd
configuration while a previous configuration is going down, both
configured with DESTROY_ON_DISCONNECT. Normally devices all have a
reference of 1, as they won't be cleaned up until the module is torn
down. However with DESTROY_ON_DISCONNECT we'll make sure that there is
only 1 reference (generally) on the device for the config itself, and
then once the config is dropped, the device is torn down.
The race that exists looks like this
TASK1 TASK2
nbd_genl_connect()
idr_find()
refcount_inc_not_zero(nbd)
* count is 2 here ^^
nbd_config_put()
nbd_put(nbd) (count is 1)
setup new config
check DESTROY_ON_DISCONNECT
put_dev = true
if (put_dev) nbd_put(nbd)
* free'd here ^^
In nbd_genl_connect() we assume that the nbd ref count will be 2,
however clearly that won't be true if the nbd device had been setup as
DESTROY_ON_DISCONNECT with its prior configuration. Fix this by getting
rid of the runtime flag to check if we need to mess with the nbd device
refcount, and use the device NBD_DESTROY_ON_DISCONNECT flag to check if
we need to adjust the ref counts. This was reported by syzkaller with
the following kasan dump
BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline]
BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
BUG: KASAN: use-after-free in refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
Read of size 4 at addr ffff888143bf71a0 by task systemd-udevd/8451
CPU: 0 PID: 8451 Comm: systemd-udevd Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230
__kasan_report mm/kasan/report.c:396 [inline]
kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413
check_memory_region_inline mm/kasan/generic.c:179 [inline]
check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
instrument_atomic_read include/linux/instrumented.h:71 [inline]
atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
refcount_dec_and_mutex_lock+0x19/0x140 lib/refcount.c:115
nbd_put drivers/block/nbd.c:248 [inline]
nbd_release+0x116/0x190 drivers/block/nbd.c:1508
__blkdev_put+0x548/0x800 fs/block_dev.c:1579
blkdev_put+0x92/0x570 fs/block_dev.c:1632
blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
__fput+0x283/0x920 fs/file_table.c:280
task_work_run+0xdd/0x190 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc1e92b5270
Code: 73 01 c3 48 8b 0d 38 7d 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c1 20 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24
RSP: 002b:00007ffe8beb2d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00007fc1e92b5270
RDX: 000000000aba9500 RSI: 0000000000000000 RDI: 0000000000000007
RBP: 00007fc1ea16f710 R08: 000000000000004a R09: 0000000000000008
R10: 0000562f8cb0b2a8 R11: 0000000000000246 R12: 0000000000000000
R13: 0000562f8cb0afd0 R14: 0000000000000003 R15: 000000000000000e
Allocated by task 1:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:401 [inline]
____kasan_kmalloc.constprop.0+0x82/0xa0 mm/kasan/common.c:429
kmalloc include/linux/slab.h:552 [inline]
kzalloc include/linux/slab.h:682 [inline]
nbd_dev_add+0x44/0x8e0 drivers/block/nbd.c:1673
nbd_init+0x250/0x271 drivers/block/nbd.c:2394
do_one_initcall+0x103/0x650 init/main.c:1223
do_initcall_level init/main.c:1296 [inline]
do_initcalls init/main.c:1312 [inline]
do_basic_setup init/main.c:1332 [inline]
kernel_init_freeable+0x605/0x689 init/main.c:1533
kernel_init+0xd/0x1b8 init/main.c:1421
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Freed by task 8451:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:356
____kasan_slab_free+0xe1/0x110 mm/kasan/common.c:362
kasan_slab_free include/linux/kasan.h:192 [inline]
slab_free_hook mm/slub.c:1547 [inline]
slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1580
slab_free mm/slub.c:3143 [inline]
kfree+0xdb/0x3b0 mm/slub.c:4139
nbd_dev_remove drivers/block/nbd.c:243 [inline]
nbd_put.part.0+0x180/0x1d0 drivers/block/nbd.c:251
nbd_put drivers/block/nbd.c:295 [inline]
nbd_config_put+0x6dd/0x8c0 drivers/block/nbd.c:1242
nbd_release+0x103/0x190 drivers/block/nbd.c:1507
__blkdev_put+0x548/0x800 fs/block_dev.c:1579
blkdev_put+0x92/0x570 fs/block_dev.c:1632
blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
__fput+0x283/0x920 fs/file_table.c:280
task_work_run+0xdd/0x190 kernel/task_work.c:140
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff888143bf7000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 416 bytes inside of
1024-byte region [ffff888143bf7000, ffff888143bf7400)
The buggy address belongs to the page:
page:000000005238f4ce refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x143bf0
head:000000005238f4ce order:3 compound_mapcount:0 compound_pincount:0
flags: 0x57ff00000010200(slab|head)
raw: 057ff00000010200 ffffea00004b1400 0000000300000003 ffff888010c41140
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888143bf7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888143bf7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888143bf7180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888143bf7200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Reported-and-tested-by: syzbot+429d3f82d757c211bff3@syzkaller.appspotmail.com
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8a0c014cd20516ade9654fc13b51345ec58e7be8 upstream.
This issue was originally fixed in 09954bad4 ("floppy: refactor open()
flags handling").
The fix as a side-effect, however, introduce issue for open(O_ACCMODE)
that is being used for ioctl-only open. I wrote a fix for that, but
instead of it being merged, full revert of 09954bad4 was performed,
re-introducing the O_NDELAY / O_NONBLOCK issue, and it strikes again.
This is a forward-port of the original fix to current codebase; the
original submission had the changelog below:
====
Commit 09954bad4 ("floppy: refactor open() flags handling"), as a
side-effect, causes open(/dev/fdX, O_ACCMODE) to fail. It turns out that
this is being used setfdprm userspace for ioctl-only open().
Reintroduce back the original behavior wrt !(FMODE_READ|FMODE_WRITE)
modes, while still keeping the original O_NDELAY bug fixed.
Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2101221209060.5622@cbobk.fhfr.pm
Cc: stable@vger.kernel.org
Reported-by: Wim Osterholt <wim@djo.tudelft.nl>
Tested-by: Wim Osterholt <wim@djo.tudelft.nl>
Reported-and-tested-by: Kurt Garloff <kurt@garloff.de>
Fixes: 09954bad4 ("floppy: refactor open() flags handling")
Fixes: f2791e7ead ("Revert "floppy: refactor open() flags handling"")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2395928158059b8f9858365fce7713ce7fef62e4 upstream.
There exists multiple path may do zram compaction concurrently.
1. auto-compaction triggered during memory reclaim
2. userspace utils write zram<id>/compaction node
So, multiple threads may call zs_shrinker_scan/zs_compact concurrently.
But pages_compacted is a per zsmalloc pool variable and modification
of the variable is not serialized(through under class->lock).
There are two issues here:
1. the pages_compacted may not equal to total number of pages
freed(due to concurrently add).
2. zs_shrinker_scan may not return the correct number of pages
freed(issued by current shrinker).
The fix is simple:
1. account the number of pages freed in zs_compact locally.
2. use actomic variable pages_compacted to accumulate total number.
Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com
Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages")
Signed-off-by: Rokudo Yan <wu-yan@tcl.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 871997bc9e423f05c7da7c9178e62dde5df2a7f8 upstream.
The function uses a goto-based loop, which may lead to an earlier error
getting discarded by a later iteration. Exit this ad-hoc loop when an
error was encountered.
The out-of-memory error path additionally fails to fill a structure
field looked at by xen_blkbk_unmap_prepare() before inspecting the
handle which does get properly set (to BLKBACK_INVALID_HANDLE).
Since the earlier exiting from the ad-hoc loop requires the same field
filling (invalidation) as that on the out-of-memory path, fold both
paths. While doing so, drop the pr_alert(), as extra log messages aren't
going to help the situation (the kernel will log oom conditions already
anyway).
This is XSA-365.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5a264285ed1cd32e26d9de4f3c8c6855e467fd63 upstream.
In particular -ENOMEM may come back here, from set_foreign_p2m_mapping().
Don't make problems worse, the more that handling elsewhere (together
with map's status fields now indicating whether a mapping wasn't even
attempted, and hence has to be considered failed) doesn't require this
odd way of dealing with errors.
This is part of XSA-362.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull block fixes from Jens Axboe:
"All over the place fixes for this release:
- blk-cgroup iteration teardown resched fix (Baolin)
- NVMe pull request from Christoph:
- add another Write Zeroes quirk (Chaitanya Kulkarni)
- handle a no path available corner case (Daniel Wagner)
- use the proper RCU aware list_add helper (Chao Leng)
- bcache regression fix (Coly)
- bdev->bd_size_lock IRQ fix. This will be fixed in drivers for 5.12,
but for now, we'll make it IRQ safe (Damien)
- null_blk zoned init fix (Damien)
- add_partition() error handling fix (Dinghao)
- s390 dasd kobject fix (Jan)
- nbd fix for freezing queue while adding connections (Josef)
- tag queueing regression fix (Ming)
- revert of a patch that inadvertently meant that we regressed write
performance on raid (Maxim)"
* tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block:
null_blk: cleanup zoned mode initialization
nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head
nvme-multipath: Early exit if no path is available
nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device
bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES
block: fix bd_size_lock use
blk-cgroup: Use cond_resched() when destroy blkgs
Revert "block: simplify set_init_blocksize" to regain lost performance
nbd: freeze the queue while we're adding connections
s390/dasd: Fix inconsistent kobject removal
block: Fix an error handling in add_partition
blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
|
|
To avoid potential compilation problems, replaced the badly written
MB_TO_SECTS() macro (missing parenthesis around the argument use) with
the inline function mb_to_sects(). And while at it, simplify the
calculation of the total number of zones of the device using the
round_up() macro.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- A fix for a regression introduced in 5.11 resulting in Xen dom0
having problems to correctly initialize Xenstore.
- A fix for avoiding WARN splats when booting as Xen dom0 with
CONFIG_AMD_MEM_ENCRYPT enabled due to a missing trap handler for the
#VC exception (even if the handler should never be called).
- A fix for the Xen bklfront driver adapting to the correct but
unexpected behavior of new qemu.
* tag 'for-linus-5.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled
xen: Fix XenStore initialisation for XS_LOCAL
xen-blkfront: allow discard-* nodes to be optional
|
|
This is inline with the specification described in blkif.h:
* discard-granularity: should be set to the physical block size if
node is not present.
* discard-alignment, discard-secure: should be set to 0 if node not
present.
This was detected as QEMU would only create the discard-granularity
node but not discard-alignment, and thus the setup done in
blkfront_setup_discard would fail.
Fix blkfront_setup_discard to not fail on missing nodes, and also fix
blkif_set_queue_limits to set the discard granularity to the physical
block size if none is specified in xenbus.
Fixes: ed30bf317c5ce ('xen-blkfront: Handle discard requests.')
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-By: Arthur Borsboom <arthurborsboom@gmail.com>
Link: https://lore.kernel.org/r/20210119105727.95173-1-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
When setting up a device, we can krealloc the config->socks array to add
new sockets to the configuration. However if we happen to get a IO
request in at this point even though we aren't setup we could hit a UAF,
as we deref config->socks without any locking, assuming that the
configuration was setup already and that ->socks is safe to access it as
we have a reference on the configuration.
But there's nothing really preventing IO from occurring at this point of
the device setup, we don't want to incur the overhead of a lock to
access ->socks when it will never change while the device is running.
To fix this UAF scenario simply freeze the queue if we are adding
sockets. This will protect us from this particular case without adding
any additional overhead for the normal running case.
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We had kernel panic, it is caused by unload module and last
close confirmation.
call trace:
[1196029.743127] free_sess+0x15/0x50 [rtrs_client]
[1196029.743128] rtrs_clt_close+0x4c/0x70 [rtrs_client]
[1196029.743129] ? rnbd_clt_unmap_device+0x1b0/0x1b0 [rnbd_client]
[1196029.743130] close_rtrs+0x25/0x50 [rnbd_client]
[1196029.743131] rnbd_client_exit+0x93/0xb99 [rnbd_client]
[1196029.743132] __x64_sys_delete_module+0x190/0x260
And in the crashdump confirmation kworker is also running.
PID: 6943 TASK: ffff9e2ac8098000 CPU: 4 COMMAND: "kworker/4:2"
#0 [ffffb206cf337c30] __schedule at ffffffff9f93f891
#1 [ffffb206cf337cc8] schedule at ffffffff9f93fe98
#2 [ffffb206cf337cd0] schedule_timeout at ffffffff9f943938
#3 [ffffb206cf337d50] wait_for_completion at ffffffff9f9410a7
#4 [ffffb206cf337da0] __flush_work at ffffffff9f08ce0e
#5 [ffffb206cf337e20] rtrs_clt_close_conns at ffffffffc0d5f668 [rtrs_client]
#6 [ffffb206cf337e48] rtrs_clt_close at ffffffffc0d5f801 [rtrs_client]
#7 [ffffb206cf337e68] close_rtrs at ffffffffc0d26255 [rnbd_client]
#8 [ffffb206cf337e78] free_sess at ffffffffc0d262ad [rnbd_client]
#9 [ffffb206cf337e88] rnbd_clt_put_dev at ffffffffc0d266a7 [rnbd_client]
The problem is both code path try to close same session, which lead to
panic.
To fix it, just skip the sess if the refcount already drop to 0.
Fixes: f7a7a5c228d4 ("block/rnbd: client: main functionality")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Adding name to the Contributors List
Signed-off-by: Swapnil Ingle <ingleswapnil@gmail.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Acked-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since dynamically allocate sglist is used for rnbd_iu, we can't free sg
table after send_usr_msg since the callback function (cqe.done) could
still access the sglist.
Otherwise KASAN reports UAF issue:
[ 4856.600257] BUG: KASAN: use-after-free in dma_direct_unmap_sg+0x53/0x290
[ 4856.600772] Read of size 4 at addr ffff888206af3a98 by task swapper/1/0
[ 4856.601729] CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G W 5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
[ 4856.601748] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
[ 4856.601766] Call Trace:
[ 4856.601785] <IRQ>
[ 4856.601822] dump_stack+0x99/0xcb
[ 4856.601856] ? dma_direct_unmap_sg+0x53/0x290
[ 4856.601888] print_address_description.constprop.7+0x1e/0x230
[ 4856.601913] ? freeze_kernel_threads+0x73/0x73
[ 4856.601965] ? mark_held_locks+0x29/0xa0
[ 4856.602019] ? dma_direct_unmap_sg+0x53/0x290
[ 4856.602039] ? dma_direct_unmap_sg+0x53/0x290
[ 4856.602079] kasan_report.cold.9+0x37/0x7c
[ 4856.602188] ? mlx5_ib_post_recv+0x430/0x520 [mlx5_ib]
[ 4856.602209] ? dma_direct_unmap_sg+0x53/0x290
[ 4856.602256] dma_direct_unmap_sg+0x53/0x290
[ 4856.602366] complete_rdma_req+0x188/0x4b0 [rtrs_client]
[ 4856.602451] ? rtrs_clt_close+0x80/0x80 [rtrs_client]
[ 4856.602535] ? mlx5_ib_poll_cq+0x48b/0x16e0 [mlx5_ib]
[ 4856.602589] ? radix_tree_insert+0x3a0/0x3a0
[ 4856.602610] ? do_raw_spin_lock+0x119/0x1d0
[ 4856.602647] ? rwlock_bug.part.1+0x60/0x60
[ 4856.602740] rtrs_clt_rdma_done+0x3f7/0x670 [rtrs_client]
[ 4856.602804] ? rtrs_clt_rdma_cm_handler+0xda0/0xda0 [rtrs_client]
[ 4856.602857] ? check_flags.part.31+0x6c/0x1f0
[ 4856.602927] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 4856.602963] ? rcu_read_lock_bh_held+0xc0/0xc0
[ 4856.603137] __ib_process_cq+0x10a/0x350 [ib_core]
[ 4856.603309] ib_poll_handler+0x41/0x1c0 [ib_core]
[ 4856.603358] irq_poll_softirq+0xe6/0x280
[ 4856.603392] ? lockdep_hardirqs_on_prepare+0x111/0x210
[ 4856.603446] __do_softirq+0x10d/0x646
[ 4856.603540] asm_call_irq_on_stack+0x12/0x20
[ 4856.603563] </IRQ>
[ 4856.605096] Allocated by task 8914:
[ 4856.605510] kasan_save_stack+0x19/0x40
[ 4856.605532] __kasan_kmalloc.constprop.7+0xc1/0xd0
[ 4856.605552] __kmalloc+0x155/0x320
[ 4856.605574] __sg_alloc_table+0x155/0x1c0
[ 4856.605594] sg_alloc_table+0x1f/0x50
[ 4856.605620] send_msg_sess_info+0x119/0x2e0 [rnbd_client]
[ 4856.605646] remap_devs+0x71/0x210 [rnbd_client]
[ 4856.605676] init_sess+0xad8/0xe10 [rtrs_client]
[ 4856.605706] rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
[ 4856.605728] process_one_work+0x521/0xa90
[ 4856.605748] worker_thread+0x65/0x5b0
[ 4856.605769] kthread+0x1f2/0x210
[ 4856.605789] ret_from_fork+0x22/0x30
[ 4856.606159] Freed by task 8914:
[ 4856.606559] kasan_save_stack+0x19/0x40
[ 4856.606580] kasan_set_track+0x1c/0x30
[ 4856.606601] kasan_set_free_info+0x1b/0x30
[ 4856.606622] __kasan_slab_free+0x108/0x150
[ 4856.606642] slab_free_freelist_hook+0x64/0x190
[ 4856.606661] kfree+0xe2/0x650
[ 4856.606681] __sg_free_table+0xa4/0x100
[ 4856.606707] send_msg_sess_info+0x1d6/0x2e0 [rnbd_client]
[ 4856.606733] remap_devs+0x71/0x210 [rnbd_client]
[ 4856.606763] init_sess+0xad8/0xe10 [rtrs_client]
[ 4856.606792] rtrs_clt_reconnect_work+0xd6/0x170 [rtrs_client]
[ 4856.606813] process_one_work+0x521/0xa90
[ 4856.606833] worker_thread+0x65/0x5b0
[ 4856.606853] kthread+0x1f2/0x210
[ 4856.606872] ret_from_fork+0x22/0x30
The solution is to free iu's sgtable after the iu is not used anymore.
And also move sg_alloc_table into rnbd_get_iu accordingly.
Fixes: 5a1328d0c3a7 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
KASAN detect following BUG:
[ 778.215311] ==================================================================
[ 778.216696] BUG: KASAN: use-after-free in rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.219037] Read of size 8 at addr ffff88b1d6516c28 by task tee/8842
[ 778.220500] CPU: 37 PID: 8842 Comm: tee Kdump: loaded Not tainted 5.10.0-pserver #5.10.0-1+feature+linux+next+20201214.1025+0910d71
[ 778.220529] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 3.3 02/21/2020
[ 778.220555] Call Trace:
[ 778.220609] dump_stack+0x99/0xcb
[ 778.220667] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.220715] print_address_description.constprop.7+0x1e/0x230
[ 778.220750] ? freeze_kernel_threads+0x73/0x73
[ 778.220896] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.220932] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.220994] kasan_report.cold.9+0x37/0x7c
[ 778.221066] ? kobject_put+0x80/0x270
[ 778.221102] ? rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.221184] rnbd_srv_sess_dev_force_close+0x38/0x60 [rnbd_server]
[ 778.221240] rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
[ 778.221304] ? sysfs_file_ops+0x90/0x90
[ 778.221353] kernfs_fop_write+0x141/0x240
[ 778.221451] vfs_write+0x142/0x4d0
[ 778.221553] ksys_write+0xc0/0x160
[ 778.221602] ? __ia32_sys_read+0x50/0x50
[ 778.221684] ? lockdep_hardirqs_on_prepare+0x13d/0x210
[ 778.221718] ? syscall_enter_from_user_mode+0x1c/0x50
[ 778.221821] do_syscall_64+0x33/0x40
[ 778.221862] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 778.221896] RIP: 0033:0x7f4affdd9504
[ 778.221928] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 778.221956] RSP: 002b:00007fffebb36b28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 778.222011] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f4affdd9504
[ 778.222038] RDX: 0000000000000002 RSI: 00007fffebb36c50 RDI: 0000000000000003
[ 778.222066] RBP: 00007fffebb36c50 R08: 0000556a151aa600 R09: 00007f4affeb1540
[ 778.222094] R10: fffffffffffffc19 R11: 0000000000000246 R12: 0000556a151aa520
[ 778.222121] R13: 0000000000000002 R14: 00007f4affea6760 R15: 0000000000000002
[ 778.222764] Allocated by task 3212:
[ 778.223285] kasan_save_stack+0x19/0x40
[ 778.223316] __kasan_kmalloc.constprop.7+0xc1/0xd0
[ 778.223347] kmem_cache_alloc_trace+0x186/0x350
[ 778.223382] rnbd_srv_rdma_ev+0xf16/0x1690 [rnbd_server]
[ 778.223422] process_io_req+0x4d1/0x670 [rtrs_server]
[ 778.223573] __ib_process_cq+0x10a/0x350 [ib_core]
[ 778.223709] ib_cq_poll_work+0x31/0xb0 [ib_core]
[ 778.223743] process_one_work+0x521/0xa90
[ 778.223773] worker_thread+0x65/0x5b0
[ 778.223802] kthread+0x1f2/0x210
[ 778.223833] ret_from_fork+0x22/0x30
[ 778.224296] Freed by task 8842:
[ 778.224800] kasan_save_stack+0x19/0x40
[ 778.224829] kasan_set_track+0x1c/0x30
[ 778.224860] kasan_set_free_info+0x1b/0x30
[ 778.224889] __kasan_slab_free+0x108/0x150
[ 778.224919] slab_free_freelist_hook+0x64/0x190
[ 778.224947] kfree+0xe2/0x650
[ 778.224982] rnbd_destroy_sess_dev+0x2fa/0x3b0 [rnbd_server]
[ 778.225011] kobject_put+0xda/0x270
[ 778.225046] rnbd_srv_sess_dev_force_close+0x30/0x60 [rnbd_server]
[ 778.225081] rnbd_srv_dev_session_force_close_store+0x6a/0xc0 [rnbd_server]
[ 778.225111] kernfs_fop_write+0x141/0x240
[ 778.225140] vfs_write+0x142/0x4d0
[ 778.225169] ksys_write+0xc0/0x160
[ 778.225198] do_syscall_64+0x33/0x40
[ 778.225227] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 778.226506] The buggy address belongs to the object at ffff88b1d6516c00
which belongs to the cache kmalloc-512 of size 512
[ 778.227464] The buggy address is located 40 bytes inside of
512-byte region [ffff88b1d6516c00, ffff88b1d6516e00)
The problem is in the sess_dev release function we call
rnbd_destroy_sess_dev, and could free the sess_dev already, but we still
set the keep_id in rnbd_srv_sess_dev_force_close, which lead to use
after free.
To fix it, move the keep_id before the sysfs removal, and cache the
rnbd_srv_session for lock accessing,
Fixes: 786998050cbc ("block/rnbd-srv: close a mapped device from server side.")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
lkp reboot following build error:
drivers/block/rnbd/rnbd-clt.c: In function 'rnbd_softirq_done_fn':
>> drivers/block/rnbd/rnbd-clt.c:387:2: error: implicit declaration of function 'sg_free_table_chained' [-Werror=implicit-function-declaration]
387 | sg_free_table_chained(&iu->sgt, RNBD_INLINE_SG_CNT);
| ^~~~~~~~~~~~~~~~~~~~~
The reason is CONFIG_SG_POOL is not enabled in the config, to
avoid such failure, select SG_POOL in Kconfig for RNBD_CLIENT.
Fixes: 5a1328d0c3a7 ("block/rnbd-clt: Dynamically allocate sglist for rnbd_iu")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Without crc32, the driver fails to link:
arm-linux-gnueabi-ld: drivers/block/rsxx/config.o: in function `rsxx_load_config':
config.c:(.text+0x124): undefined reference to `crc32_le'
Fixes: 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block fixes from Jens Axboe:
"A few stragglers in here, but mostly just straight fixes. In
particular:
- Set of rnbd fixes for issues around changes for the merge window
(Gioh, Jack, Md Haris Iqbal)
- iocost tracepoint addition (Baolin)
- Copyright/maintainers update (Christoph)
- Remove old blk-mq fast path CPU warning (Daniel)
- loop max_part fix (Josh)
- Remote IPI threaded IRQ fix (Sebastian)
- dasd stable fixes (Stefan)
- bcache merge window fixup and style fixup (Yi, Zheng)"
* tag 'block-5.11-2020-12-23' of git://git.kernel.dk/linux-block:
md/bcache: convert comma to semicolon
bcache:remove a superfluous check in register_bcache
block: update some copyrights
block: remove a pointless self-reference in block_dev.c
MAINTAINERS: add fs/block_dev.c to the block section
blk-mq: Don't complete on a remote CPU in force threaded mode
s390/dasd: fix list corruption of lcu list
s390/dasd: fix list corruption of pavgroup group list
s390/dasd: prevent inconsistent LCU device data
s390/dasd: fix hanging device offline processing
blk-iocost: Add iocg idle state tracepoint
nbd: Respect max_part for all partition scans
block/rnbd-clt: Does not request pdu to rtrs-clt
block/rnbd-clt: Dynamically allocate sglist for rnbd_iu
block/rnbd: Set write-back cache and fua same to the target device
block/rnbd: Fix typos
block/rnbd-srv: Protect dev session sysfs removal
block/rnbd-clt: Fix possible memleak
block/rnbd-clt: Get rid of warning regarding size argument in strlcpy
blk-mq: Remove 'running from the wrong CPU' warning
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull more xen updates from Juergen Gross:
"Some minor cleanup patches and a small series disentangling some Xen
related Kconfig options"
* tag 'for-linus-5.11-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Kconfig: remove X86_64 depends from XEN_512GB
xen/manage: Fix fall-through warnings for Clang
xen-blkfront: Fix fall-through warnings for Clang
xen: remove trailing semicolon in macro definition
xen: Kconfig: nest Xen guest options
xen: Remove Xen PVH/PVHVM dependency on PCI
x86/xen: Convert to DEFINE_SHOW_ATTRIBUTE
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Switch to the generic C VDSO, as well as some cleanups of our VDSO
setup/handling code.
- Support for KUAP (Kernel User Access Prevention) on systems using the
hashed page table MMU, using memory protection keys.
- Better handling of PowerVM SMT8 systems where all threads of a core
do not share an L2, allowing the scheduler to make better scheduling
decisions.
- Further improvements to our machine check handling.
- Show registers when unwinding interrupt frames during stack traces.
- Improvements to our pseries (PowerVM) partition migration code.
- Several series from Christophe refactoring and cleaning up various
parts of the 32-bit code.
- Other smaller features, fixes & cleanups.
Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh
Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling,
Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King,
Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz,
Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour,
Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu
Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver
O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej
Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe
Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu.
* tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits)
powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug
powerpc: Add config fragment for disabling -Werror
powerpc/configs: Add ppc64le_allnoconfig target
powerpc/powernv: Rate limit opal-elog read failure message
powerpc/pseries/memhotplug: Quieten some DLPAR operations
powerpc/ps3: use dma_mapping_error()
powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10
powerpc/perf: Fix Threshold Event Counter Multiplier width for P10
powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()
KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp
KVM: PPC: fix comparison to bool warning
KVM: PPC: Book3S: Assign boolean values to a bool variable
powerpc: Inline setup_kup()
powerpc/64s: Mark the kuap/kuep functions non __init
KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering
powerpc/xive: Improve error reporting of OPAL calls
powerpc/xive: Simplify xive_do_source_eoi()
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW
powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG
...
|
|
Pull ceph updates from Ilya Dryomov:
"The big ticket item here is support for msgr2 on-wire protocol, which
adds the option of full in-transit encryption using AES-GCM algorithm
(myself).
On top of that we have a series to avoid intermittent errors during
recovery with recover_session=clean and some MDS request encoding work
from Jeff, a cap handling fix and assorted observability improvements
from Luis and Xiubo and a good number of cleanups.
Luis also ran into a corner case with quotas which sadly means that we
are back to denying cross-quota-realm renames"
* tag 'ceph-for-5.11-rc1' of git://github.com/ceph/ceph-client: (59 commits)
libceph: drop ceph_auth_{create,update}_authorizer()
libceph, ceph: make use of __ceph_auth_get_authorizer() in msgr1
libceph, ceph: implement msgr2.1 protocol (crc and secure modes)
libceph: introduce connection modes and ms_mode option
libceph, rbd: ignore addr->type while comparing in some cases
libceph, ceph: get and handle cluster maps with addrvecs
libceph: factor out finish_auth()
libceph: drop ac->ops->name field
libceph: amend cephx init_protocol() and build_request()
libceph, ceph: incorporate nautilus cephx changes
libceph: safer en/decoding of cephx requests and replies
libceph: more insight into ticket expiry and invalidation
libceph: move msgr1 protocol specific fields to its own struct
libceph: move msgr1 protocol implementation to its own file
libceph: separate msgr1 protocol implementation
libceph: export remaining protocol independent infrastructure
libceph: export zero_page
libceph: rename and export con->flags bits
libceph: rename and export con->state states
libceph: make con->state an int
...
|
|
The creation path of the NBD device respects max_part and only scans for
partitions if max_part is not 0. However, some other code paths ignore
max_part, and unconditionally scan for partitions. Add a check for
max_part on each partition scan.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Previously the rnbd client requested the rtrs to allocate rnbd_iu
just after the rtrs_iu. So the rnbd client passes the size of
rnbd_iu for rtrs_clt_open() and rtrs creates an array of
rnbd_iu and rtrs_iu.
For IO handling, rnbd_iu exists after the request because we pass
the size of rnbd_iu when setting the tag-set. Therefore we do not
use the rnbd_iu allocated by rtrs for IO handling.
We only use the rnbd_iu allocated by rtrs when doing session
initialization. Almost all rnbd_iu allocated by rtrs are wasted.
By this patch the rnbd client does not request rnbd_iu allocation
to rtrs but allocate it for itself when doing session initialization.
Also remove unused rtrs_permit_to_pdu from rtrs.
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The BMAX_SEGMENT static array for scatterlist is embedded in
rnbd_iu structure to avoid memory allocation in hot IO path.
In many cases, we do need only several sg entries because many IOs
have only several segments.
This patch change rnbd_iu to check the number of segments in the request
and allocate sglist dynamically.
For io path, use sg_alloc_table_chained to allocate sg list faster.
First it makes two sg entries after pdu of request.
The sg_alloc_table_chained uses the pre-allocated sg entries
if the number of segments of the request is less than two.
So it reduces the number of memory allocation.
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The rnbd-client always sets the write-back cache and fua attributes
of the rnbd device queue regardless of the target device on the server.
That generates IO hang issue when the target device does not
support both of write-back cacne and fua.
This patch adds more fields for the cache policy and fua into the
device opening message. The rnbd-server sends the information
if the target device supports the write-back cache and fua
and rnbd-client recevives it and set the device queue accordingly.
Signed-off-by: Gioh Kim <gi-oh.kim@cloud.ionos.com>
[jwang: some minor change, rename a few varables, remove unrelated comments.]
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since the removal of the session sysfs can also be called from the
function destroy_sess, there is a need to protect the call from the
function rnbd_srv_sess_dev_force_close
Fixes: 786998050cbc ("block/rnbd-srv: close a mapped device from server side.")
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
In error case, we do not free the memory for blk_symlink_name.
Do it by free the memory in error case, and set to NULL
afterwards.
Also fix the condition in rnbd_clt_remove_dev_symlink.
Fixes: 64e8a6ece1a5 ("block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name")
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The kernel test robot triggerred the following warning,
>> drivers/block/rnbd/rnbd-clt.c:1397:42: warning: size argument in
'strlcpy' call appears to be size of the source; expected the size of the
destination [-Wstrlcpy-strlcat-size]
strlcpy(dev->pathname, pathname, strlen(pathname) + 1);
~~~~~~~^~~~~~~~~~~~~
To get rid of the above warning, use a kstrdup as Bart suggested.
Fixes: 64e8a6ece1a5 ("block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block driver updates from Jens Axboe:
"Nothing major in here:
- NVMe pull request from Christoph:
- nvmet passthrough improvements (Chaitanya Kulkarni)
- fcloop error injection support (James Smart)
- read-only support for zoned namespaces without Zone Append
(Javier González)
- improve some error message (Minwoo Im)
- reject I/O to offline fabrics namespaces (Victor Gladkov)
- PCI queue allocation cleanups (Niklas Schnelle)
- remove an unused allocation in nvmet (Amit Engel)
- a Kconfig spelling fix (Colin Ian King)
- nvme_req_qid simplication (Baolin Wang)
- MD pull request from Song:
- Fix race condition in md_ioctl() (Dae R. Jeong)
- Initialize read_slot properly for raid10 (Kevin Vigor)
- Code cleanup (Pankaj Gupta)
- md-cluster resync/reshape fix (Zhao Heming)
- Move null_blk into its own directory (Damien Le Moal)
- null_blk zone and discard improvements (Damien Le Moal)
- bcache race fix (Dongsheng Yang)
- Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang,
Lutz Pogrell, Md Haris Iqbal)
- lightnvm NULL pointer deref fix (tangzhenhao)
- sr in_interrupt() removal (Sebastian Andrzej Siewior)
- FC endpoint security support for s390/dasd (Jan Höppner, Sebastian
Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included
as it made it easier for them to funnel the feature through the
block driver tree.
- Follow up fixes (Colin Ian King)"
* tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits)
block: drop dead assignments in loop_init()
sr: Remove in_interrupt() usage in sr_init_command().
sr: Switch the sector size back to 2048 if sr_read_sector() changed it.
cdrom: Reset sector_size back it is not 2048.
drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c
null_blk: Move driver into its own directory
null_blk: Allow controlling max_hw_sectors limit
null_blk: discard zones on reset
null_blk: cleanup discard handling
null_blk: Improve implicit zone close
null_blk: improve zone locking
block: Align max_hw_sectors to logical blocksize
null_blk: Fail zone append to conventional zones
null_blk: Fix zone size initialization
bcache: fix race between setting bdev state to none and new write request direct to backing
block/rnbd: fix a null pointer dereference on dev->blk_symlink_name
block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name
block/rnbd: call kobject_put in the failure path
Documentation/ABI/rnbd-srv: add document for force_close
block/rnbd-srv: close a mapped device from server side.
...
|
|
Pull block updates from Jens Axboe:
"Another series of killing more code than what is being added, again
thanks to Christoph's relentless cleanups and tech debt tackling.
This contains:
- blk-iocost improvements (Baolin Wang)
- part0 iostat fix (Jeffle Xu)
- Disable iopoll for split bios (Jeffle Xu)
- block tracepoint cleanups (Christoph Hellwig)
- Merging of struct block_device and hd_struct (Christoph Hellwig)
- Rework/cleanup of how block device sizes are updated (Christoph
Hellwig)
- Simplification of gendisk lookup and removal of block device
aliasing (Christoph Hellwig)
- Block device ioctl cleanups (Christoph Hellwig)
- Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig)
- Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig)
- sbitmap improvements (Pavel Begunkov)
- Hybrid polling fix (Pavel Begunkov)
- bvec iteration improvements (Pavel Begunkov)
- Zone revalidation fixes (Damien Le Moal)
- blk-throttle limit fix (Yu Kuai)
- Various little fixes"
* tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits)
blk-mq: fix msec comment from micro to milli seconds
blk-mq: update arg in comment of blk_mq_map_queue
blk-mq: add helper allocating tagset->tags
Revert "block: Fix a lockdep complaint triggered by request queue flushing"
nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
block: disable iopoll for split bio
block: Improve blk_revalidate_disk_zones() checks
sbitmap: simplify wrap check
sbitmap: replace CAS with atomic and
sbitmap: remove swap_lock
sbitmap: optimise sbitmap_deferred_clear()
blk-mq: skip hybrid polling if iopoll doesn't spin
blk-iocost: Factor out the base vrate change into a separate function
blk-iocost: Factor out the active iocgs' state check into a separate function
blk-iocost: Move the usage ratio calculation to the correct place
blk-iocost: Remove unnecessary advance declaration
blk-iocost: Fix some typos in comments
blktrace: fix up a kerneldoc comment
block: remove the request_queue to argument request based tracepoints
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"Fixes for security issues just having been disclosed:
- a five patch series for fixing of XSA-349 (DoS via resource
depletion in Xen dom0)
- a patch fixing XSA-350 (access of stale pointer in a Xen dom0)"
* tag 'for-linus-5.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-blkback: set ring->xenblkd to NULL after kthread_stop()
xenbus/xenbus_backend: Disallow pending watch messages
xen/xenbus: Count pending messages for each watch
xen/xenbus/xen_bus_type: Support will_handle watch callback
xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path()
xen/xenbus: Allow watches discard events before queueing
|
|
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/33057688012c34dd60315ad765ff63f070e98c0c.1605896059.git.gustavoars@kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
From the beginning, the zram block device always enabled CRYPTO_LZO,
since lzo-rle is hardcoded as the fallback compression algorithm. As a
consequence, on systems where another compression algorithm is chosen
(e.g. CRYPTO_ZSTD), the lzo kernel module becomes unused, while still
having to be built/loaded.
This patch removes the hardcoded lzo-rle dependency and allows the user
to select the default compression algorithm for zram at build time. The
previous behaviour is kept, as the default algorithm is still lzo-rle.
Link: https://lkml.kernel.org/r/20201207121245.50529-1-rsalvaterra@gmail.com
Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Suggested-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently, zram supports the stat via /sys/block/zram/mm_stat to represent
how many of incompressible pages are stored at the moment but it couldn't
show how many times incompressible pages were wrote down since zram set
up. It's also good indication to see how zram is effective in the system.
Link: https://lkml.kernel.org/r/20201130201907.1284910-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is demand to writeback specific process pages to backing store
instead of all idles pages in the system due to storage wear out concerns
and to launching latency of apps which are most of the time idle but are
critical for resume latency.
This patch extends the writeback knob to support a specific page
writeback.
Link: https://lkml.kernel.org/r/20201020190506.3758660-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For libceph, this ensures that libceph instance sharing (share option)
continues to work. For rbd, this avoids blocklisting alive lock owners
(locker addr is always LEGACY, while watcher addr is ANY in nautilus).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
When xen_blkif_disconnect() is called, the kernel thread behind the
block interface is stopped by calling kthread_stop(ring->xenblkd).
The ring->xenblkd thread pointer being non-NULL determines if the
thread has been already stopped.
Normally, the thread's function xen_blkif_schedule() sets the
ring->xenblkd to NULL, when the thread's main loop ends.
However, when the thread has not been started yet (i.e.
wake_up_process() has not been called on it), the xen_blkif_schedule()
function would not be called yet.
In such case the kthread_stop() call returns -EINTR and the
ring->xenblkd remains dangling.
When this happens, any consecutive call to xen_blkif_disconnect (for
example in frontend_changed() callback) leads to a kernel crash in
kthread_stop() (e.g. NULL pointer dereference in exit_creds()).
This is XSA-350.
Cc: <stable@vger.kernel.org> # 4.12
Fixes: a24fa22ce22a ("xen/blkback: don't use xen_blkif_get() in xen-blkback kthread")
Reported-by: Olivier Benjamin <oliben@amazon.com>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Some code does not directly make 'xenbus_watch' object and call
'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This
commit adds support of 'will_handle' callback in the
'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'.
This is part of XSA-349
Cc: stable@vger.kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reported-by: Michael Kurth <mku@amazon.de>
Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Commit 8410d38c2552 ("loop: use __register_blkdev to allocate devices on
demand") simplified loop_init(); so computing the range of the block region
is not required anymore and can be dropped.
Drop dead assignments in loop_init().
As compilers will detect these unneeded assignments and optimize this,
the resulting object code is identical before and after this change.
No functional change. No change in object code.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of having similar helpers in multiple backend drivers use
common helpers for caching pages allocated via gnttab_alloc_pages().
Make use of those helpers in blkback and scsiback.
Cc: <stable@vger.kernel.org> # 5.9
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovksy@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Move null_blk driver code into the new sub-directory
drivers/block/null_blk.
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add the module option and configfs attribute max_sectors to allow
configuring the maximum size of a command issued to a null_blk device.
This allows exercising the block layer BIO splitting with different
limits than the default BLK_SAFE_MAX_SECTORS. This is also useful for
testing the zone append write path of file systems as the max_hw_sectors
limit value is also used for the max_zone_append_sectors limit.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When memory backing is enabled, use null_handle_discard() to free the
backing memory used by a zone when the zone is being reset.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
null_handle_discard() is called from both null_handle_rq() and
null_handle_bio(). As these functions are only passed a nullb_cmd
structure, this forces pointer dereferences to identiify the discard
operation code and to access the sector range to be discarded.
Simplify all this by changing the interface of the functions
null_handle_discard() and null_handle_memory_backed() to pass along
the operation code, operation start sector and number of sectors. With
this change null_handle_discard() can be called directly from
null_handle_memory_backed().
Also add a message warning that the discard configuration attribute has
no effect when memory backing is disabled.
No functional change is introduced by this patch.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|