kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2022-09-22	md/raid10: factor out code from wait_barrier() to stop_waiting_barrier()	Yu Kuai	1	-22/+28
	Currently the nasty condition in wait_barrier() is hard to read. This patch factors out the condition into a function. There are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md: Remove extra mddev_get() in md_seq_start()	Logan Gunthorpe	1	-1/+0
	A regression is seen where mddev devices stay permanently after they are stopped due to an elevated reference count. This was tracked down to an extra mddev_get() in md_seq_start(). It only happened rarely because most of the time the md_seq_start() is called with a zero offset. The path with an extra mddev_get() only happens when it starts with a non-zero offset. The commit noted below changed an mddev_get() to check its success but inadvertently left the original call in. Remove the extra call. Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Guoqing Jiang <Guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()	David Sloan	1	-1/+0
	When running chunk-sized reads on disks with badblocks duplicate bio free/puts are observed: ============================================================================= BUG bio-200 (Not tainted): Object already free ----------------------------------------------------------------------------- Allocated in mempool_alloc_slab+0x17/0x20 age=3 cpu=2 pid=7504 __slab_alloc.constprop.0+0x5a/0xb0 kmem_cache_alloc+0x31e/0x330 mempool_alloc_slab+0x17/0x20 mempool_alloc+0x100/0x2b0 bio_alloc_bioset+0x181/0x460 do_mpage_readpage+0x776/0xd00 mpage_readahead+0x166/0x320 blkdev_readahead+0x15/0x20 read_pages+0x13f/0x5f0 page_cache_ra_unbounded+0x18d/0x220 force_page_cache_ra+0x181/0x1c0 page_cache_sync_ra+0x65/0xb0 filemap_get_pages+0x1df/0xaf0 filemap_read+0x1e1/0x700 blkdev_read_iter+0x1e5/0x330 vfs_read+0x42a/0x570 Freed in mempool_free_slab+0x17/0x20 age=3 cpu=2 pid=7504 kmem_cache_free+0x46d/0x490 mempool_free_slab+0x17/0x20 mempool_free+0x66/0x190 bio_free+0x78/0x90 bio_put+0x100/0x1a0 raid5_make_request+0x2259/0x2450 md_handle_request+0x402/0x600 md_submit_bio+0xd9/0x120 __submit_bio+0x11f/0x1b0 submit_bio_noacct_nocheck+0x204/0x480 submit_bio_noacct+0x32e/0xc70 submit_bio+0x98/0x1a0 mpage_readahead+0x250/0x320 blkdev_readahead+0x15/0x20 read_pages+0x13f/0x5f0 page_cache_ra_unbounded+0x18d/0x220 Slab 0xffffea000481b600 objects=21 used=0 fp=0xffff8881206d8940 flags=0x17ffffc0010201(locked\|slab\|head\|node=0\|zone=2\|lastcpupid=0x1fffff) CPU: 0 PID: 34525 Comm: kworker/u24:2 Not tainted 6.0.0-rc2-localyes-265166-gf11c5343fa3f #143 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: raid5wq raid5_do_work Call Trace: <TASK> dump_stack_lvl+0x5a/0x78 dump_stack+0x10/0x16 print_trailer+0x158/0x165 object_err+0x35/0x50 free_debug_processing.cold+0xb7/0xbe __slab_free+0x1ae/0x330 kmem_cache_free+0x46d/0x490 mempool_free_slab+0x17/0x20 mempool_free+0x66/0x190 bio_free+0x78/0x90 bio_put+0x100/0x1a0 mpage_end_io+0x36/0x150 bio_endio+0x2fd/0x360 md_end_io_acct+0x7e/0x90 bio_endio+0x2fd/0x360 handle_failed_stripe+0x960/0xb80 handle_stripe+0x1348/0x3760 handle_active_stripes.constprop.0+0x72a/0xaf0 raid5_do_work+0x177/0x330 process_one_work+0x616/0xb20 worker_thread+0x2bd/0x6f0 kthread+0x179/0x1b0 ret_from_fork+0x22/0x30 </TASK> The double free is caused by an unnecessary bio_put() in the if(is_badblock(...)) error path in raid5_read_one_chunk(). The error path was moved ahead of bio_alloc_clone() in c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk"). The previous code checked and freed align_bio which required a bio_put. After the move that is no longer needed as raid_bio is returned to the control of the common io path which performs its own endio resulting in a double free on bad device blocks. Fixes: c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk") Signed-off-by: David Sloan <david.sloan@eideticom.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Guoqing Jiang <Guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md/raid5: Ensure stripe_fill happens on non-read IO with journal	Logan Gunthorpe	1	-1/+1
	When doing degrade/recover tests using the journal a kernel BUG is hit at drivers/md/raid5.c:4381 in handle_parity_checks5(): BUG_ON(!test_bit(R5_UPTODATE, &dev->flags)); This was found to occur because handle_stripe_fill() was skipped for stripes in the journal due to a condition in that function. Thus blocks were not fetched and R5_UPTODATE was not set when the code reached handle_parity_checks5(). To fix this, don't skip handle_stripe_fill() unless the stripe is for read. Fixes: 07e83364845e ("md/r5cache: shift complex rmw from read path to write path") Link: https://lore.kernel.org/linux-raid/e05c4239-41a9-d2f7-3cfa-4aa9d2cea8c1@deltatee.com/ Suggested-by: Song Liu <song@kernel.org> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md/raid5: Don't read ->active_stripes if it's not needed	Logan Gunthorpe	1	-3/+2
	The atomic_read() is not needed in many cases so only do the read after the first checks are done. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md/raid5: Cleanup prototype of raid5_get_active_stripe()	Logan Gunthorpe	3	-25/+39
	Drop the three bools in the prototype of raid5_get_active_stripe() and replace them with a flags parameter. At the same time, drop the distinction with __raid5_get_active_stripe(). Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md/raid5: Drop extern on function declarations in raid5.h	Logan Gunthorpe	1	-12/+10
	externs should not be used in function declarations, so clean those up. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md/raid5: Refactor raid5_get_active_stripe()	Logan Gunthorpe	1	-41/+41
	Refactor raid5_get_active_stripe() without the gotos with an explicit infinite loop and some additional nesting. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md: Replace snprintf with scnprintf	Saurabh Sengar	1	-1/+1
	Current code produces a warning as shown below when total characters in the constituent block device names plus the slashes exceeds 200. snprintf() returns the number of characters generated from the given input, which could cause the expression “200 – len” to wrap around to a large positive number. Fix this by using scnprintf() instead, which returns the actual number of characters written into the buffer. [ 1513.267938] ------------[ cut here ]------------ [ 1513.267943] WARNING: CPU: 15 PID: 37247 at <snip>/lib/vsprintf.c:2509 vsnprintf+0x2c8/0x510 [ 1513.267944] Modules linked in: <snip> [ 1513.267969] CPU: 15 PID: 37247 Comm: mdadm Not tainted 5.4.0-1085-azure #90~18.04.1-Ubuntu [ 1513.267969] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022 [ 1513.267971] RIP: 0010:vsnprintf+0x2c8/0x510 <-snip-> [ 1513.267982] Call Trace: [ 1513.267986] snprintf+0x45/0x70 [ 1513.267990] ? disk_name+0x71/0xa0 [ 1513.267993] dump_zones+0x114/0x240 [raid0] [ 1513.267996] ? _cond_resched+0x19/0x40 [ 1513.267998] raid0_run+0x19e/0x270 [raid0] [ 1513.268000] md_run+0x5e0/0xc50 [ 1513.268003] ? security_capable+0x3f/0x60 [ 1513.268005] do_md_run+0x19/0x110 [ 1513.268006] md_ioctl+0x195e/0x1f90 [ 1513.268007] blkdev_ioctl+0x91f/0x9f0 [ 1513.268010] block_ioctl+0x3d/0x50 [ 1513.268012] do_vfs_ioctl+0xa9/0x640 [ 1513.268014] ? __fput+0x162/0x260 [ 1513.268016] ksys_ioctl+0x75/0x80 [ 1513.268017] __x64_sys_ioctl+0x1a/0x20 [ 1513.268019] do_syscall_64+0x5e/0x200 [ 1513.268021] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 766038846e875 ("md/raid0: replace printk() with pr_*()") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md/raid10: fix compile warning	Guoqing Jiang	1	-1/+1
	With W=1, compiler complains. drivers/md/raid10.c:1983: warning: bad line: Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>
2022-09-22	md/raid5: Fix spelling mistakes in comments	XU pengfei	1	-3/+3
	Fix spelling of 'waitting' in comments. Signed-off-by: XU pengfei <xupengfei@nfschina.com> Signed-off-by: Song Liu <song@kernel.org>
2022-09-19	bcache: fix set_at_max_writeback_rate() for multiple attached devices	Coly Li	1	-21/+52
	Inside set_at_max_writeback_rate() the calculation in following if() check is wrong, if (atomic_inc_return(&c->idle_counter) < atomic_read(&c->attached_dev_nr) * 6) Because each attached backing device has its own writeback thread running and increasing c->idle_counter, the counter increates much faster than expected. The correct calculation should be, (counter / dev_nr) < dev_nr * 6 which equals to, counter < dev_nr * dev_nr * 6 This patch fixes the above mistake with correct calculation, and helper routine idle_counter_exceeded() is added to make code be more clear. Reported-by: Mingzhe Zou <mingzhe.zou@easystack.cn> Signed-off-by: Coly Li <colyli@suse.de> Acked-by: Mingzhe Zou <mingzhe.zou@easystack.cn> Link: https://lore.kernel.org/r/20220919161647.81238-6-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-19	bcache:: fix repeated words in comments	Jilin Yuan	1	-1/+1
	Delete the redundant word 'we'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20220919161647.81238-5-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-19	bcache: bset: Fix comment typos	Jules Maselbas	1	-1/+1
	Remove the redundant word `by`, correct the typo `creaated`. CC: Kent Overstreet <kent.overstreet@gmail.com> CC: linux-bcache@vger.kernel.org Signed-off-by: Jules Maselbas <jmaselbas@kalray.eu> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20220919161647.81238-4-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-19	bcache: remove unused bch_mark_cache_readahead function def in stats.h	Lin Feng	1	-1/+0
	This is a cleanup for commit 1616a4c2ab1a ("bcache: remove bcache device self-defined readahead")', currently no user for bch_mark_cache_readahead() since that commit. Signed-off-by: Lin Feng <linf@wangsu.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20220919161647.81238-3-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-19	bcache: remove unnecessary flush_workqueue	Li Lei	1	-3/+2
	All pending works will be drained by destroy_workqueue(), no need to call flush_workqueue() explicitly. Signed-off-by: Li Lei <lilei@szsandstone.com> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20220919161647.81238-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-08	dm: verity-loadpin: Only trust verity targets with enforcement	Matthias Kaehlcke	3	-0/+25
	Verity targets can be configured to ignore corrupted data blocks. LoadPin must only trust verity targets that are configured to perform some kind of enforcement when data corruption is detected, like returning an error, restarting the system or triggering a panic. Fixes: b6c1c5745ccc ("dm: Add verity helpers for LoadPin") Reported-by: Sarthak Kukreti <sarthakkukreti@chromium.org> Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Sarthak Kukreti <sarthakkukreti@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220907133055.1.Ic8a1dafe960dc0f8302e189642bc88ebb785d274@changeid
2022-08-26	Merge tag 'block-6.0-2022-08-26' of git://git.kernel.dk/linux-block	Linus Torvalds	2	-8/+9
	Pull block fixes from Jens Axboe: - MD pull request via Song: - Fix for clustered raid (Guoqing Jiang) - req_op fix (Bart Van Assche) - Fix race condition in raid recreate (David Sloan) - loop configuration overflow fix (Siddh) - Fix missing commit_rqs call for certain conditions (Yu) * tag 'block-6.0-2022-08-26' of git://git.kernel.dk/linux-block: md: call __md_stop_writes in md_stop Revert "md-raid: destroy the bitmap after destroying the thread" md: Flush workqueue md_rdev_misc_wq in md_alloc() md/raid10: Fix the data type of an r10_sync_page_io() argument loop: Check for overflow while configuring loop blk-mq: fix io hung due to missing commit_rqs
2022-08-24	md: call __md_stop_writes in md_stop	Guoqing Jiang	1	-0/+1
	From the link [1], we can see raid1d was running even after the path raid_dtr -> md_stop -> __md_stop. Let's stop write first in destructor to align with normal md-raid to fix the KASAN issue. [1]. https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZcec6hQFm8nhg@mail.gmail.com/T/#m7f12bf90481c02c6d2da68c64aeed4779b7df74a Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop") Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>
2022-08-24	Revert "md-raid: destroy the bitmap after destroying the thread"	Guoqing Jiang	1	-1/+1
	This reverts commit e151db8ecfb019b7da31d076130a794574c89f6f. Because it obviously breaks clustered raid as noticed by Neil though it fixed KASAN issue for dm-raid, let's revert it and fix KASAN issue in next commit. [1]. https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev/T/#m95ac225cab7409f66c295772483d091084a6d470 Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread") Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: Song Liu <song@kernel.org>
2022-08-24	md: Flush workqueue md_rdev_misc_wq in md_alloc()	David Sloan	1	-0/+1
	A race condition still exists when removing and re-creating md devices in test cases. However, it is only seen on some setups. The race condition was tracked down to a reference still being held to the kobject by the rdev in the md_rdev_misc_wq which will be released in rdev_delayed_delete(). md_alloc() waits for previous deletions by waiting on the md_misc_wq, but the md_rdev_misc_wq may still be holding a reference to a recently removed device. To fix this, also flush the md_rdev_misc_wq in md_alloc(). Signed-off-by: David Sloan <david.sloan@eideticom.com> [logang@deltatee.com: rewrote commit message] Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org>
2022-08-24	md/raid10: Fix the data type of an r10_sync_page_io() argument	Bart Van Assche	1	-7/+6
	Fix the following sparse warning: drivers/md/raid10.c:2647:60: sparse: sparse: incorrect type in argument 5 (different base types) @@ expected restricted blk_opf_t [usertype] opf @@ got int rw @@ This patch does not change any functionality since REQ_OP_READ = READ = 0 and since REQ_OP_WRITE = WRITE = 1. Cc: Rong A Chen <rong.a.chen@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Fixes: 4ce4c73f662b ("md/core: Combine two sync_page_io() arguments") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Song Liu <song@kernel.org>
2022-08-23	fs: dlm: remove DLM_LSFL_FS from uapi	Alexander Aring	1	-2/+2
	The DLM_LSFL_FS flag is set in lockspaces created directly for a kernel user, as opposed to those lockspaces created for user space applications. The user space libdlm allowed this flag to be set for lockspaces created from user space, but then used by a kernel user. No kernel user has ever used this method, so remove the ability to do it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-12	Merge tag 'for-6.0/dm-fixes' of ↵	Linus Torvalds	3	-10/+27
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - A few fixes for the DM verity and bufio changes in this merge window - A smatch warning fix for DM writecache locking in writecache_map * tag 'for-6.0/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm bufio: fix some cases where the code sleeps with spinlock held dm writecache: fix smatch warning about invalid return from writecache_map dm verity: fix verity_parse_opt_args parsing dm verity: fix DM_VERITY_OPTS_MAX value yet again dm bufio: simplify DM_BUFIO_CLIENT_NO_SLEEP locking
2022-08-11	dm bufio: fix some cases where the code sleeps with spinlock held	Mikulas Patocka	1	-1/+9
	Commit b32d45824aa7 ("dm bufio: Add DM_BUFIO_CLIENT_NO_SLEEP flag") added a "NO_SLEEP" mode, it replaces a mutex with a spinlock, and it is only usable when the device is in read-only mode (because the write path may be sleeping while holding the dm_bufio_client lock). However, there are still two points where the code could sleep even in read-only mode. One is in __get_unclaimed_buffer -> __make_buffer_clean. The other is in __try_evict_buffer -> __make_buffer_clean. These functions will call __make_buffer_clean which sleeps if the buffer is being read. Fix these cases so that if c->no_sleep is set __make_buffer_clean will not be called and the buffer will be skipped instead. Fixes: b32d45824aa7 ("dm bufio: Add DM_BUFIO_CLIENT_NO_SLEEP flag") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-10	dm writecache: fix smatch warning about invalid return from writecache_map	Mikulas Patocka	1	-1/+2
	There's a smatch warning "inconsistent returns '&wc->lock'" in dm-writecache. The reason for the warning is that writecache_map() doesn't drop the lock on the impossible path. Fix this warning by adding wc_unlock() after the BUG statement (so that it will be compiled-away anyway). Fixes: df699cc16ea5e ("dm writecache: report invalid return from writecache_map helpers") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-10	dm verity: fix verity_parse_opt_args parsing	Mike Snitzer	1	-1/+11
	Commit df326e7a0699 ("dm verity: allow optional args to alter primary args handling") introduced a bug where verity_parse_opt_args() wouldn't properly shift past an optional argument's additional params (by ignoring them). Fix this by avoiding returning with error if an unknown argument is encountered when @only_modifier_opts=true is passed to verity_parse_opt_args(). In practice this regressed the cryptsetup testsuite's FEC testing because unknown optional arguments were encountered, wherey short-circuiting ever testing FEC mode. With this fix all of the cryptsetup testsuite's verity FEC tests pass. Fixes: df326e7a0699 ("dm verity: allow optional args to alter primary args handling") Reported-by: Milan Broz <gmazyland@gmail.com>> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-10	dm verity: fix DM_VERITY_OPTS_MAX value yet again	Mike Snitzer	1	-1/+1
	Must account for the possibility that "try_verify_in_tasklet" is used. This is the same issue that was fixed with commit 160f99db94322 -- it is far too easy to miss that additional a new argument(s) require bumping DM_VERITY_OPTS_MAX accordingly. Fixes: 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature") Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-10	dm bufio: simplify DM_BUFIO_CLIENT_NO_SLEEP locking	Mike Snitzer	1	-6/+4
	Historically none of the bufio code runs in interrupt context but with the use of DM_BUFIO_CLIENT_NO_SLEEP a bufio client can, see: commit 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature") That said, the new tasklet usecase still doesn't require interrupts be disabled by bufio (let alone conditionally restore them). Yet with PREEMPT_RT, and falling back from tasklet to workqueue, care must be taken to properly synchronize between softirq and process context, otherwise ABBA deadlock may occur. While it is unnecessary to disable bottom-half preemption within a tasklet, we must consistently do so in process context to ensure locking is in the proper order. Fix these issues by switching from spin_lock_irq{save,restore} to using spin_{lock,unlock}_bh instead. Also remove the 'spinlock_flags' member in dm_bufio_client struct (that can be used unsafely if bufio must recurse on behalf of some caller, e.g. block layer's submit_bio). Fixes: 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-08-06	Merge tag 'for-6.0/dm-changes-2' of ↵	Linus Torvalds	8	-32/+180
	git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull more device mapper updates from Mike Snitzer: - Add flags argument to dm_bufio_client_create and introduce DM_BUFIO_CLIENT_NO_SLEEP flag to have dm-bufio use spinlock rather than mutex for its locking. - Add optional "try_verify_in_tasklet" feature to DM verity target. This feature gives users the option to improve IO latency by using a tasklet to verify, using hashes in bufio's cache, rather than wait to schedule a work item via workqueue. But if there is a bufio cache miss, or an error, then the tasklet will fallback to using workqueue. - Incremental changes to both dm-bufio and the DM verity target to use jump_label to minimize cost of branching associated with the niche "try_verify_in_tasklet" feature. DM-bufio in particular is used by quite a few other DM targets so it doesn't make sense to incur additional bufio cost in those targets purely for the benefit of this niche verity feature if the feature isn't ever used. - Optimize verity_verify_io, which is used by both workqueue and tasklet based verification, if FEC is not configured or tasklet based verification isn't used. - Remove DM verity target's verify_wq's use of the WQ_CPU_INTENSIVE flag since it uses WQ_UNBOUND. Also, use the WQ_HIGHPRI flag if "try_verify_in_tasklet" is specified. * tag 'for-6.0/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm verity: have verify_wq use WQ_HIGHPRI if "try_verify_in_tasklet" dm verity: remove WQ_CPU_INTENSIVE flag since using WQ_UNBOUND dm verity: only copy bvec_iter in verity_verify_io if in_tasklet dm verity: optimize verity_verify_io if FEC not configured dm verity: conditionally enable branching for "try_verify_in_tasklet" dm bufio: conditionally enable branching for DM_BUFIO_CLIENT_NO_SLEEP dm verity: allow optional args to alter primary args handling dm verity: Add optional "try_verify_in_tasklet" feature dm bufio: Add DM_BUFIO_CLIENT_NO_SLEEP flag dm bufio: Add flags argument to dm_bufio_client_create
2022-08-06	Merge tag 'mm-stable-2022-08-03' of ↵	Linus Torvalds	5	-5/+8
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...
2022-08-05	Merge tag 'for-5.20/block-2022-08-04' of git://git.kernel.dk/linux-block	Linus Torvalds	13	-528/+804
	Pull block driver updates from Jens Axboe: - NVMe pull requests via Christoph: - add support for In-Band authentication (Hannes Reinecke) - handle the persistent internal error AER (Michael Kelley) - use in-capsule data for TCP I/O queue connect (Caleb Sander) - remove timeout for getting RDMA-CM established event (Israel Rukshin) - misc cleanups (Joel Granados, Sagi Grimberg, Chaitanya Kulkarni, Guixin Liu, Xiang wangx) - use command_id instead of req->tag in trace_nvme_complete_rq() (Bean Huo) - various fixes for the new authentication code (Lukas Bulwahn, Dan Carpenter, Colin Ian King, Chaitanya Kulkarni, Hannes Reinecke) - small cleanups (Liu Song, Christoph Hellwig) - restore compat_ioctl support (Nick Bowler) - make a nvmet-tcp workqueue lockdep-safe (Sagi Grimberg) - enable generic interface (/dev/ngXnY) for unknown command sets (Joel Granados, Christoph Hellwig) - don't always build constants.o (Christoph Hellwig) - print the command name of aborted commands (Christoph Hellwig) - MD pull requests via Song: - Improve raid5 lock contention, by Logan Gunthorpe. - Misc fixes to raid5, by Logan Gunthorpe. - Fix race condition with md_reap_sync_thread(), by Guoqing Jiang. - Fix potential deadlock with raid5_quiesce and raid5_get_active_stripe, by Logan Gunthorpe. - Refactoring md_alloc(), by Christoph" - Fix md disk_name lifetime problems, by Christoph Hellwig - Convert prepare_to_wait() to wait_woken() api, by Logan Gunthorpe; - Fix sectors_to_do bitmap issue, by Logan Gunthorpe. - Work on unifying the null_blk module parameters and configfs API (Vincent) - drbd bitmap IO error fix (Lars) - Set of rnbd fixes (Guoqing, Md Haris) - Remove experimental marker on bcache async device registration (Coly) - Series from cleaning up the bio splitting (Christoph) - Removal of the sx8 block driver. This hardware never really widespread, and it didn't receive a lot of attention after the initial merge of it back in 2005 (Christoph) - A few fixes for s390 dasd (Eric, Jiang) - Followup set of fixes for ublk (Ming) - Support for UBLK_IO_NEED_GET_DATA for ublk (ZiyangZhang) - Fixes for the dio dma alignment (Keith) - Misc fixes and cleanups (Ming, Yu, Dan, Christophe * tag 'for-5.20/block-2022-08-04' of git://git.kernel.dk/linux-block: (136 commits) s390/dasd: Establish DMA alignment s390/dasd: drop unexpected word 'for' in comments ublk_drv: add support for UBLK_IO_NEED_GET_DATA ublk_cmd.h: add one new ublk command: UBLK_IO_NEED_GET_DATA ublk_drv: cleanup ublksrv_ctrl_dev_info ublk_drv: add SET_PARAMS/GET_PARAMS control command ublk_drv: fix ublk device leak in case that add_disk fails ublk_drv: cancel device even though disk isn't up block: fix leaking page ref on truncated direct io block: ensure bio_iov_add_page can't fail block: ensure iov_iter advances for added pages drivers:md:fix a potential use-after-free bug md/raid5: Ensure batch_last is released before sleeping for quiesce md/raid5: Move stripe_request_ctx up md/raid5: Drop unnecessary call to r5c_check_stripe_cache_usage() md/raid5: Make is_inactive_blocked() helper md/raid5: Refactor raid5_get_active_stripe() block: pass struct queue_limits to the bio splitting helpers block: move bio_allowed_max_sectors to blk-merge.c block: move the call to get_max_io_size out of blk_bio_segment_split ...
2022-08-04	dm verity: have verify_wq use WQ_HIGHPRI if "try_verify_in_tasklet"	Mike Snitzer	1	-1/+11
	Allow verify_wq to preempt softirq since verification in tasklet will fall-back to using it for error handling (or if the bufio cache doesn't have required hashes). Suggested-by: Nathan Huckleberry <nhuck@google.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-04	dm verity: remove WQ_CPU_INTENSIVE flag since using WQ_UNBOUND	Mike Snitzer	1	-1/+1
	The documentation [1] says that WQ_CPU_INTENSIVE is "meaningless" for unbound wq. So remove WQ_CPU_INTENSIVE from the verify_wq allocation. 1. https://www.kernel.org/doc/html/latest/core-api/workqueue.html#flags Suggested-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-04	dm verity: only copy bvec_iter in verity_verify_io if in_tasklet	Mike Snitzer	1	-9/+16
	Avoid extra bvec_iter copy unless it is needed to allow retrying verification, that failed from a tasklet, from a workqueue. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-04	dm verity: optimize verity_verify_io if FEC not configured	Mike Snitzer	1	-1/+8
	Only declare and copy bvec_iter if CONFIG_DM_VERITY_FEC is defined and FEC enabled for the verity device. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-04	dm verity: conditionally enable branching for "try_verify_in_tasklet"	Mike Snitzer	1	-5/+14
	Use jump_label to limit the need for branching unless the optional "try_verify_in_tasklet" feature is used. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-04	dm bufio: conditionally enable branching for DM_BUFIO_CLIENT_NO_SLEEP	Mike Snitzer	1	-4/+11
	Use jump_label to limit the need for branching unless the optional DM_BUFIO_CLIENT_NO_SLEEP is used. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-04	dm verity: allow optional args to alter primary args handling	Mike Snitzer	1	-8/+24
	The previous commit ("dm verity: Add optional "try_verify_in_tasklet" feature") imposed that CRYPTO_ALG_ASYNC mask be used even if the optional "try_verify_in_tasklet" feature was not specified. This was because verity_parse_opt_args() was called after handling the primary args (due to it having data dependencies on having first parsed all primary args). Enhance verity_ctr() so that simple optional args, that don't have a data dependency on primary args parsing, can alter how the primary args are handled. In practice this means verity_parse_opt_args() gets called twice. First with the new 'only_modifier_opts' arg set to true, then again with it set to false _after_ parsing all primary args. This allows the v->use_tasklet flag to be properly set and then used when verity_ctr() parses the primary args and then calls crypto_alloc_ahash() with CRYPTO_ALG_ASYNC conditionally set. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-04	dm verity: Add optional "try_verify_in_tasklet" feature	Nathan Huckleberry	2	-18/+91
	Using tasklets for disk verification can reduce IO latency. When there are accelerated hash instructions it is often better to compute the hash immediately using a tasklet rather than deferring verification to a work-queue. This reduces time spent waiting to schedule work-queue jobs, but requires spending slightly more time in interrupt context. If the dm-bufio cache does not have the required hashes we fallback to the work-queue implementation. FEC is only possible using work-queue because code to support the FEC feature may sleep. The following shows a speed comparison of random reads on a dm-verity device. The dm-verity device uses a 1G ramdisk for data and a 1G ramdisk for hashes. One test was run using tasklets and one test was run using the existing work-queue solution. Both tests were run when the dm-bufio cache was hot. The tasklet implementation performs significantly better since there is no time spent waiting for work-queue jobs to be scheduled. READ: bw=181MiB/s (190MB/s), 181MiB/s-181MiB/s (190MB/s-190MB/s), io=512MiB (537MB), run=2827-2827msec READ: bw=23.6MiB/s (24.8MB/s), 23.6MiB/s-23.6MiB/s (24.8MB/s-24.8MB/s), io=512MiB (537MB), run=21688-21688msec Signed-off-by: Nathan Huckleberry <nhuck@google.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-03	drivers:md:fix a potential use-after-free bug	Wentao_Liang	1	-1/+1
	In line 2884, "raid5_release_stripe(sh);" drops the reference to sh and may cause sh to be released. However, sh is subsequently used in lines 2886 "if (sh->batch_head && sh != sh->batch_head)". This may result in an use-after-free bug. It can be fixed by moving "raid5_release_stripe(sh);" to the bottom of the function. Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03	md/raid5: Ensure batch_last is released before sleeping for quiesce	Logan Gunthorpe	2	-9/+29
	A race condition exists where if raid5_quiesce() is called in the middle of a request that has set batch_last, it will deadlock. batch_last will hold a reference to a stripe when raid5_quiesce() is called. This will cause the next raid5_get_active_stripe() call to sleep waiting for the quiesce to finish, but the raid5_quiesce() thread will wait for active_stripes to go to zero which will never happen because request thread is waiting for the quiesce to stop. Fix this by creating a special __raid5_get_active_stripe() function which takes the request context and clears the last_batch before sleeping. While we're at it, change the arguments of raid5_get_active_stripe() to bools. Fixes: 3312e6c887fe ("md/raid5: Keep a reference to last stripe_head for batch") Reported-by: David Sloan <David.Sloan@eideticom.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03	md/raid5: Move stripe_request_ctx up	Logan Gunthorpe	1	-27/+27
	Move stripe_request_ctx up. No functional changes intended. This will be necessary in the next patch to release the batch_last in the context before sleeping. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03	md/raid5: Drop unnecessary call to r5c_check_stripe_cache_usage()	Logan Gunthorpe	1	-1/+0
	Now that raid5_get_active_stripe() has been refactored it is appearant that r5c_check_stripe_cache_usage() doesn't need to be called in the wait_for_stripe branch. r5c_check_stripe_cache_usage() will only conditionally call r5l_wake_reclaim(), but that function is called two lines later. Drop the call for cleanup. Reported-by: Martin Oliveira <martin.oliveira@eideticom.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03	md/raid5: Make is_inactive_blocked() helper	Logan Gunthorpe	1	-5/+19
	The logic to wait_for_stripe is difficult to parse being on so many lines and with confusing operator precedence. Move it to a helper function to make it easier to read. No functional changes intended. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03	md/raid5: Refactor raid5_get_active_stripe()	Logan Gunthorpe	1	-31/+36
	Refactor the raid5_get_active_stripe() to read more linearly in the order it's typically executed. The init_stripe() call is called if a free stripe is found and the function is exited early which removes a lot of if (sh) checks and unindents the following code. Remove the while loop in favour of the 'goto retry' pattern, which reduces indentation further. And use a 'goto wait_for_stripe' instead of an additional indent seeing it is the unusual path and this makes the code easier to read. No functional changes intended. Will make subsequent changes in patches easier to understand. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03	block: move ->bio_split to the gendisk	Christoph Hellwig	1	-1/+1
	Only non-passthrough requests are split by the block layer and use the ->bio_split bio_set. Move it from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220727162300.3089193-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03	block: change the blk_queue_split calling convention	Christoph Hellwig	2	-4/+4
	The double indirect bio leads to somewhat suboptimal code generation. Instead return the (original or split) bio, and make sure the request_queue arguments to the lower level helpers is passed after the bio to avoid constant reshuffling of the argument passing registers. Also give it and the helpers used to implement it more descriptive names. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220727162300.3089193-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03	md-raid10: fix KASAN warning	Mikulas Patocka	1	-1/+4
	There's a KASAN warning in raid10_remove_disk when running the lvm test lvconvert-raid-reshape.sh. We fix this warning by verifying that the value "number" is valid. BUG: KASAN: slab-out-of-bounds in raid10_remove_disk+0x61/0x2a0 [raid10] Read of size 8 at addr ffff889108f3d300 by task mdX_raid10/124682 CPU: 3 PID: 124682 Comm: mdX_raid10 Not tainted 5.19.0-rc6 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x45/0x57a ? __lock_text_start+0x18/0x18 ? raid10_remove_disk+0x61/0x2a0 [raid10] kasan_report+0xa8/0xe0 ? raid10_remove_disk+0x61/0x2a0 [raid10] raid10_remove_disk+0x61/0x2a0 [raid10] Buffer I/O error on dev dm-76, logical block 15344, async page read ? __mutex_unlock_slowpath.constprop.0+0x1e0/0x1e0 remove_and_add_spares+0x367/0x8a0 [md_mod] ? super_written+0x1c0/0x1c0 [md_mod] ? mutex_trylock+0xac/0x120 ? _raw_spin_lock+0x72/0xc0 ? _raw_spin_lock_bh+0xc0/0xc0 md_check_recovery+0x848/0x960 [md_mod] raid10d+0xcf/0x3360 [raid10] ? sched_clock_cpu+0x185/0x1a0 ? rb_erase+0x4d4/0x620 ? var_wake_function+0xe0/0xe0 ? psi_group_change+0x411/0x500 ? preempt_count_sub+0xf/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? raid10_sync_request+0x36c0/0x36c0 [raid10] ? preempt_count_sub+0xf/0xc0 ? _raw_spin_unlock_irqrestore+0x19/0x40 ? del_timer_sync+0xa9/0x100 ? try_to_del_timer_sync+0xc0/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? _raw_spin_unlock_irq+0x11/0x24 ? __list_del_entry_valid+0x68/0xa0 ? finish_wait+0xa3/0x100 md_thread+0x161/0x260 [md_mod] ? unregister_md_personality+0xa0/0xa0 [md_mod] ? _raw_spin_lock_irqsave+0x78/0xc0 ? prepare_to_wait_event+0x2c0/0x2c0 ? unregister_md_personality+0xa0/0xa0 [md_mod] kthread+0x148/0x180 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 124495: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x80/0xa0 setup_conf+0x140/0x5c0 [raid10] raid10_run+0x4cd/0x740 [raid10] md_run+0x6f9/0x1300 [md_mod] raid_ctr+0x2531/0x4ac0 [dm_raid] dm_table_add_target+0x2b0/0x620 [dm_mod] table_load+0x1c8/0x400 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x9e/0xc0 kvfree_call_rcu+0x84/0x480 timerfd_release+0x82/0x140 L __fput+0xfa/0x400 task_work_run+0x80/0xc0 exit_to_user_mode_prepare+0x155/0x160 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x42/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Second to last potentially related work creation: kasan_save_stack+0x1e/0x40 __kasan_record_aux_stack+0x9e/0xc0 kvfree_call_rcu+0x84/0x480 timerfd_release+0x82/0x140 __fput+0xfa/0x400 task_work_run+0x80/0xc0 exit_to_user_mode_prepare+0x155/0x160 syscall_exit_to_user_mode+0x12/0x40 do_syscall_64+0x42/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The buggy address belongs to the object at ffff889108f3d200 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 0 bytes to the right of 256-byte region [ffff889108f3d200, ffff889108f3d300) The buggy address belongs to the physical page: page:000000007ef2a34c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1108f3c head:000000007ef2a34c order:2 compound_mapcount:0 compound_pincount:0 flags: 0x4000000000010200(slab\|head\|zone=2) raw: 4000000000010200 0000000000000000 dead000000000001 ffff889100042b40 raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff889108f3d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff889108f3d280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff889108f3d300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff889108f3d380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff889108f3d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03	md-raid: destroy the bitmap after destroying the thread	Mikulas Patocka	1	-1/+1
	When we ran the lvm test "shell/integrity-blocksize-3.sh" on a kernel with kasan, we got failure in write_page. The reason for the failure is that md_bitmap_destroy is called before destroying the thread and the thread may be waiting in the function write_page for the bio to complete. When the thread finishes waiting, it executes "if (test_bit(BITMAP_WRITE_ERROR, &bitmap->flags))", which triggers the kasan warning. Note that the commit 48df498daf62 that caused this bug claims that it is neede for md-cluster, you should check md-cluster and possibly find another bugfix for it. BUG: KASAN: use-after-free in write_page+0x18d/0x680 [md_mod] Read of size 8 at addr ffff889162030c78 by task mdX_raid1/5539 CPU: 10 PID: 5539 Comm: mdX_raid1 Not tainted 5.19.0-rc2 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x45/0x57a ? __lock_text_start+0x18/0x18 ? write_page+0x18d/0x680 [md_mod] kasan_report+0xa8/0xe0 ? write_page+0x18d/0x680 [md_mod] kasan_check_range+0x13f/0x180 write_page+0x18d/0x680 [md_mod] ? super_sync+0x4d5/0x560 [dm_raid] ? md_bitmap_file_kick+0xa0/0xa0 [md_mod] ? rs_set_dev_and_array_sectors+0x2e0/0x2e0 [dm_raid] ? mutex_trylock+0x120/0x120 ? preempt_count_add+0x6b/0xc0 ? preempt_count_sub+0xf/0xc0 md_update_sb+0x707/0xe40 [md_mod] md_reap_sync_thread+0x1b2/0x4a0 [md_mod] md_check_recovery+0x533/0x960 [md_mod] raid1d+0xc8/0x2a20 [raid1] ? var_wake_function+0xe0/0xe0 ? psi_group_change+0x411/0x500 ? preempt_count_sub+0xf/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? raid1_end_read_request+0x2a0/0x2a0 [raid1] ? preempt_count_sub+0xf/0xc0 ? _raw_spin_unlock_irqrestore+0x19/0x40 ? del_timer_sync+0xa9/0x100 ? try_to_del_timer_sync+0xc0/0xc0 ? _raw_spin_lock_irqsave+0x78/0xc0 ? __lock_text_start+0x18/0x18 ? __list_del_entry_valid+0x68/0xa0 ? finish_wait+0xa3/0x100 md_thread+0x161/0x260 [md_mod] ? unregister_md_personality+0xa0/0xa0 [md_mod] ? _raw_spin_lock_irqsave+0x78/0xc0 ? prepare_to_wait_event+0x2c0/0x2c0 ? unregister_md_personality+0xa0/0xa0 [md_mod] kthread+0x148/0x180 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 5522: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x80/0xa0 md_bitmap_create+0xa8/0xe80 [md_mod] md_run+0x777/0x1300 [md_mod] raid_ctr+0x249c/0x4a30 [dm_raid] dm_table_add_target+0x2b0/0x620 [dm_mod] table_load+0x1c8/0x400 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 5680: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x40 kasan_set_free_info+0x20/0x40 __kasan_slab_free+0xf7/0x140 kfree+0x80/0x240 md_bitmap_free+0x1c3/0x280 [md_mod] __md_stop+0x21/0x120 [md_mod] md_stop+0x9/0x40 [md_mod] raid_dtr+0x1b/0x40 [dm_raid] dm_table_destroy+0x98/0x1e0 [dm_mod] __dm_destroy+0x199/0x360 [dm_mod] dev_remove+0x10c/0x160 [dm_mod] ctl_ioctl+0x29e/0x560 [dm_mod] dm_compat_ctl_ioctl+0x7/0x20 [dm_mod] __do_compat_sys_ioctl+0xfa/0x160 do_syscall_64+0x90/0xc0 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>