summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2019-12-05block: drbd: remove a stray unlock in __drbd_send_protocol()Dan Carpenter1-1/+0
[ Upstream commit 8e9c523016cf9983b295e4bc659183d1fa6ef8e0 ] There are two callers of this function and they both unlock the mutex so this ends up being a double unlock. Fixes: 44ed167da748 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-29nbd: prevent memory leakNavid Emamdoost1-2/+3
commit 03bf73c315edca28f47451913177e14cd040a216 upstream. In nbd_add_socket when krealloc succeeds, if nsock's allocation fail the reallocted memory is leak. The correct behaviour should be assigning the reallocted memory to config->socks right after success. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-29nbd:fix memory leak in nbd_get_socket()Sun Ke1-0/+1
commit dff10bbea4be47bdb615b036c834a275b7c68133 upstream. Before returning NULL, put the sock first. Cc: stable@vger.kernel.org Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Sun Ke <sunke32@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-10nbd: handle racing with error'ed out commandsJosef Bacik1-0/+6
[ Upstream commit 7ce23e8e0a9cd38338fc8316ac5772666b565ca9 ] We hit the following warning in production print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700 ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60 Workqueue: knbd-recv recv_work [nbd] RIP: 0010:refcount_sub_and_test_checked+0x53/0x60 Call Trace: blk_mq_free_request+0xb7/0xf0 blk_mq_complete_request+0x62/0xf0 recv_work+0x29/0xa1 [nbd] process_one_work+0x1f5/0x3f0 worker_thread+0x2d/0x3d0 ? rescuer_thread+0x340/0x340 kthread+0x111/0x130 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x1f/0x30 ---[ end trace b079c3c67f98bb7c ]--- This was preceded by us timing out everything and shutting down the sockets for the device. The problem is we had a request in the queue at the same time, so we completed the request twice. This can actually happen in a lot of cases, we fail to get a ref on our config, we only have one connection and just error out the command, etc. Fix this by checking cmd->status in nbd_read_stat. We only change this under the cmd->lock, so we are safe to check this here and see if we've already error'ed this command out, which would indicate that we've completed it as well. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-10nbd: protect cmd->status with cmd->lockJosef Bacik1-5/+7
[ Upstream commit de6346ecbc8f5591ebd6c44ac164e8b8671d71d7 ] We already do this for the most part, except in timeout and clear_req. For the timeout case we take the lock after we grab a ref on the config, but that isn't really necessary because we're safe to touch the cmd at this point, so just move the order around. For the clear_req cause this is initiated by the user, so again is safe. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06nbd: verify socket is supported during setupMike Christie1-2/+21
[ Upstream commit cf1b2326b734896734c6e167e41766f9cee7686a ] nbd requires socket families to support the shutdown method so the nbd recv workqueue can be woken up from its sock_recvmsg call. If the socket does not support the callout we will leave recv works running or get hangs later when the device or module is removed. This adds a check during socket connection/reconnection to make sure the socket being passed in supports the needed callout. Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06nbd: fix possible sysfs duplicate warningXiubo Li1-1/+1
[ Upstream commit 862488105b84ca744b3d8ff131e0fcfe10644be1 ] 1. nbd_put takes the mutex and drops nbd->ref to 0. It then does idr_remove and drops the mutex. 2. nbd_genl_connect takes the mutex. idr_find/idr_for_each fails to find an existing device, so it does nbd_dev_add. 3. just before the nbd_put could call nbd_dev_remove or not finished totally, but if nbd_dev_add try to add_disk, we can hit: debugfs: Directory 'nbd1' with parent 'block' already present! This patch will make sure all the disk add/remove stuff are done by holding the nbd_index_mutex lock. Reported-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-29zram: fix race between backing_dev_show and backing_dev_storeChenwandun1-2/+3
commit f7daefe4231e57381d92c2e2ad905a899c28e402 upstream. CPU0: CPU1: backing_dev_show backing_dev_store ...... ...... file = zram->backing_dev; down_read(&zram->init_lock); down_read(&zram->init_init_lock) file_path(file, ...); zram->backing_dev = backing_dev; up_read(&zram->init_lock); up_read(&zram->init_lock); gets the value of zram->backing_dev too early in backing_dev_show, which resultin the value being NULL at the beginning, and not NULL later. backtrace: d_path+0xcc/0x174 file_path+0x10/0x18 backing_dev_show+0x40/0xb4 dev_attr_show+0x20/0x54 sysfs_kf_seq_show+0x9c/0x10c kernfs_seq_show+0x28/0x30 seq_read+0x184/0x488 kernfs_fop_read+0x5c/0x1a4 __vfs_read+0x44/0x128 vfs_read+0xa0/0x138 SyS_read+0x54/0xb4 Link: http://lkml.kernel.org/r/1571046839-16814-1-git-send-email-chenwandun@huawei.com Signed-off-by: Chenwandun <chenwandun@huawei.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-29loop: change queue block size to match when using DIOMartijn Coenen1-0/+10
[ Upstream commit 85560117d00f5d528e928918b8f61cadcefff98b ] The loop driver assumes that if the passed in fd is opened with O_DIRECT, the caller wants to use direct I/O on the loop device. However, if the underlying block device has a different block size than the loop block queue, direct I/O can't be enabled. Instead of requiring userspace to manually change the blocksize and re-enable direct I/O, just change the queue block sizes to match, as well as the io_min size. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martijn Coenen <maco@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-11nbd: fix max number of supported devsMike Christie1-14/+25
commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 upstream. This fixes a bug added in 4.10 with commit: commit 9561a7ade0c205bc2ee035a2ac880478dcc1a024 Author: Josef Bacik <jbacik@fb.com> Date: Tue Nov 22 14:04:40 2016 -0500 nbd: add multi-connection support that limited the number of devices to 256. Before the patch we could create 1000s of devices, but the patch switched us from using our own thread to using a work queue which has a default limit of 256 active works. The problem is that our recv_work function sits in a loop until disconnection but only handles IO for one connection. The work is started when the connection is started/restarted, but if we end up creating 257 or more connections, the queue_work call just queues connection257+'s recv_work and that waits for connection 1 - 256's recv_work to be disconnected and that work instance completing. Instead of reverting back to kthreads, this has us allocate a workqueue_struct per device, so we can block in the work. Cc: stable@vger.kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-07pktcdvd: remove warning on attempting to register non-passthrough devJens Axboe1-1/+0
[ Upstream commit eb09b3cc464d2c3bbde9a6648603c8d599ea8582 ] Anatoly reports that he gets the below warning when booting -git on a sparc64 box on debian unstable: ... [ 13.352975] aes_sparc64: Using sparc64 aes opcodes optimized AES implementation [ 13.428002] ------------[ cut here ]------------ [ 13.428081] WARNING: CPU: 21 PID: 586 at drivers/block/pktcdvd.c:2597 pkt_setup_dev+0x2e4/0x5a0 [pktcdvd] [ 13.428147] Attempt to register a non-SCSI queue [ 13.428184] Modules linked in: pktcdvd libdes cdrom aes_sparc64 n2_rng md5_sparc64 sha512_sparc64 rng_core sha256_sparc64 flash sha1_sparc64 ip_tables x_tables ipv6 crc_ccitt nf_defrag_ipv6 autofs4 ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear md_mod crc32c_sparc64 [ 13.428452] CPU: 21 PID: 586 Comm: pktsetup Not tainted 5.3.0-10169-g574cc4539762 #1234 [ 13.428507] Call Trace: [ 13.428542] [00000000004635c0] __warn+0xc0/0x100 [ 13.428582] [0000000000463634] warn_slowpath_fmt+0x34/0x60 [ 13.428626] [000000001045b244] pkt_setup_dev+0x2e4/0x5a0 [pktcdvd] [ 13.428674] [000000001045ccf4] pkt_ctl_ioctl+0x94/0x220 [pktcdvd] [ 13.428724] [00000000006b95c8] do_vfs_ioctl+0x628/0x6e0 [ 13.428764] [00000000006b96c8] ksys_ioctl+0x48/0x80 [ 13.428803] [00000000006b9714] sys_ioctl+0x14/0x40 [ 13.428847] [0000000000406294] linux_sparc_syscall+0x34/0x44 [ 13.428890] irq event stamp: 4181 [ 13.428924] hardirqs last enabled at (4189): [<00000000004e0a74>] console_unlock+0x634/0x6c0 [ 13.428984] hardirqs last disabled at (4196): [<00000000004e0540>] console_unlock+0x100/0x6c0 [ 13.429048] softirqs last enabled at (3978): [<0000000000b2e2d8>] __do_softirq+0x498/0x520 [ 13.429110] softirqs last disabled at (3967): [<000000000042cfb4>] do_softirq_own_stack+0x34/0x60 [ 13.429172] ---[ end trace 2220ca468f32967d ]--- [ 13.430018] pktcdvd: setup of pktcdvd device failed [ 13.455589] des_sparc64: Using sparc64 des opcodes optimized DES implementation [ 13.515334] camellia_sparc64: Using sparc64 camellia opcodes optimized CAMELLIA implementation [ 13.522856] pktcdvd: setup of pktcdvd device failed [ 13.529327] pktcdvd: setup of pktcdvd device failed [ 13.532932] pktcdvd: setup of pktcdvd device failed [ 13.536165] pktcdvd: setup of pktcdvd device failed [ 13.539372] pktcdvd: setup of pktcdvd device failed [ 13.542834] pktcdvd: setup of pktcdvd device failed [ 13.546536] pktcdvd: setup of pktcdvd device failed [ 15.431071] XFS (dm-0): Mounting V5 Filesystem ... Apparently debian auto-attaches any cdrom like device to pktcdvd, which can lead to the above warning. There's really no reason to warn for this situation, kill it. Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05nbd: add missing config putMike Christie1-1/+3
[ Upstream commit 887e975c4172d0d5670c39ead2f18ba1e4ec8133 ] Fix bug added with the patch: commit 8f3ea35929a0806ad1397db99a89ffee0140822a Author: Josef Bacik <josef@toxicpanda.com> Date: Mon Jul 16 12:11:35 2018 -0400 nbd: handle unexpected replies better where if the timeout handler runs when the completion path is and we fail to grab the mutex in the timeout handler we will leave a config reference and cannot free the config later. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-05loop: Add LOOP_SET_DIRECT_IO to compat ioctlAlessio Balsini1-0/+1
[ Upstream commit fdbe4eeeb1aac219b14f10c0ed31ae5d1123e9b8 ] Enabling Direct I/O with loop devices helps reducing memory usage by avoiding double caching. 32 bit applications running on 64 bits systems are currently not able to request direct I/O because is missing from the lo_compat_ioctl. This patch fixes the compatibility issue mentioned above by exporting LOOP_SET_DIRECT_IO as additional lo_compat_ioctl() entry. The input argument for this ioctl is a single long converted to a 1-bit boolean, so compatibility is preserved. Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-21floppy: fix usercopy directionJann Horn1-2/+2
commit 52f6f9d74f31078964ca1574f7bb612da7877ac8 upstream. As sparse points out, these two copy_from_user() should actually be copy_to_user(). Fixes: 229b53c9bf4e ("take floppy compat ioctls to sodding floppy.c") Cc: stable@vger.kernel.org Acked-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-28rbd: restore zeroing past the overlap when reading from parentIlya Dryomov1-0/+11
The parent image is read only up to the overlap point, the rest of the buffer should be zeroed. This snuck in because as it turns out the overlap test case has not been triggering this code path for a while now. Fixes: a9b67e69949d ("rbd: replace obj_req->tried_parent with obj_req->read_state") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2019-08-20Merge branch 'siginfo-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull kernel thread signal handling fix from Eric Biederman: "I overlooked the fact that kernel threads are created with all signals set to SIG_IGN, and accidentally caused a regression in cifs and drbd when replacing force_sig with send_sig. This is my fix for that regression. I add a new function allow_kernel_signal which allows kernel threads to receive signals sent from the kernel, but continues to ignore all signals sent from userspace. This ensures the user space interface for cifs and drbd remain the same. These kernel threads depend on blocking networking calls which block until something is received or a signal is pending. Making receiving of signals somewhat necessary for these kernel threads. Perhaps someday we can cleanup those interfaces and remove allow_kernel_signal. If not allow_kernel_signal is pretty trivial and clearly documents what is going on so I don't think we will mind carrying it" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Allow cifs and drbd to receive their terminating signals
2019-08-19signal: Allow cifs and drbd to receive their terminating signalsEric W. Biederman1-0/+2
My recent to change to only use force_sig for a synchronous events wound up breaking signal reception cifs and drbd. I had overlooked the fact that by default kthreads start out with all signals set to SIG_IGN. So a change I thought was safe turned out to have made it impossible for those kernel thread to catch their signals. Reverting the work on force_sig is a bad idea because what the code was doing was very much a misuse of force_sig. As the way force_sig ultimately allowed the signal to happen was to change the signal handler to SIG_DFL. Which after the first signal will allow userspace to send signals to these kernel threads. At least for wake_ack_receiver in drbd that does not appear actively wrong. So correct this problem by adding allow_kernel_signal that will allow signals whose siginfo reports they were sent by the kernel through, but will not allow userspace generated signals, and update cifs and drbd to call allow_kernel_signal in an appropriate place so that their thread can receive this signal. Fixing things this way ensures that userspace won't be able to send signals and cause problems, that it is clear which signals the threads are expecting to receive, and it guarantees that nothing else in the system will be affected. This change was partly inspired by similar cifs and drbd patches that added allow_signal. Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com> Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Steve French <smfrench@gmail.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: David Laight <David.Laight@ACULAB.COM> Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes") Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig") Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-08-12xen/blkback: fix memory leaksWenwen Wang1-3/+3
In read_per_ring_refs(), after 'req' and related memory regions are allocated, xen_blkif_map() is invoked to map the shared frame, irq, and etc. However, if this mapping process fails, no cleanup is performed, leading to memory leaks. To fix this issue, invoke the cleanup before returning the error. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-08loop: set PF_MEMALLOC_NOIO for the worker threadMikulas Patocka1-1/+1
A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 #12 [ffff88272f5af760] new_slab at ffffffff811f4523 #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c #23 [ffff88272f5afec0] kthread at ffffffff810a8428 #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-08block: aoe: Fix kernel crash due to atomic sleep when exitingHe Zhe1-3/+10
Since commit 3582dd291788 ("aoe: convert aoeblk to blk-mq"), aoedev_downdev has had the possibility of sleeping and causing the following crash. BUG: scheduling while atomic: rmmod/2242/0x00000003 Modules linked in: aoe Preemption disabled at: [<ffffffffc01d95e5>] flush+0x95/0x4a0 [aoe] CPU: 7 PID: 2242 Comm: rmmod Tainted: G I 5.2.3 #1 Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.10.0025.030220091519 03/02/2009 Call Trace: dump_stack+0x4f/0x6a ? flush+0x95/0x4a0 [aoe] __schedule_bug.cold+0x44/0x54 __schedule+0x44f/0x680 schedule+0x44/0xd0 blk_mq_freeze_queue_wait+0x46/0xb0 ? wait_woken+0x80/0x80 blk_mq_freeze_queue+0x1b/0x20 aoedev_downdev+0x111/0x160 [aoe] flush+0xff/0x4a0 [aoe] aoedev_exit+0x23/0x30 [aoe] aoe_exit+0x35/0x948 [aoe] __se_sys_delete_module+0x183/0x210 __x64_sys_delete_module+0x16/0x20 do_syscall_64+0x4d/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f24e0043b07 Code: 73 01 c3 48 8b 0d 89 73 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 59 73 0b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe18f7f1e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f24e0043b07 RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000555c3ecf87c8 RBP: 00007ffe18f7f1f0 R08: 0000000000000000 R09: 0000000000000000 R10: 00007f24e00b4ac0 R11: 0000000000000206 R12: 00007ffe18f7f238 R13: 00007ffe18f7f410 R14: 00007ffe18f80e73 R15: 0000555c3ecf8760 This patch, handling in the same way of pass two, unlocks the locks and restart pass one after aoedev_downdev is done. Fixes: 3582dd291788 ("aoe: convert aoeblk to blk-mq") Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-31nbd: replace kill_bdev() with __invalidate_device() againMunehisa Kamata1-1/+1
Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()") once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere") resurrected kill_bdev() and it has been there since then. So buffer_head mappings still get killed on a server disconnection, and we can still hit the BUG_ON on a filesystem on the top of the nbd device. EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null) block nbd0: Receive control failed (result -32) block nbd0: shutting down sockets print_req_error: I/O error, dev nbd0, sector 66264 flags 3000 EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block print_req_error: I/O error, dev nbd0, sector 2264 flags 3000 EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure ------------[ cut here ]------------ kernel BUG at fs/buffer.c:3057! invalid opcode: 0000 [#1] SMP PTI CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4 Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:submit_bh_wbc+0x18b/0x190 ... Call Trace: jbd2_write_superblock+0xf1/0x230 [jbd2] ? account_entity_enqueue+0xc5/0xf0 jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2] jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2] ? __switch_to_asm+0x40/0x70 ... ? lock_timer_base+0x67/0x80 kjournald2+0x121/0x360 [jbd2] ? remove_wait_queue+0x60/0x60 kthread+0xf8/0x130 ? commit_timeout+0x10/0x10 [jbd2] ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 With __invalidate_device(), I no longer hit the BUG_ON with sync or unmount on the disconnected device. Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere") Cc: linux-block@vger.kernel.org Cc: Ratna Manoj Bolla <manoj.br@gmail.com> Cc: nbd@other.debian.org Cc: stable@vger.kernel.org Cc: David Woodhouse <dwmw@amazon.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Munehisa Kamata <kamatam@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-30loop: Fix mount(2) failure due to race with LOOP_SET_FDJan Kara1-7/+9
Commit 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener") made LOOP_SET_FD ioctl acquire exclusive block device reference while it updates loop device binding. However this can make perfectly valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding temporarily the exclusive bdev reference in cases like this: for i in {a..z}{a..z}; do dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024 mkfs.ext2 $i.image mkdir mnt$i done echo "Run" for i in {a..z}{a..z}; do mount -o loop -t ext2 $i.image mnt$i & done Fix the problem by not getting full exclusive bdev reference in LOOP_SET_FD but instead just mark the bdev as being claimed while we update the binding information. This just blocks new exclusive openers instead of failing them with EBUSY thus fixing the problem. Fixes: 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener") Cc: stable@vger.kernel.org Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-30ataflop: Mark expected switch fall-throughGustavo A. R. Silva1-0/+1
Mark switch cases where we are expecting to fall through. This patch fixes the following warning (Building: m68k): drivers/block/ataflop.c: In function ‘fd_locked_ioctl’: drivers/block/ataflop.c:1728:3: warning: this statement may fall through [-Wimplicit-fallthrough=] set_capacity(floppy->disk, MAX_DISK_SIZE * 2); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/block/ataflop.c:1729:2: note: here case FDFMTEND: ^~~~ Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-26Merge tag 'for-linus-20190726' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+12
Pull block fixes from Jens Axboe: - Several io_uring fixes/improvements: - Blocking fix for O_DIRECT (me) - Latter page slowness for registered buffers (me) - Fix poll hang under certain conditions (me) - Defer sequence check fix for wrapped rings (Zhengyuan) - Mismatch in async inc/dec accounting (Zhengyuan) - Memory ordering issue that could cause stall (Zhengyuan) - Track sequential defer in bytes, not pages (Zhengyuan) - NVMe pull request from Christoph - Set of hang fixes for wbt (Josef) - Redundant error message kill for libahci (Ding) - Remove unused blk_mq_sched_started_request() and related ops (Marcos) - drbd dynamic alloc shash descriptor to reduce stack use (Arnd) - blkcg ->pd_stat() non-debug print (Tejun) - bcache memory leak fix (Wei) - Comment fix (Akinobu) - BFQ perf regression fix (Paolo) * tag 'for-linus-20190726' of git://git.kernel.dk/linux-block: (24 commits) io_uring: ensure ->list is initialized for poll commands Revert "nvme-pci: don't create a read hctx mapping without read queues" nvme: fix multipath crash when ANA is deactivated nvme: fix memory leak caused by incorrect subsystem free nvme: ignore subnqn for ADATA SX6000LNP drbd: dynamically allocate shash descriptor block: blk-mq: Remove blk_mq_sched_started_request and started_request bcache: fix possible memory leak in bch_cached_dev_run() io_uring: track io length in async_list based on bytes io_uring: don't use iov_iter_advance() for fixed buffers block: properly handle IOCB_NOWAIT for async O_DIRECT IO blk-mq: allow REQ_NOWAIT to return an error inline io_uring: add a memory barrier before atomic_read rq-qos: use a mb for got_token rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule rq-qos: don't reset has_sleepers on spurious wakeups rq-qos: fix missed wake-ups in rq_qos_throttle wait: add wq_has_single_sleeper helper block, bfq: check also in-flight I/O in dispatch plugging block: fix sysfs module parameters directory path in comment ...
2019-07-23drbd: dynamically allocate shash descriptorArnd Bergmann1-2/+12
Building with clang and KASAN, we get a warning about an overly large stack frame on 32-bit architectures: drivers/block/drbd/drbd_receiver.c:921:31: error: stack frame size of 1280 bytes in function 'conn_connect' [-Werror,-Wframe-larger-than=] We already allocate other data dynamically in this function, so just do the same for the shash descriptor, which makes up most of this memory. Link: https://lore.kernel.org/lkml/20190617132440.2721536-1-arnd@arndb.de/ Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-18Merge tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2-588/+1610
Pull ceph updates from Ilya Dryomov: "Lots of exciting things this time! - support for rbd object-map and fast-diff features (myself). This will speed up reads, discards and things like snap diffs on sparse images. - ceph.snap.btime vxattr to expose snapshot creation time (David Disseldorp). This will be used to integrate with "Restore Previous Versions" feature added in Windows 7 for folks who reexport ceph through SMB. - security xattrs for ceph (Zheng Yan). Only selinux is supported for now due to the limitations of ->dentry_init_security(). - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff Layton). This is actually a single feature bit which was missing because of the filesystem pieces. With this in, the kernel client will finally be reported as "luminous" by "ceph features" -- it is still being reported as "jewel" even though all required Luminous features were implemented in 4.13. - stop NULL-terminating ceph vxattrs (Jeff Layton). The convention with xattrs is to not terminate and this was causing inconsistencies with ceph-fuse. - change filesystem time granularity from 1 us to 1 ns, again fixing an inconsistency with ceph-fuse (Luis Henriques). On top of this there are some additional dentry name handling and cap flushing fixes from Zheng. Finally, Jeff is formally taking over for Zheng as the filesystem maintainer" * tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits) ceph: fix end offset in truncate_inode_pages_range call ceph: use generic_delete_inode() for ->drop_inode ceph: use ceph_evict_inode to cleanup inode's resource ceph: initialize superblock s_time_gran to 1 MAINTAINERS: take over for Zheng as CephFS kernel client maintainer rbd: setallochint only if object doesn't exist rbd: support for object-map and fast-diff rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe() libceph: export osd_req_op_data() macro libceph: change ceph_osdc_call() to take page vector for response libceph: bump CEPH_MSG_MAX_DATA_LEN (again) rbd: new exclusive lock wait/wake code rbd: quiescing lock should wait for image requests rbd: lock should be quiesced on reacquire rbd: introduce copyup state machine rbd: rename rbd_obj_setup_*() to rbd_obj_init_*() rbd: move OSD request allocation into object request state machines rbd: factor out __rbd_osd_setup_discard_ops() rbd: factor out rbd_osd_setup_copyup() rbd: introduce obj_req->osd_reqs list ...
2019-07-18Merge branch 'floppy'Linus Torvalds1-2/+32
Merge floppy ioctl verification fixes from Denis Efremov. This also marks the floppy driver as orphaned - it turns out that Jiri no longer has working hardware. Actual working physical floppy hardware is getting hard to find, and while Willy was able to test this, I think the driver can be considered pretty much dead from an actual hardware standpoint. The hardware that is still sold seems to be mainly USB-based, which doesn't use this legacy driver at all. The old floppy disk controller is still emulated in various VM environments, so the driver isn't going away, but let's see if anybody is interested to step up to maintain it. The lack of hardware also likely means that the ioctl range verification fixes are probably mostly relevant to anybody using floppies in a virtual environment. Which is probably also going away in favor of USB storage emulation, but who knows. Will Decon reviewed the patches but I'm not rebasing them just for that, so I'll add a Reviewed-by: Will Deacon <will@kernel.org> here instead. * floppy: MAINTAINERS: mark floppy.c orphaned floppy: fix out-of-bounds read in copy_buffer floppy: fix invalid pointer dereference in drive_name floppy: fix out-of-bounds read in next_valid_format floppy: fix div-by-zero in setup_format_params
2019-07-18floppy: fix out-of-bounds read in copy_bufferDenis Efremov1-2/+4
This fixes a global out-of-bounds read access in the copy_buffer function of the floppy driver. The FDDEFPRM ioctl allows one to set the geometry of a disk. The sect and head fields (unsigned int) of the floppy_drive structure are used to compute the max_sector (int) in the make_raw_rw_request function. It is possible to overflow the max_sector. Next, max_sector is passed to the copy_buffer function and used in one of the memcpy calls. An unprivileged user could trigger the bug if the device is accessible, but requires a floppy disk to be inserted. The patch adds the check for the .sect * .head multiplication for not overflowing in the set_geometry function. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18floppy: fix invalid pointer dereference in drive_nameDenis Efremov1-3/+8
This fixes the invalid pointer dereference in the drive_name function of the floppy driver. The native_format field of the struct floppy_drive_params is used as floppy_type array index in the drive_name function. Thus, the field should be checked the same way as the autodetect field. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. Next, FDGETDRVTYP ioctl should be used to call the drive_name. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for a value of the native_format field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18floppy: fix out-of-bounds read in next_valid_formatDenis Efremov1-0/+18
This fixes a global out-of-bounds read access in the next_valid_format function of the floppy driver. The values from autodetect field of the struct floppy_drive_params are used as indices for the floppy_type array in the next_valid_format function 'floppy_type[DP->autodetect[probed_format]].sect'. To trigger the bug, one could use a value out of range and set the drive parameters with the FDSETDRVPRM ioctl. A floppy disk is not required to be inserted. CAP_SYS_ADMIN is required to call FDSETDRVPRM. The patch adds the check for values of the autodetect field to be in the '0 <= x < ARRAY_SIZE(floppy_type)' range of the floppy_type array indices. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18floppy: fix div-by-zero in setup_format_paramsDenis Efremov1-0/+5
This fixes a divide by zero error in the setup_format_params function of the floppy driver. Two consecutive ioctls can trigger the bug: The first one should set the drive geometry with such .sect and .rate values for the F_SECT_PER_TRACK to become zero. Next, the floppy format operation should be called. A floppy disk is not required to be inserted. An unprivileged user could trigger the bug if the device is accessible. The patch checks F_SECT_PER_TRACK for a non-zero value in the set_geometry function. The proper check should involve a reasonable upper limit for the .sect and .rate fields, but it could change the UAPI. The patch also checks F_SECT_PER_TRACK in the setup_format_params, and cancels the formatting operation in case of zero. The bug was found by syzkaller. Signed-off-by: Denis Efremov <efremov@ispras.ru> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16Merge tag 'docs/v5.3-1' of ↵Linus Torvalds3-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull rst conversion of docs from Mauro Carvalho Chehab: "As agreed with Jon, I'm sending this big series directly to you, c/c him, as this series required a special care, in order to avoid conflicts with other trees" * tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits) docs: kbuild: fix build with pdf and fix some minor issues docs: block: fix pdf output docs: arm: fix a breakage with pdf output docs: don't use nested tables docs: gpio: add sysfs interface to the admin-guide docs: locking: add it to the main index docs: add some directories to the main documentation index docs: add SPDX tags to new index files docs: add a memory-devices subdir to driver-api docs: phy: place documentation under driver-api docs: serial: move it to the driver-api docs: driver-api: add remaining converted dirs to it docs: driver-api: add xilinx driver API documentation docs: driver-api: add a series of orphaned documents docs: admin-guide: add a series of orphaned documents docs: cgroup-v1: add it to the admin-guide book docs: aoe: add it to the driver-api book docs: add some documentation dirs to the driver-api book docs: driver-model: move it to the driver-api book docs: lp855x-driver.rst: add it to the driver-api book ...
2019-07-16Merge tag 'for-linus-20190715' of git://git.kernel.dk/linux-blockLinus Torvalds3-18/+49
Pull more block updates from Jens Axboe: "A later pull request with some followup items. I had some vacation coming up to the merge window, so certain things items were delayed a bit. This pull request also contains fixes that came in within the last few days of the merge window, which I didn't want to push right before sending you a pull request. This contains: - NVMe pull request, mostly fixes, but also a few minor items on the feature side that were timing constrained (Christoph et al) - Report zones fixes (Damien) - Removal of dead code (Damien) - Turn on cgroup psi memstall (Josef) - block cgroup MAINTAINERS entry (Konstantin) - Flush init fix (Josef) - blk-throttle low iops timing fix (Konstantin) - nbd resize fixes (Mike) - nbd 0 blocksize crash fix (Xiubo) - block integrity error leak fix (Wenwen) - blk-cgroup writeback and priority inheritance fixes (Tejun)" * tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits) MAINTAINERS: add entry for block io cgroup null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED block: Limit zone array allocation size sd_zbc: Fix report zones buffer allocation block: Kill gfp_t argument of blkdev_report_zones() block: Allow mapping of vmalloc-ed buffers block/bio-integrity: fix a memory leak bug nvme: fix NULL deref for fabrics options nbd: add netlink reconfigure resize support nbd: fix crash when the blksize is zero block: Disable write plugging for zoned block devices block: Fix elevator name declaration block: Remove unused definitions nvme: fix regression upon hot device removal and insertion blk-throttle: fix zero wait time for iops throttled group block: Fix potential overflow in blk_report_zones() blkcg: implement REQ_CGROUP_PUNT blkcg, writeback: Implement wbc_blkcg_css() blkcg, writeback: Add wbc->no_cgroup_owner blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() ...
2019-07-15docs: blockdev: add it to the admin-guideMauro Carvalho Chehab3-8/+8
The blockdev book basically contains user-faced documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-15docs: blockdev: convert to ReSTMauro Carvalho Chehab3-8/+8
Rename the blockdev documentation files to ReST, add an index for them and adjust in order to produce a nice html output via the Sphinx build system. The drbd sub-directory contains some graphs and data flows. Add those too to the documentation. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-07-12null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONEDJens Axboe1-1/+1
A previous commit changed the prototype, but didn't adjust the function for when zoned device support is disabled. Fix it up. Fixes: bd976e527259 ("block: Kill gfp_t argument of blkdev_report_zones()") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-12block: Kill gfp_t argument of blkdev_report_zones()Damien Le Moal2-4/+2
Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In preparation of using vmalloc() for large report buffer and zone array allocations used by this function, remove its "gfp_t gfp_mask" argument and rely on the caller context to use memalloc_noio_save/restore() where necessary (block layer zone revalidation and dm-zoned I/O error path). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-11nbd: add netlink reconfigure resize supportMike Christie1-16/+32
If the device is setup with ioctl we can resize the device after the initial setup, but if the device is setup with netlink we cannot use the resize related ioctls and there is no netlink reconfigure size ATTR handling code. This patch adds netlink reconfigure resize support to match the ioctl interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-11nbd: fix crash when the blksize is zeroXiubo Li1-3/+20
This will allow the blksize to be set zero and then use 1024 as default. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> [fix to use goto out instead of return in genl_connect] Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-09Merge tag 'docs-5.3' of git://git.lwn.net/linuxLinus Torvalds1-1/+1
Pull Documentation updates from Jonathan Corbet: "It's been a relatively busy cycle for docs: - A fair pile of RST conversions, many from Mauro. These create more than the usual number of simple but annoying merge conflicts with other trees, unfortunately. He has a lot more of these waiting on the wings that, I think, will go to you directly later on. - A new document on how to use merges and rebases in kernel repos, and one on Spectre vulnerabilities. - Various improvements to the build system, including automatic markup of function() references because some people, for reasons I will never understand, were of the opinion that :c:func:``function()`` is unattractive and not fun to type. - We now recommend using sphinx 1.7, but still support back to 1.4. - Lots of smaller improvements, warning fixes, typo fixes, etc" * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits) docs: automarkup.py: ignore exceptions when seeking for xrefs docs: Move binderfs to admin-guide Disable Sphinx SmartyPants in HTML output doc: RCU callback locks need only _bh, not necessarily _irq docs: format kernel-parameters -- as code Doc : doc-guide : Fix a typo platform: x86: get rid of a non-existent document Add the RCU docs to the core-api manual Documentation: RCU: Add TOC tree hooks Documentation: RCU: Rename txt files to rst Documentation: RCU: Convert RCU UP systems to reST Documentation: RCU: Convert RCU linked list to reST Documentation: RCU: Convert RCU basic concepts to reST docs: filesystems: Remove uneeded .rst extension on toctables scripts/sphinx-pre-install: fix out-of-tree build docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/ Documentation: PGP: update for newer HW devices Documentation: Add section about CPU vulnerabilities for Spectre Documentation: platform: Delete x86-laptop-drivers.txt docs: Note that :c:func: should no longer be used ...
2019-07-09Merge tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-blockLinus Torvalds8-92/+17
Pull block updates from Jens Axboe: "This is the main block updates for 5.3. Nothing earth shattering or major in here, just fixes, additions, and improvements all over the map. This contains: - Series of documentation fixes (Bart) - Optimization of the blk-mq ctx get/put (Bart) - null_blk removal race condition fix (Bob) - req/bio_op() cleanups (Chaitanya) - Series cleaning up the segment accounting, and request/bio mapping (Christoph) - Series cleaning up the page getting/putting for bios (Christoph) - block cgroup cleanups and moving it to where it is used (Christoph) - block cgroup fixes (Tejun) - Series of fixes and improvements to bcache, most notably a write deadlock fix (Coly) - blk-iolatency STS_AGAIN and accounting fixes (Dennis) - Series of improvements and fixes to BFQ (Douglas, Paolo) - debugfs_create() return value check removal for drbd (Greg) - Use struct_size(), where appropriate (Gustavo) - Two lighnvm fixes (Heiner, Geert) - MD fixes, including a read balance and corruption fix (Guoqing, Marcos, Xiao, Yufen) - block opal shadow mbr additions (Jonas, Revanth) - sbitmap compare-and-exhange improvemnts (Pavel) - Fix for potential bio->bi_size overflow (Ming) - NVMe pull requests: - improved PCIe suspent support (Keith Busch) - error injection support for the admin queue (Akinobu Mita) - Fibre Channel discovery improvements (James Smart) - tracing improvements including nvmetc tracing support (Minwoo Im) - misc fixes and cleanups (Anton Eidelman, Minwoo Im, Chaitanya Kulkarni)" - Various little fixes and improvements to drivers and core" * tag 'for-5.3/block-20190708' of git://git.kernel.dk/linux-block: (153 commits) blk-iolatency: fix STS_AGAIN handling block: nr_phys_segments needs to be zero for REQ_OP_WRITE_ZEROES blk-mq: simplify blk_mq_make_request() blk-mq: remove blk_mq_put_ctx() sbitmap: Replace cmpxchg with xchg block: fix .bi_size overflow block: sed-opal: check size of shadow mbr block: sed-opal: ioctl for writing to shadow mbr block: sed-opal: add ioctl for done-mark of shadow mbr block: never take page references for ITER_BVEC direct-io: use bio_release_pages in dio_bio_complete block_dev: use bio_release_pages in bio_unmap_user block_dev: use bio_release_pages in blkdev_bio_end_io iomap: use bio_release_pages in iomap_dio_bio_end_io block: use bio_release_pages in bio_map_user_iov block: use bio_release_pages in bio_unmap_user block: optionally mark pages dirty in bio_release_pages block: move the BIO_NO_PAGE_REF check into bio_release_pages block: skd_main.c: Remove call to memset after dma_alloc_coherent block: mtip32xx: Remove call to memset after dma_alloc_coherent ...
2019-07-09Merge branch 'siginfo-linus' of ↵Linus Torvalds3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull force_sig() argument change from Eric Biederman: "A source of error over the years has been that force_sig has taken a task parameter when it is only safe to use force_sig with the current task. The force_sig function is built for delivering synchronous signals such as SIGSEGV where the userspace application caused a synchronous fault (such as a page fault) and the kernel responded with a signal. Because the name force_sig does not make this clear, and because the force_sig takes a task parameter the function force_sig has been abused for sending other kinds of signals over the years. Slowly those have been fixed when the oopses have been tracked down. This set of changes fixes the remaining abusers of force_sig and carefully rips out the task parameter from force_sig and friends making this kind of error almost impossible in the future" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus signal: Remove the signal number and task parameters from force_sig_info signal: Factor force_sig_info_to_task out of force_sig_info signal: Generate the siginfo in force_sig signal: Move the computation of force into send_signal and correct it. signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal signal: Remove the task parameter from force_sig_fault signal: Use force_sig_fault_to_task for the two calls that don't deliver to current signal: Explicitly call force_sig_fault on current signal/unicore32: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from ptrace_break signal/nds32: Remove tsk parameter from send_sigtrap signal/riscv: Remove tsk parameter from do_trap signal/sh: Remove tsk parameter from force_sig_info_fault signal/um: Remove task parameter from send_sigtrap signal/x86: Remove task parameter from send_sigtrap signal: Remove task parameter from force_sig_mceerr signal: Remove task parameter from force_sig signal: Remove task parameter from force_sigsegv ...
2019-07-08rbd: setallochint only if object doesn't existIlya Dryomov1-5/+14
setallochint is really only useful on object creation. Continue hinting unconditionally if object map cannot be used. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-07-08rbd: support for object-map and fast-diffIlya Dryomov2-3/+727
Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT based on object map. Invalid object maps are not trusted, but still updated. Note that we never iterate, resize or invalidate object maps. If object-map feature is enabled but object map fails to load, we just fail the requester (either "rbd map" or I/O, by way of post-acquire action). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-07-08rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()Ilya Dryomov1-8/+6
Snapshot object map will be loaded in rbd_dev_image_probe(), so we need to know snapshot's size (as opposed to HEAD's size) sooner. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-07-08libceph: change ceph_osdc_call() to take page vector for responseIlya Dryomov1-4/+4
This will be used for loading object map. rbd_obj_read_sync() isn't suitable because object map must be accessed through class methods. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2019-07-08rbd: new exclusive lock wait/wake codeIlya Dryomov1-143/+186
rbd_wait_state_locked() is built around rbd_dev->lock_waitq and blocks rbd worker threads while waiting for the lock, potentially impacting other rbd devices. There is no good way to pass an error code into image request state machines when acquisition fails, hence the use of RBD_DEV_FLAG_BLACKLISTED for everything and various other issues. Introduce rbd_dev->acquiring_list and move acquisition into image request state machine. Use rbd_img_schedule() for kicking and passing error codes. No blocking occurs while waiting for the lock, but rbd_dev->lock_rwsem is still held across lock, unlock and set_cookie calls. Always acquire the lock on "rbd map" to avoid associating the latency of acquiring the lock with the first I/O request. A slight regression is that lock_timeout is now respected only if lock acquisition is triggered by "rbd map" and not by I/O. This is somewhat compensated by the fact that we no longer block if the peer refuses to release lock -- I/O is failed with EROFS right away. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-07-08rbd: quiescing lock should wait for image requestsIlya Dryomov1-14/+90
Syncing OSD requests doesn't really work. A single image request may be comprised of multiple object requests, each of which can go through a series of OSD requests (original, copyups, etc). On top of that, the OSD cliest may be shared with other rbd devices. What we want is to ensure that all in-flight image requests complete. Introduce rbd_dev->running_list and block in RBD_LOCK_STATE_RELEASING until that happens. New OSD requests may be started during this time. Note that __rbd_img_handle_request() acquires rbd_dev->lock_rwsem only if need_exclusive_lock() returns true. This avoids a deadlock similar to the one outlined in the previous commit between unlock and I/O that doesn't require lock, such as a read with object-map feature disabled. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-07-08rbd: lock should be quiesced on reacquireIlya Dryomov1-14/+21
Quiesce exclusive lock at the top of rbd_reacquire_lock() instead of only when ceph_cls_set_cookie() fails. This avoids a deadlock on rbd_dev->lock_rwsem. If rbd_dev->lock_rwsem is needed for I/O completion, set_cookie can hang ceph-msgr worker thread if set_cookie reply ends up behind an I/O reply, because, like lock and unlock requests, set_cookie is sent and waited upon with rbd_dev->lock_rwsem held for write. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-07-08rbd: introduce copyup state machineIlya Dryomov1-64/+123
Both write and copyup paths will get more complex with object map. Factor copyup code out into a separate state machine. While at it, take advantage of obj_req->osd_reqs list and issue empty and current snapc OSD requests together, one after another. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>