summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2020-10-01io_uring: provide IORING_ENTER_SQ_WAIT for SQPOLL SQ ring waitsJens Axboe1-3/+37
When using SQPOLL, applications can run into the issue of running out of SQ ring entries because the thread hasn't consumed them yet. The only option for dealing with that is checking later, or busy checking for the condition. Provide IORING_ENTER_SQ_WAIT if applications want to wait on this condition. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: mark io_uring_fops/io_op_defs as __read_mostlyJens Axboe1-2/+2
These structures are never written, move them appropriately. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: enable IORING_SETUP_ATTACH_WQ to attach to SQPOLL thread tooJens Axboe1-0/+33
We support using IORING_SETUP_ATTACH_WQ to share async backends between rings created by the same process, this now also allows the same to happen with SQPOLL. The setup procedure remains the same, the caller sets io_uring_params->wq_fd to the 'parent' context, and then the newly created ring will attach to that async backend. This means that multiple rings can share the same SQPOLL thread, saving resources. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: base SQPOLL handling off io_sq_dataJens Axboe1-25/+101
Remove the SQPOLL thread from the ctx, and use the io_sq_data as the data structure we pass in. io_sq_data has a list of ctx's that we can then iterate over and handle. As of now we're ready to handle multiple ctx's, though we're still just handling a single one after this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: split SQPOLL data into separate structureJens Axboe1-41/+86
Move all the necessary state out of io_ring_ctx, and into a new structure, io_sq_data. The latter now deals with any state or variables associated with the SQPOLL thread itself. In preparation for supporting more than one io_ring_ctx per SQPOLL thread. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: split work handling part of SQPOLL into helperJens Axboe1-84/+93
This is done in preparation for handling more than one ctx, but it also cleans up the code a bit since io_sq_thread() was a bit too unwieldy to get a get overview on. __io_sq_thread() is now the main handler, and it returns an enum sq_ret that tells io_sq_thread() what it ended up doing. The parent then makes a decision on idle, spinning, or work handling based on that. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: move SQPOLL post-wakeup ring need wakeup flag into wake handlerJens Axboe1-1/+18
We need to decouple the clearing on wakeup from the the inline schedule, as that is going to be required for handling multiple rings in one thread. Wrap our wakeup handler so we can clear it when we get the wakeup, by definition that is when we no longer need the flag set. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: use private ctx wait queue entries for SQPOLLJens Axboe1-15/+18
This is in preparation to sharing the poller thread between rings. For that we need per-ring wait_queue_entry storage, and we can't easily put that on the stack if one thread is managing multiple rings. We'll also be sharing the wait_queue_head across rings for the purposes of wakeups, provide the usual private ring wait_queue_head for now but make it a pointer so we can easily override it when sharing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: io_sq_thread() doesn't need to flush signalsJens Axboe1-2/+0
We're not handling signals by default in kernel threads, and we never use TWA_SIGNAL for the SQPOLL thread internally. Hence we can never have a signal pending, and we don't need to check for it (nor flush it). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_wq: Make io_wqe::lock a raw_spinlock_tSebastian Andrzej Siewior1-26/+26
During a context switch the scheduler invokes wq_worker_sleeping() with disabled preemption. Disabling preemption is needed because it protects access to `worker->sleeping'. As an optimisation it avoids invoking schedule() within the schedule path as part of possible wake up (thus preempt_enable_no_resched() afterwards). The io-wq has been added to the mix in the same section with disabled preemption. This breaks on PREEMPT_RT because io_wq_worker_sleeping() acquires a spinlock_t. Also within the schedule() the spinlock_t must be acquired after tsk_is_pi_blocked() otherwise it will block on the sleeping lock again while scheduling out. While playing with `io_uring-bench' I didn't notice a significant latency spike after converting io_wqe::lock to a raw_spinlock_t. The latency was more or less the same. In order to keep the spinlock_t it would have to be moved after the tsk_is_pi_blocked() check which would introduce a branch instruction into the hot path. The lock is used to maintain the `work_list' and wakes one task up at most. Should io_wqe_cancel_pending_work() cause latency spikes, while searching for a specific item, then it would need to drop the lock during iterations. revert_creds() is also invoked under the lock. According to debug cred::non_rcu is 0. Otherwise it should be moved outside of the locked section because put_cred_rcu()->free_uid() acquires a sleeping lock. Convert io_wqe::lock to a raw_spinlock_t.c Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: allow disabling rings during the creationStefano Garzarella1-7/+54
This patch adds a new IORING_SETUP_R_DISABLED flag to start the rings disabled, allowing the user to register restrictions, buffers, files, before to start processing SQEs. When IORING_SETUP_R_DISABLED is set, SQE are not processed and SQPOLL kthread is not started. The restrictions registration are allowed only when the rings are disable to prevent concurrency issue while processing SQEs. The rings can be enabled using IORING_REGISTER_ENABLE_RINGS opcode with io_uring_register(2). Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: add IOURING_REGISTER_RESTRICTIONS opcodeStefano Garzarella1-1/+123
The new io_uring_register(2) IOURING_REGISTER_RESTRICTIONS opcode permanently installs a feature allowlist on an io_ring_ctx. The io_ring_ctx can then be passed to untrusted code with the knowledge that only operations present in the allowlist can be executed. The allowlist approach ensures that new features added to io_uring do not accidentally become available when an existing application is launched on a newer kernel version. Currently is it possible to restrict sqe opcodes, sqe flags, and register opcodes. IOURING_REGISTER_RESTRICTIONS can only be made once. Afterwards it is not possible to change restrictions anymore. This prevents untrusted code from removing restrictions. Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: reference ->nsproxy for file table commandsJens Axboe3-0/+8
If we don't get and assign the namespace for the async work, then certain paths just don't work properly (like /dev/stdin, /proc/mounts, etc). Anything that references the current namespace of the given task should be assigned for async work on behalf of that task. Cc: stable@vger.kernel.org # v5.5+ Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: don't rely on weak ->files referencesJens Axboe3-37/+277
Grab actual references to the files_struct. To avoid circular references issues due to this, we add a per-task note that keeps track of what io_uring contexts a task has used. When the tasks execs or exits its assigned files, we cancel requests based on this tracking. With that, we can grab proper references to the files table, and no longer need to rely on stashing away ring_fd and ring_file to check if the ring_fd may have been closed. Cc: stable@vger.kernel.org # v5.5+ Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: enable task/files specific overflow flushingJens Axboe1-16/+25
This allows us to selectively flush out pending overflows, depending on the task and/or files_struct being passed in. No intended functional changes in this patch. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: return cancelation status from poll/timeout/files handlersJens Axboe1-5/+22
Return whether we found and canceled requests or not. This is in preparation for using this information, no functional changes in this patch. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: unconditionally grab req->taskJens Axboe1-38/+9
Sometimes we assign a weak reference to it, sometimes we grab a reference to it. Clean this up and make it unconditional, and drop the flag related to tracking this state. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: stash ctx task reference for SQPOLLJens Axboe1-13/+34
We can grab a reference to the task instead of stashing away the task files_struct. This is doable without creating a circular reference between the ring fd and the task itself. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: move dropping of files into separate helperJens Axboe1-11/+16
No functional changes in this patch, prep patch for grabbing references to the files_struct. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01io_uring: allow timeout/poll/files killing to take task into accountJens Axboe1-9/+24
We currently cancel these when the ring exits, and we cancel all of them. This is in preparation for killing only the ones associated with a given task. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-01Merge branch 'io_uring-5.9' into for-5.10/io_uringJens Axboe1-5/+18
* io_uring-5.9: io_uring: fix async buffered reads when readahead is disabled io_uring: fix potential ABBA deadlock in ->show_fdinfo() io_uring: always delete double poll wait entry on match
2020-09-30btrfs: fix filesystem corruption after a device replaceFilipe Manana1-1/+39
We use a device's allocation state tree to track ranges in a device used for allocated chunks, and we set ranges in this tree when allocating a new chunk. However after a device replace operation, we were not setting the allocated ranges in the new device's allocation state tree, so that tree is empty after a device replace. This means that a fitrim operation after a device replace will trim the device ranges that have allocated chunks and extents, as we trim every range for which there is not a range marked in the device's allocation state tree. It is also important during chunk allocation, since the device's allocation state is used to determine if a range is already allocated when allocating a new chunk. This is trivial to reproduce and the following script triggers the bug: $ cat reproducer.sh #!/bin/bash DEV1="/dev/sdg" DEV2="/dev/sdh" DEV3="/dev/sdi" wipefs -a $DEV1 $DEV2 $DEV3 &> /dev/null # Create a raid1 test fs on 2 devices. mkfs.btrfs -f -m raid1 -d raid1 $DEV1 $DEV2 > /dev/null mount $DEV1 /mnt/btrfs xfs_io -f -c "pwrite -S 0xab 0 10M" /mnt/btrfs/foo echo "Starting to replace $DEV1 with $DEV3" btrfs replace start -B $DEV1 $DEV3 /mnt/btrfs echo echo "Running fstrim" fstrim /mnt/btrfs echo echo "Unmounting filesystem" umount /mnt/btrfs echo "Mounting filesystem in degraded mode using $DEV3 only" wipefs -a $DEV1 $DEV2 &> /dev/null mount -o degraded $DEV3 /mnt/btrfs if [ $? -ne 0 ]; then dmesg | tail echo echo "Failed to mount in degraded mode" exit 1 fi echo echo "File foo data (expected all bytes = 0xab):" od -A d -t x1 /mnt/btrfs/foo umount /mnt/btrfs When running the reproducer: $ ./replace-test.sh wrote 10485760/10485760 bytes at offset 0 10 MiB, 2560 ops; 0.0901 sec (110.877 MiB/sec and 28384.5216 ops/sec) Starting to replace /dev/sdg with /dev/sdi Running fstrim Unmounting filesystem Mounting filesystem in degraded mode using /dev/sdi only mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/sdi, missing codepage or helper program, or other error. [19581.748641] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi started [19581.803842] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi finished [19582.208293] BTRFS info (device sdi): allowing degraded mounts [19582.208298] BTRFS info (device sdi): disk space caching is enabled [19582.208301] BTRFS info (device sdi): has skinny extents [19582.212853] BTRFS warning (device sdi): devid 2 uuid 1f731f47-e1bb-4f00-bfbb-9e5a0cb4ba9f is missing [19582.213904] btree_readpage_end_io_hook: 25839 callbacks suppressed [19582.213907] BTRFS error (device sdi): bad tree block start, want 30490624 have 0 [19582.214780] BTRFS warning (device sdi): failed to read root (objectid=7): -5 [19582.231576] BTRFS error (device sdi): open_ctree failed Failed to mount in degraded mode So fix by setting all allocated ranges in the replace target device when the replace operation is finishing, when we are holding the chunk mutex and we can not race with new chunk allocations. A test case for fstests follows soon. Fixes: 1c11b63eff2a67 ("btrfs: replace pending/pinned chunks lists with io tree") CC: stable@vger.kernel.org # 5.2+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-30btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locksJosef Bacik2-1/+5
When closing and freeing the source device we could end up doing our final blkdev_put() on the bdev, which will grab the bd_mutex. As such we want to be holding as few locks as possible, so move this call outside of the dev_replace->lock_finishing_cancel_unmount lock. Since we're modifying the fs_devices we need to make sure we're holding the uuid_mutex here, so take that as well. There's a report from syzbot probably hitting one of the cases where the bd_mutex and device_list_mutex are taken in the wrong order, however it's not with device replace, like this patch fixes. As there's no reproducer available so far, we can't verify the fix. https://lore.kernel.org/lkml/000000000000fc04d105afcf86d7@google.com/ dashboard link: https://syzkaller.appspot.com/bug?extid=84a0634dc5d21d488419 WARNING: possible circular locking dependency detected 5.9.0-rc5-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.0/6878 is trying to acquire lock: ffff88804c17d780 (&bdev->bd_mutex){+.+.}-{3:3}, at: blkdev_put+0x30/0x520 fs/block_dev.c:1804 but task is already holding lock: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&fs_devs->device_list_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 btrfs_finish_chunk_alloc+0x281/0xf90 fs/btrfs/volumes.c:5255 btrfs_create_pending_block_groups+0x2f3/0x700 fs/btrfs/block-group.c:2109 __btrfs_end_transaction+0xf5/0x690 fs/btrfs/transaction.c:916 find_free_extent_update_loop fs/btrfs/extent-tree.c:3807 [inline] find_free_extent+0x23b7/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 -> #3 (sb_internal#2){.+.+}-{0:0}: percpu_down_read include/linux/percpu-rwsem.h:51 [inline] __sb_start_write+0x234/0x470 fs/super.c:1672 sb_start_intwrite include/linux/fs.h:1690 [inline] start_transaction+0xbe7/0x1170 fs/btrfs/transaction.c:624 find_free_extent_update_loop fs/btrfs/extent-tree.c:3789 [inline] find_free_extent+0x25e1/0x2e60 fs/btrfs/extent-tree.c:4127 btrfs_reserve_extent+0x166/0x460 fs/btrfs/extent-tree.c:4206 cow_file_range+0x3de/0x9b0 fs/btrfs/inode.c:1063 btrfs_run_delalloc_range+0x2cf/0x1410 fs/btrfs/inode.c:1838 writepage_delalloc+0x150/0x460 fs/btrfs/extent_io.c:3439 __extent_writepage+0x441/0xd00 fs/btrfs/extent_io.c:3653 extent_write_cache_pages.constprop.0+0x69d/0x1040 fs/btrfs/extent_io.c:4249 extent_writepages+0xcd/0x2b0 fs/btrfs/extent_io.c:4370 do_writepages+0xec/0x290 mm/page-writeback.c:2352 __writeback_single_inode+0x125/0x1400 fs/fs-writeback.c:1461 writeback_sb_inodes+0x53d/0xf40 fs/fs-writeback.c:1721 wb_writeback+0x2ad/0xd40 fs/fs-writeback.c:1894 wb_do_writeback fs/fs-writeback.c:2039 [inline] wb_workfn+0x2dc/0x13e0 fs/fs-writeback.c:2080 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 -> #2 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}: __flush_work+0x60e/0xac0 kernel/workqueue.c:3041 wb_shutdown+0x180/0x220 mm/backing-dev.c:355 bdi_unregister+0x174/0x590 mm/backing-dev.c:872 del_gendisk+0x820/0xa10 block/genhd.c:933 loop_remove drivers/block/loop.c:2192 [inline] loop_control_ioctl drivers/block/loop.c:2291 [inline] loop_control_ioctl+0x3b1/0x480 drivers/block/loop.c:2257 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl fs/ioctl.c:739 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (loop_ctl_mutex){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 lo_open+0x19/0xd0 drivers/block/loop.c:1893 __blkdev_get+0x759/0x1aa0 fs/block_dev.c:1507 blkdev_get fs/block_dev.c:1639 [inline] blkdev_open+0x227/0x300 fs/block_dev.c:1753 do_dentry_open+0x4b9/0x11b0 fs/open.c:817 do_open fs/namei.c:3251 [inline] path_openat+0x1b9a/0x2730 fs/namei.c:3368 do_filp_open+0x17e/0x3c0 fs/namei.c:3395 do_sys_openat2+0x16d/0x420 fs/open.c:1168 do_sys_open fs/open.c:1184 [inline] __do_sys_open fs/open.c:1192 [inline] __se_sys_open fs/open.c:1188 [inline] __x64_sys_open+0x119/0x1c0 fs/open.c:1188 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&bdev->bd_mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &bdev->bd_mutex --> sb_internal#2 --> &fs_devs->device_list_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_devs->device_list_mutex); lock(sb_internal#2); lock(&fs_devs->device_list_mutex); lock(&bdev->bd_mutex); *** DEADLOCK *** 3 locks held by syz-executor.0/6878: #0: ffff88809070c0e0 (&type->s_umount_key#70){++++}-{3:3}, at: deactivate_super+0xa5/0xd0 fs/super.c:365 #1: ffffffff8a5b37a8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_close_devices+0x23/0x1f0 fs/btrfs/volumes.c:1178 #2: ffff8880908cfce0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: close_fs_devices.part.0+0x2e/0x800 fs/btrfs/volumes.c:1159 stack backtrace: CPU: 0 PID: 6878 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 check_noncircular+0x324/0x3e0 kernel/locking/lockdep.c:1827 check_prev_add kernel/locking/lockdep.c:2496 [inline] check_prevs_add kernel/locking/lockdep.c:2601 [inline] validate_chain kernel/locking/lockdep.c:3218 [inline] __lock_acquire+0x2a96/0x5780 kernel/locking/lockdep.c:4426 lock_acquire+0x1f3/0xae0 kernel/locking/lockdep.c:5006 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x134/0x10e0 kernel/locking/mutex.c:1103 blkdev_put+0x30/0x520 fs/block_dev.c:1804 btrfs_close_bdev fs/btrfs/volumes.c:1117 [inline] btrfs_close_bdev fs/btrfs/volumes.c:1107 [inline] btrfs_close_one_device fs/btrfs/volumes.c:1133 [inline] close_fs_devices.part.0+0x1a4/0x800 fs/btrfs/volumes.c:1161 close_fs_devices fs/btrfs/volumes.c:1193 [inline] btrfs_close_devices+0x95/0x1f0 fs/btrfs/volumes.c:1179 close_ctree+0x688/0x6cb fs/btrfs/disk-io.c:4149 generic_shutdown_super+0x144/0x370 fs/super.c:464 kill_anon_super+0x36/0x60 fs/super.c:1108 btrfs_kill_super+0x38/0x50 fs/btrfs/super.c:2265 deactivate_locked_super+0x94/0x160 fs/super.c:335 deactivate_super+0xad/0xd0 fs/super.c:366 cleanup_mnt+0x3a3/0x530 fs/namespace.c:1118 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x460027 RSP: 002b:00007fff59216328 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 0000000000076035 RCX: 0000000000460027 RDX: 0000000000403188 RSI: 0000000000000002 RDI: 00007fff592163d0 RBP: 0000000000000333 R08: 0000000000000000 R09: 000000000000000b R10: 0000000000000005 R11: 0000000000000246 R12: 00007fff59217460 R13: 0000000002df2a60 R14: 0000000000000000 R15: 00007fff59217460 Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ add syzbot reference ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-30xfs: fix finobt btree block recovery orderingDave Chinner1-0/+2
Nathan popped up on #xfs and pointed out that we fail to handle finobt btree blocks in xlog_recover_get_buf_lsn(). This means they always fall through the entire magic number matching code to "recover immediately". Whilst most of the time this is the correct behaviour, occasionally it will be incorrect and could potentially overwrite more recent metadata because we don't check the LSN in the on disk metadata at all. This bug has been present since the finobt was first introduced, and is a potential cause of the occasional xfs_iget_check_free_state() failures we see that indicate that the inode btree state does not match the on disk inode state. Fixes: aafc3c246529 ("xfs: support the XFS_BTNUM_FINOBT free inode btree type") Reported-by: Nathan Scott <nathans@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-09-30autofs: use __kernel_write() for the autofs pipe writingLinus Torvalds2-1/+9
autofs got broken in some configurations by commit 13c164b1a186 ("autofs: switch to kernel_write") because there is now an extra LSM permission check done by security_file_permission() in rw_verify_area(). autofs is one if the few places that really does want the much more limited __kernel_write(), because the write is an internal kernel one that shouldn't do any user permission checks (it also doesn't need the file_start_write/file_end_write logic, since it's just a pipe). There are a couple of other cases like that - accounting, core dumping, and splice - but autofs stands out because it can be built as a module. As a result, we need to export this internal __kernel_write() function again. We really don't want any other module to use this, but we don't have a "EXPORT_SYMBOL_FOR_AUTOFS_ONLY()". But we can mark it GPL-only to at least approximate that "internal use only" for licensing. While in this area, make autofs pass in NULL for the file position pointer, since it's always a pipe, and we now use a NULL file pointer for streaming file descriptors (see file_ppos() and commit 438ab720c675: "vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files") This effectively reverts commits 9db977522449 ("fs: unexport __kernel_write") and 13c164b1a186 ("autofs: switch to kernel_write"). Fixes: 13c164b1a186 ("autofs: switch to kernel_write") Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-29fs: dlm: rework receive handlingAlexander Aring5-180/+141
This patch reworks the current receive handling of dlm. As I tried to change the send handling to fix reorder issues I took a look into the receive handling and simplified it, it works as the following: Each connection has a preallocated receive buffer with a minimum length of 4096. On receive, the upper layer protocol will process all dlm message until there is not enough data anymore. If there exists "leftover" data at the end of the receive buffer because the dlm message wasn't fully received it will be copied to the begin of the preallocated receive buffer. Next receive more data will be appended to the previous "leftover" data and processing will begin again. This will remove a lot of code of the current mechanism. Inside the processing functionality we will ensure with a memmove() that the dlm message should be memory aligned. To have a dlm message always started at the beginning of the buffer will reduce some amount of memmove() calls because src and dest pointers are the same. The cluster attribute "buffer_size" becomes a new meaning, it's now the size of application layer receive buffer size. If this is changed during runtime the receive buffer will be reallocated. It's important that the receive buffer size has at minimum the size of the maximum possible dlm message size otherwise the received message cannot be placed inside the receive buffer size. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-09-29fs: dlm: disallow buffer size below defaultAlexander Aring1-2/+7
I observed that the upper layer will not send messages above this value. As conclusion the application receive buffer should not below that value, otherwise we are not capable to deliver the dlm message to the upper layer. This patch forbids to set the receive buffer below the maximum possible dlm message size. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-09-29fs: dlm: handle range check as callbackAlexander Aring1-18/+23
This patch adds a callback to CLUSTER_ATTR macro to allow individual callbacks for attributes which might have a more complex attribute range checking just than non zero. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-09-29fs: dlm: fix mark per nodeid settingAlexander Aring3-13/+17
This patch fixes to set per nodeid mark configuration for accepted sockets as well. Before this patch only the listen socket mark value was used for all accepted connections. This patch will ensure that the cluster mark attribute value will be always used for all sockets, if a per nodeid mark value is specified dlm will use this value for the specific node. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-09-29fs: dlm: remove lock dependency warningAlexander Aring1-10/+8
During my experiments to make dlm robust against tcpkill application I was able to run sometimes in a circular lock dependency warning between clusters_root.subsys.su_mutex and con->sock_mutex. We don't need to held the sock_mutex when getting the mark value which held the clusters_root.subsys.su_mutex. This patch moves the specific handling just before the sock_mutex will be held. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2020-09-29f2fs: fix slab leak of rpages pointerJaegeuk Kim2-1/+3
This fixes the below mem leak. [ 130.157600] ============================================================================= [ 130.159662] BUG f2fs_page_array_entry-252:16 (Tainted: G W O ): Objects remaining in f2fs_page_array_entry-252:16 on __kmem_cache_shutdown() [ 130.162742] ----------------------------------------------------------------------------- [ 130.162742] [ 130.164979] Disabling lock debugging due to kernel taint [ 130.166188] INFO: Slab 0x000000009f5a52d2 objects=22 used=4 fp=0x00000000ba72c3e9 flags=0xfffffc0010200 [ 130.168269] CPU: 7 PID: 3560 Comm: umount Tainted: G B W O 5.9.0-rc4+ #35 [ 130.170019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 130.171941] Call Trace: [ 130.172528] dump_stack+0x74/0x9a [ 130.173298] slab_err+0xb7/0xdc [ 130.174044] ? kernel_poison_pages+0xc0/0xc0 [ 130.175065] ? on_each_cpu_cond_mask+0x48/0x90 [ 130.176096] __kmem_cache_shutdown.cold+0x34/0x141 [ 130.177190] kmem_cache_destroy+0x59/0x100 [ 130.178223] f2fs_destroy_page_array_cache+0x15/0x20 [f2fs] [ 130.179527] f2fs_put_super+0x1bc/0x380 [f2fs] [ 130.180538] generic_shutdown_super+0x72/0x110 [ 130.181547] kill_block_super+0x27/0x50 [ 130.182438] kill_f2fs_super+0x76/0xe0 [f2fs] [ 130.183448] deactivate_locked_super+0x3b/0x80 [ 130.184456] deactivate_super+0x3e/0x50 [ 130.185363] cleanup_mnt+0x109/0x160 [ 130.186179] __cleanup_mnt+0x12/0x20 [ 130.187003] task_work_run+0x70/0xb0 [ 130.187841] exit_to_user_mode_prepare+0x18f/0x1b0 [ 130.188917] syscall_exit_to_user_mode+0x31/0x170 [ 130.189989] do_syscall_64+0x45/0x90 [ 130.190828] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 130.191986] RIP: 0033:0x7faf868ea2eb [ 130.192815] Code: 7b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 7b 0c 00 f7 d8 64 89 01 [ 130.196872] RSP: 002b:00007fffb7edb478 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 130.198494] RAX: 0000000000000000 RBX: 00007faf86a18204 RCX: 00007faf868ea2eb [ 130.201021] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055971df71c50 [ 130.203415] RBP: 000055971df71a40 R08: 0000000000000000 R09: 00007fffb7eda1f0 [ 130.205772] R10: 00007faf86a04339 R11: 0000000000000246 R12: 000055971df71c50 [ 130.208150] R13: 0000000000000000 R14: 000055971df71b38 R15: 0000000000000000 [ 130.210515] INFO: Object 0x00000000a980843a @offset=744 [ 130.212476] INFO: Allocated in page_array_alloc+0x3d/0xe0 [f2fs] age=1572 cpu=0 pid=3297 [ 130.215030] __slab_alloc+0x20/0x40 [ 130.216566] kmem_cache_alloc+0x2a0/0x2e0 [ 130.218217] page_array_alloc+0x3d/0xe0 [f2fs] [ 130.219940] f2fs_init_compress_ctx+0x1f/0x40 [f2fs] [ 130.221736] f2fs_write_cache_pages+0x3db/0x860 [f2fs] [ 130.223591] f2fs_write_data_pages+0x2c9/0x300 [f2fs] [ 130.225414] do_writepages+0x43/0xd0 [ 130.226907] __filemap_fdatawrite_range+0xd5/0x110 [ 130.228632] filemap_write_and_wait_range+0x48/0xb0 [ 130.230336] __generic_file_write_iter+0x18a/0x1d0 [ 130.232035] f2fs_file_write_iter+0x226/0x550 [f2fs] [ 130.233737] new_sync_write+0x113/0x1a0 [ 130.235204] vfs_write+0x1a6/0x200 [ 130.236579] ksys_write+0x67/0xe0 [ 130.237898] __x64_sys_write+0x1a/0x20 [ 130.239309] do_syscall_64+0x38/0x90 Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: compress: fix to disallow enabling compress on non-empty fileChao Yu1-0/+2
Compressed inode and normal inode has different layout, so we should disallow enabling compress on non-empty file to avoid race condition during inode .i_addr array parsing and updating. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: Fix missing condition] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: compress: introduce cic/dic slab cacheChao Yu3-7/+70
Add two slab caches: "f2fs_cic_entry" and "f2fs_dic_entry" for memory allocation of compress_io_ctx and decompress_io_ctx structure. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: compress: introduce page array slab cacheChao Yu3-31/+102
Add a per-sbi slab cache "f2fs_page_array_entry-%u:%u" for memory allocation of page pointer array in compress context. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: Fix wrong memory allocation] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29udf: Limit sparing table sizeJan Kara1-0/+6
Although UDF standard allows it, we don't support sparing table larger than a single block. Check it during mount so that we don't try to access memory beyond end of buffer. Reported-by: syzbot+9991561e714f597095da@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz>
2020-09-29udf: Remove pointless union in udf_inode_infoJan Kara10-47/+39
We use only a single member out of the i_ext union in udf_inode_info. Just remove the pointless union. Signed-off-by: Jan Kara <jack@suse.cz>
2020-09-29udf: Avoid accessing uninitialized data on failed inode readJan Kara1-11/+14
When we fail to read inode, some data accessed in udf_evict_inode() may be uninitialized. Move the accesses to !is_bad_inode() branch. Reported-by: syzbot+91f02b28f9bb5f5f1341@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz>
2020-09-29io_uring: fix async buffered reads when readahead is disabledHao Xu1-0/+2
The async buffered reads feature is not working when readahead is turned off. There are two things to concern: - when doing retry in io_read, not only the IOCB_WAITQ flag but also the IOCB_NOWAIT flag is still set, which makes it goes to would_block phase in generic_file_buffered_read() and then return -EAGAIN. After that, the io-wq thread work is queued, and later doing the async reads in the old way. - even if we remove IOCB_NOWAIT when doing retry, the feature is still not running properly, since in generic_file_buffered_read() it goes to lock_page_killable() after calling mapping->a_ops->readpage() to do IO, and thus causing process to sleep. Fixes: 1a0a7853b901 ("mm: support async buffered reads in generic_file_buffered_read()") Fixes: 3b2a4439e0ae ("io_uring: get rid of kiocb_wait_page_queue_init()") Signed-off-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-29f2fs: fix to do sanity check on segment/section countChao Yu1-0/+6
As syzbot reported: BUG: KASAN: slab-out-of-bounds in init_min_max_mtime fs/f2fs/segment.c:4710 [inline] BUG: KASAN: slab-out-of-bounds in f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792 Read of size 8 at addr ffff8880a1b934a8 by task syz-executor682/6878 CPU: 1 PID: 6878 Comm: syz-executor682 Not tainted 5.9.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383 __kasan_report mm/kasan/report.c:513 [inline] kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530 init_min_max_mtime fs/f2fs/segment.c:4710 [inline] f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792 f2fs_fill_super+0x381a/0x6e80 fs/f2fs/super.c:3633 mount_bdev+0x32e/0x3f0 fs/super.c:1417 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x20a0 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The root cause is: if segs_per_sec is larger than one, and segment count in last section is less than segs_per_sec, we will suffer out-of-boundary memory access on sit_i->sentries[] in init_min_max_mtime(). Fix this by adding sanity check among segment count, section count and segs_per_sec value in sanity_check_raw_super(). Reported-by: syzbot+481a3ffab50fed41dcc0@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: fix to check segment boundary during SIT page readaheadChao Yu1-0/+2
As syzbot reported: kernel BUG at fs/f2fs/segment.h:657! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 16220 Comm: syz-executor.0 Not tainted 5.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:f2fs_ra_meta_pages+0xa51/0xdc0 fs/f2fs/segment.h:657 Call Trace: build_sit_entries fs/f2fs/segment.c:4195 [inline] f2fs_build_segment_manager+0x4b8a/0xa3c0 fs/f2fs/segment.c:4779 f2fs_fill_super+0x377d/0x6b80 fs/f2fs/super.c:3633 mount_bdev+0x32e/0x3f0 fs/super.c:1417 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2875 [inline] path_mount+0x1387/0x2070 fs/namespace.c:3192 do_mount fs/namespace.c:3205 [inline] __do_sys_mount fs/namespace.c:3413 [inline] __se_sys_mount fs/namespace.c:3390 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 @blkno in f2fs_ra_meta_pages could exceed max segment count, causing panic in following sanity check in current_sit_addr(), add check condition to avoid this issue. Reported-by: syzbot+3698081bcf0bb2d12174@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: fix uninit-value in f2fs_lookupChao Yu1-5/+3
As syzbot reported: Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x21c/0x280 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:122 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:219 f2fs_lookup+0xe05/0x1a80 fs/f2fs/namei.c:503 lookup_open fs/namei.c:3082 [inline] open_last_lookups fs/namei.c:3177 [inline] path_openat+0x2729/0x6a90 fs/namei.c:3365 do_filp_open+0x2b8/0x710 fs/namei.c:3395 do_sys_openat2+0xa88/0x1140 fs/open.c:1168 do_sys_open fs/open.c:1184 [inline] __do_compat_sys_openat fs/open.c:1242 [inline] __se_compat_sys_openat+0x2a4/0x310 fs/open.c:1240 __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240 do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline] __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139 do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162 do_SYSENTER_32+0x73/0x90 arch/x86/entry/common.c:205 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c In f2fs_lookup(), @res_page could be used before being initialized, because in __f2fs_find_entry(), once F2FS_I(dir)->i_current_depth was been fuzzed to zero, then @res_page will never be initialized, causing this kmsan warning, relocating @res_page initialization place to fix this bug. Reported-by: syzbot+0eac6f0bbd558fd866d7@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: remove unneeded parameter in find_in_block()Chao Yu1-11/+6
We can relocate @res_page assignment in find_in_block() to its caller, so unneeded parameter could be removed for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: fix wrong total_sections check and fsmeta checkWang Xiaojun2-4/+5
Meta area is not included in section_count computation. So the minimum number of total_sections is 1 meanwhile it cannot be greater than segment_count_main. The minimum number of meta segments is 8 (SB + 2 (CP + SIT + NAT) + SSA). Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: remove duplicated code in sanity_check_area_boundaryWang Xiaojun1-8/+4
Use seg_end_blkaddr instead of "segment0_blkaddr + (segment_count << log_blocks_per_seg)". Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: remove unused check on version_bitmapWang Xiaojun1-3/+0
A NULL will not be return by __bitmap_ptr here. Remove the unused check. Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: relocate blkzoned feature checkChao Yu1-12/+11
Relocate blkzoned feature check into parse_options() like other feature check. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: do sanity check on zoned block device pathChao Yu1-0/+6
sbi->devs would be initialized only if image enables multiple device feature or blkzoned feature, if blkzoned feature flag was set by fuzz in non-blkzoned device, we will suffer below panic: get_zone_idx fs/f2fs/segment.c:4892 [inline] f2fs_usable_zone_blks_in_seg fs/f2fs/segment.c:4943 [inline] f2fs_usable_blks_in_seg+0x39b/0xa00 fs/f2fs/segment.c:4999 Call Trace: check_block_count+0x69/0x4e0 fs/f2fs/segment.h:704 build_sit_entries fs/f2fs/segment.c:4403 [inline] f2fs_build_segment_manager+0x51da/0xa370 fs/f2fs/segment.c:5100 f2fs_fill_super+0x3880/0x6ff0 fs/f2fs/super.c:3684 mount_bdev+0x32e/0x3f0 fs/super.c:1417 legacy_get_tree+0x105/0x220 fs/fs_context.c:592 vfs_get_tree+0x89/0x2f0 fs/super.c:1547 do_new_mount fs/namespace.c:2896 [inline] path_mount+0x12ae/0x1e70 fs/namespace.c:3216 do_mount fs/namespace.c:3229 [inline] __do_sys_mount fs/namespace.c:3437 [inline] __se_sys_mount fs/namespace.c:3414 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3414 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Add sanity check to inconsistency on factors: blkzoned flag, device path and device character to avoid above panic. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: add trace exit in exception pathZhang Qilong1-1/+5
Missing the trace exit in f2fs_sync_dirty_inodes Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29f2fs: change return value of reserved_segments to unsigned intXiaojun Wang1-2/+2
The type of SM_I(sbi)->reserved_segments is unsigned int, so change the return value to unsigned int. The type cast can be removed in reserved_sections as a result. Signed-off-by: Xiaojun Wang <wangxiaojun11@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-29ubifs: journal: Make sure to not dirty twice for auth nodesRichard Weinberger1-2/+2
When removing the last reference of an inode the size of an auth node is already part of write_len. So we must not call ubifs_add_auth_dirt(). Call it only when needed. Cc: <stable@vger.kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Kristof Havasi <havasiefr@gmail.com> Fixes: 6a98bc4614de ("ubifs: Add authentication nodes to journal") Reported-and-tested-by: Kristof Havasi <havasiefr@gmail.com> Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>