summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-05-18NFSD: Capture every CB state transitionChuck Lever1-13/+15
We were missing one. As a clean-up, add a helper that sets the new CB state and fires a tracepoint. The tracepoint fires only when the state changes, to help reduce trace log noise. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Constify @fh argument of knfsd_fh_hash()Chuck Lever1-5/+2
Enable knfsd_fh_hash() to be invoked in functions where the filehandle pointer is a const. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Add tracepoints for EXCHANGEID edge casesChuck Lever2-3/+10
Some of the most common cases are traced. Enough infrastructure is now in place that more can be added later, as needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Add tracepoints for SETCLIENTID edge casesChuck Lever2-11/+45
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Add a couple more nfsd_clid_expired call sitesChuck Lever2-4/+8
Improve observation of NFSv4 lease expiry. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Add nfsd_clid_destroyed tracepointChuck Lever2-0/+2
Record client-requested termination of client IDs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Add nfsd_clid_reclaim_complete tracepointChuck Lever2-0/+2
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Add nfsd_clid_confirmed tracepointChuck Lever2-5/+6
This replaces a dprintk call site in order to get greater visibility on when client IDs are confirmed or re-used. Simple example: nfsd-995 [000] 126.622975: nfsd_compound: xid=0x3a34e2b1 opcnt=1 nfsd-995 [000] 126.623005: nfsd_cb_args: addr=192.168.2.51:45901 client 60958e3b:9213ef0e prog=1073741824 ident=1 nfsd-995 [000] 126.623007: nfsd_compound_status: op=1/1 OP_SETCLIENTID status=0 nfsd-996 [001] 126.623142: nfsd_compound: xid=0x3b34e2b1 opcnt=1 >>>> nfsd-996 [001] 126.623146: nfsd_clid_confirmed: client 60958e3b:9213ef0e nfsd-996 [001] 126.623148: nfsd_cb_probe: addr=192.168.2.51:45901 client 60958e3b:9213ef0e state=UNKNOWN nfsd-996 [001] 126.623154: nfsd_compound_status: op=1/1 OP_SETCLIENTID_CONFIRM status=0 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Remove trace_nfsd_clid_inuse_errChuck Lever1-24/+0
This tracepoint has been replaced by nfsd_clid_cred_mismatch and nfsd_clid_verf_mismatch, and can simply be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Add nfsd_clid_verf_mismatch tracepointChuck Lever2-3/+40
Record when a client presents a different boot verifier than the one we know about. Typically this is a sign the client has rebooted, but sometimes it signals a conflicting client ID, which the client's administrator will need to address. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Add nfsd_clid_cred_mismatch tracepointChuck Lever2-4/+38
Record when a client tries to establish a lease record but uses an unexpected credential. This is often a sign of a configuration problem. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Add an RPC authflavor tracepoint display helperChuck Lever1-0/+16
To be used in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18NFSD: Fix TP_printk() format specifier in nfsd_clid_classChuck Lever2-32/+0
Since commit 9a6944fee68e ("tracing: Add a verifier to check string pointers for trace events"), which was merged in v5.13-rc1, TP_printk() no longer tacitly supports the "%.*s" format specifier. These are low value tracepoints, so just remove them. Reported-by: David Wysochanski <dwysocha@redhat.com> Fixes: dd5e3fbc1f47 ("NFSD: Add tracepoints to the NFSD state management code") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-05-18debugfs: fix security_locked_down() call for SELinuxOndrej Mosnacek1-3/+6
When (ia->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) is zero, then the SELinux implementation of the locked_down hook might report a denial even though the operation would actually be allowed. To fix this, make sure that security_locked_down() is called only when the return value will be taken into account (i.e. when changing one of the problematic attributes). Note: this was introduced by commit 5496197f9b08 ("debugfs: Restrict debugfs when the kernel is locked down"), but it didn't matter at that time, as the SELinux support came in later. Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") Cc: stable <stable@vger.kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Link: https://lore.kernel.org/r/20210507125304.144394-1-omosnace@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-18cifsd: add support for FSCTL_DUPLICATE_EXTENTS_TO_FILENamjae Jeon5-3/+63
Add support for FSCTL_DUPLICATE_EXTENTS_TO_FILE in smb2 ioctl. Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-18cifsd: Do not use 0 or 0xFFFFFFFF for TreeIDMarios Makassikis1-3/+1
Returning TreeID=0 is valid behaviour according to [MS-SMB2] 2.2.1.2: TreeId (4 bytes): Uniquely identifies the tree connect for the command. This MUST be 0 for the SMB2 TREE_CONNECT Request. The TreeId can be any unsigned 32-bit integer that is received from a previous SMB2 TREE_CONNECT Response. TreeId SHOULD be set to 0 for the following commands: [...] However, some client implementations reject it as invalid. Windows10 assigns ids starting from 1, and samba4 returns a random uint32_t which suggests there may be other clients that consider it is invalid behaviour. Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-18d_path: saner calling conventions for __dentry_path()Al Viro1-20/+13
1) lift NUL-termination into the callers 2) pass pointer to the end of buffer instead of that to beginning. (1) allows to simplify dentry_path() - we don't need to play silly games with restoring the leading / of "//deleted" after __dentry_path() would've overwritten it with NUL. We also do not need to check if (either) prepend() in there fails - if the buffer is not large enough, we'll end with negative buflen after prepend() and __dentry_path() will return the right value (ERR_PTR(-ENAMETOOLONG)) just fine. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-05-18d_path: "\0" is {0,0}, not {0}Al Viro1-5/+5
Single-element array consisting of one NUL is spelled ""... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-05-18reiserfs: Fix fall-through warnings for ClangGustavo A. R. Silva1-0/+1
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-05-17Merge tag 'for-5.13-rc2-tag' of ↵Linus Torvalds4-2/+26
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes: - fix fiemap to print extents that could get misreported due to internal extent splitting and logical merging for fiemap output - fix RCU stalls during delayed iputs - fix removed dentries still existing after log is synced" * tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix removed dentries still existing after log is synced btrfs: return whole extents in fiemap btrfs: avoid RCU stalls while running delayed iputs btrfs: return 0 for dev_extent_hole_check_zoned hole_start in case of error
2021-05-17btrfs: do not BUG_ON in link_to_fixup_dirJosef Bacik1-2/+0
While doing error injection testing I got the following panic kernel BUG at fs/btrfs/tree-log.c:1862! invalid opcode: 0000 [#1] SMP NOPTI CPU: 1 PID: 7836 Comm: mount Not tainted 5.13.0-rc1+ #305 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:link_to_fixup_dir+0xd5/0xe0 RSP: 0018:ffffb5800180fa30 EFLAGS: 00010216 RAX: fffffffffffffffb RBX: 00000000fffffffb RCX: ffff8f595287faf0 RDX: ffffb5800180fa37 RSI: ffff8f5954978800 RDI: 0000000000000000 RBP: ffff8f5953af9450 R08: 0000000000000019 R09: 0000000000000001 R10: 000151f408682970 R11: 0000000120021001 R12: ffff8f5954978800 R13: ffff8f595287faf0 R14: ffff8f5953c77dd0 R15: 0000000000000065 FS: 00007fc5284c8c40(0000) GS:ffff8f59bbd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc5287f47c0 CR3: 000000011275e002 CR4: 0000000000370ee0 Call Trace: replay_one_buffer+0x409/0x470 ? btree_read_extent_buffer_pages+0xd0/0x110 walk_up_log_tree+0x157/0x1e0 walk_log_tree+0xa6/0x1d0 btrfs_recover_log_trees+0x1da/0x360 ? replay_one_extent+0x7b0/0x7b0 open_ctree+0x1486/0x1720 btrfs_mount_root.cold+0x12/0xea ? __kmalloc_track_caller+0x12f/0x240 legacy_get_tree+0x24/0x40 vfs_get_tree+0x22/0xb0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 ? vfs_parse_fs_string+0x4d/0x90 legacy_get_tree+0x24/0x40 vfs_get_tree+0x22/0xb0 path_mount+0x433/0xa10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae We can get -EIO or any number of legitimate errors from btrfs_search_slot(), panicing here is not the appropriate response. The error path for this code handles errors properly, simply return the error. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-17btrfs: release path before starting transaction when cloning inline extentFilipe Manana1-0/+5
When cloning an inline extent there are a few cases, such as when we have an implicit hole at file offset 0, where we start a transaction while holding a read lock on a leaf. Starting the transaction results in a call to sb_start_intwrite(), which results in doing a read lock on a percpu semaphore. Lockdep doesn't like this and complains about it: [46.580704] ====================================================== [46.580752] WARNING: possible circular locking dependency detected [46.580799] 5.13.0-rc1 #28 Not tainted [46.580832] ------------------------------------------------------ [46.580877] cloner/3835 is trying to acquire lock: [46.580918] c00000001301d638 (sb_internal#2){.+.+}-{0:0}, at: clone_copy_inline_extent+0xe4/0x5a0 [46.581167] [46.581167] but task is already holding lock: [46.581217] c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.581293] [46.581293] which lock already depends on the new lock. [46.581293] [46.581351] [46.581351] the existing dependency chain (in reverse order) is: [46.581410] [46.581410] -> #1 (btrfs-tree-00){++++}-{3:3}: [46.581464] down_read_nested+0x68/0x200 [46.581536] __btrfs_tree_read_lock+0x70/0x1d0 [46.581577] btrfs_read_lock_root_node+0x88/0x200 [46.581623] btrfs_search_slot+0x298/0xb70 [46.581665] btrfs_set_inode_index+0xfc/0x260 [46.581708] btrfs_new_inode+0x26c/0x950 [46.581749] btrfs_create+0xf4/0x2b0 [46.581782] lookup_open.isra.57+0x55c/0x6a0 [46.581855] path_openat+0x418/0xd20 [46.581888] do_filp_open+0x9c/0x130 [46.581920] do_sys_openat2+0x2ec/0x430 [46.581961] do_sys_open+0x90/0xc0 [46.581993] system_call_exception+0x3d4/0x410 [46.582037] system_call_common+0xec/0x278 [46.582078] [46.582078] -> #0 (sb_internal#2){.+.+}-{0:0}: [46.582135] __lock_acquire+0x1e90/0x2c50 [46.582176] lock_acquire+0x2b4/0x5b0 [46.582263] start_transaction+0x3cc/0x950 [46.582308] clone_copy_inline_extent+0xe4/0x5a0 [46.582353] btrfs_clone+0x5fc/0x880 [46.582388] btrfs_clone_files+0xd8/0x1c0 [46.582434] btrfs_remap_file_range+0x3d8/0x590 [46.582481] do_clone_file_range+0x10c/0x270 [46.582558] vfs_clone_file_range+0x1b0/0x310 [46.582605] ioctl_file_clone+0x90/0x130 [46.582651] do_vfs_ioctl+0x874/0x1ac0 [46.582697] sys_ioctl+0x6c/0x120 [46.582733] system_call_exception+0x3d4/0x410 [46.582777] system_call_common+0xec/0x278 [46.582822] [46.582822] other info that might help us debug this: [46.582822] [46.582888] Possible unsafe locking scenario: [46.582888] [46.582942] CPU0 CPU1 [46.582984] ---- ---- [46.583028] lock(btrfs-tree-00); [46.583062] lock(sb_internal#2); [46.583119] lock(btrfs-tree-00); [46.583174] lock(sb_internal#2); [46.583212] [46.583212] *** DEADLOCK *** [46.583212] [46.583266] 6 locks held by cloner/3835: [46.583299] #0: c00000001301d448 (sb_writers#12){.+.+}-{0:0}, at: ioctl_file_clone+0x90/0x130 [46.583382] #1: c00000000f6d3768 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: lock_two_nondirectories+0x58/0xc0 [46.583477] #2: c00000000f6d72a8 (&sb->s_type->i_mutex_key#15/4){+.+.}-{3:3}, at: lock_two_nondirectories+0x9c/0xc0 [46.583574] #3: c00000000f6d7138 (&ei->i_mmap_lock){+.+.}-{3:3}, at: btrfs_remap_file_range+0xd0/0x590 [46.583657] #4: c00000000f6d35f8 (&ei->i_mmap_lock/1){+.+.}-{3:3}, at: btrfs_remap_file_range+0xe0/0x590 [46.583743] #5: c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.583828] [46.583828] stack backtrace: [46.583872] CPU: 1 PID: 3835 Comm: cloner Not tainted 5.13.0-rc1 #28 [46.583931] Call Trace: [46.583955] [c0000000167c7200] [c000000000c1ee78] dump_stack+0xec/0x144 (unreliable) [46.584052] [c0000000167c7240] [c000000000274058] print_circular_bug.isra.32+0x3a8/0x400 [46.584123] [c0000000167c72e0] [c0000000002741f4] check_noncircular+0x144/0x190 [46.584191] [c0000000167c73b0] [c000000000278fc0] __lock_acquire+0x1e90/0x2c50 [46.584259] [c0000000167c74f0] [c00000000027aa94] lock_acquire+0x2b4/0x5b0 [46.584317] [c0000000167c75e0] [c000000000a0d6cc] start_transaction+0x3cc/0x950 [46.584388] [c0000000167c7690] [c000000000af47a4] clone_copy_inline_extent+0xe4/0x5a0 [46.584457] [c0000000167c77c0] [c000000000af525c] btrfs_clone+0x5fc/0x880 [46.584514] [c0000000167c7990] [c000000000af5698] btrfs_clone_files+0xd8/0x1c0 [46.584583] [c0000000167c7a00] [c000000000af5b58] btrfs_remap_file_range+0x3d8/0x590 [46.584652] [c0000000167c7ae0] [c0000000005d81dc] do_clone_file_range+0x10c/0x270 [46.584722] [c0000000167c7b40] [c0000000005d84f0] vfs_clone_file_range+0x1b0/0x310 [46.584793] [c0000000167c7bb0] [c00000000058bf80] ioctl_file_clone+0x90/0x130 [46.584861] [c0000000167c7c10] [c00000000058c894] do_vfs_ioctl+0x874/0x1ac0 [46.584922] [c0000000167c7d10] [c00000000058db4c] sys_ioctl+0x6c/0x120 [46.584978] [c0000000167c7d60] [c0000000000364a4] system_call_exception+0x3d4/0x410 [46.585046] [c0000000167c7e10] [c00000000000d45c] system_call_common+0xec/0x278 [46.585114] --- interrupt: c00 at 0x7ffff7e22990 [46.585160] NIP: 00007ffff7e22990 LR: 00000001000010ec CTR: 0000000000000000 [46.585224] REGS: c0000000167c7e80 TRAP: 0c00 Not tainted (5.13.0-rc1) [46.585280] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000244 XER: 00000000 [46.585374] IRQMASK: 0 [46.585374] GPR00: 0000000000000036 00007fffffffdec0 00007ffff7f17100 0000000000000004 [46.585374] GPR04: 000000008020940d 00007fffffffdf40 0000000000000000 0000000000000000 [46.585374] GPR08: 0000000000000004 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR12: 0000000000000000 00007ffff7ffa940 0000000000000000 0000000000000000 [46.585374] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR20: 0000000000000000 000000009123683e 00007fffffffdf40 0000000000000000 [46.585374] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000004 [46.585374] GPR28: 0000000100030260 0000000100030280 0000000000000003 000000000000005f [46.585919] NIP [00007ffff7e22990] 0x7ffff7e22990 [46.585964] LR [00000001000010ec] 0x1000010ec [46.586010] --- interrupt: c00 This should be a false positive, as both locks are acquired in read mode. Nevertheless, we don't need to hold a leaf locked when we start the transaction, so just release the leaf (path) before starting it. Reported-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/linux-btrfs/20210513214404.xks77p566fglzgum@riteshh-domain/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-17io_uring: don't modify req->poll for rwPavel Begunkov1-3/+3
__io_queue_proc() is used by both poll and apoll, so we should not access req->poll directly but selecting right struct io_poll_iocb depending on use case. Reported-and-tested-by: syzbot+a84b8783366ecb1c65d0@syzkaller.appspotmail.com Fixes: ea6a693d862d ("io_uring: disable multishot poll for double poll add cases") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4a6a1de31142d8e0250fe2dfd4c8923d82a5bbfc.1621251795.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-17reiserfs: add check for invalid 1st journal blockPavel Skripkin1-0/+14
syzbot reported divide error in reiserfs. The problem was in incorrect journal 1st block. Syzbot's reproducer manualy generated wrong superblock with incorrect 1st block. In journal_init() wasn't any checks about this particular case. For example, if 1st journal block is before superblock 1st block, it can cause zeroing important superblock members in do_journal_end(). Link: https://lore.kernel.org/r/20210517121545.29645-1-paskripkin@gmail.com Reported-by: syzbot+0ba9909df31c6a36974d@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-05-17nilfs2: Switch to use %ptTsAndy Shevchenko1-16/+3
Use %ptTs instead of open coded variant to print contents of time64_t type in human readable form. Use sysfs_emit() at the same time in the changed functions. Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: linux-nilfs@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210511153958.34527-3-andriy.shevchenko@linux.intel.com
2021-05-17Merge 5.13-rc2 into driver-core-nextGreg Kroah-Hartman23-104/+179
We need the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-17cifs: remove deadstore in cifs_close_all_deferred_files()wenhuizhang1-2/+0
Deadstore detected by Lukas Bulwahn's CodeChecker Tool (ELISA group). line 741 struct cifsInodeInfo *cinode; line 747 cinode = CIFS_I(d_inode(cfile->dentry)); could be deleted. cinode on filesystem should not be deleted when files are closed, they are representations of some data fields on a physical disk, thus no further action is required. The virtual inode on vfs will be handled by vfs automatically, and the denotation is inode, which is different from the cinode. Signed-off-by: wenhuizhang <wenhui@gwmail.gwu.edu> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-17xfs: adjust rt allocation minlen when extszhint > rtextsizeDarrick J. Wong1-26/+57
xfs_bmap_rtalloc doesn't handle realtime extent files with extent size hints larger than the rt volume's extent size properly, because xfs_bmap_extsize_align can adjust the offset/length parameters to try to fit the extent size hint. Under these conditions, minlen has to be large enough so that any allocation returned by xfs_rtallocate_extent will be large enough to cover at least one of the blocks that the caller asked for. If the allocation is too short, bmapi_write will return no mapping for the requested range, which causes ENOSPC errors in other parts of the filesystem. Therefore, adjust minlen upwards to fix this. This can be found by running generic/263 (g/127 or g/522) with a realtime extent size hint that's larger than the rt volume extent size. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-05-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-8/+14
Merge misc fixes from Andrew Morton: "13 patches. Subsystems affected by this patch series: resource, squashfs, hfsplus, modprobe, and mm (hugetlb, slub, userfaultfd, ksm, pagealloc, kasan, pagemap, and ioremap)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/ioremap: fix iomap_max_page_shift docs: admin-guide: update description for kernel.modprobe sysctl hfsplus: prevent corruption in shrinking truncate mm/filemap: fix readahead return types kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled mm: fix struct page layout on 32-bit systems ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()" userfaultfd: release page in error path to avoid BUG_ON squashfs: fix divide error in calculate_skip() kernel/resource: fix return code check in __request_free_mem_region mm, slub: move slub_debug static key enabling outside slab_mutex mm/hugetlb: fix cow where page writtable in child mm/hugetlb: fix F_SEAL_FUTURE_WRITE
2021-05-15Merge tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-blockLinus Torvalds1-9/+10
Pull io_uring fixes from Jens Axboe: "Just a few minor fixes/changes: - Fix issue with double free race for linked timeout completions - Fix reference issue with timeouts - Remove last few places that make SQPOLL special, since it's just an io thread now. - Bump maximum allowed registered buffers, as we don't allocate as much anymore" * tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-block: io_uring: increase max number of reg buffers io_uring: further remove sqpoll limits on opcodes io_uring: fix ltout double free on completion race io_uring: fix link timeout refs
2021-05-15Merge tag 'erofs-for-5.13-rc2-fixes' of ↵Linus Torvalds1-2/+19
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "This mainly fixes 1 lcluster-sized pclusters for the big pcluster feature, which can be forcely generated by mkfs as a specific on-disk case for per-(sub)file compression strategies but missed to handle in runtime properly. Also, documentation updates are included to fix the broken illustration due to the ReST conversion by accident and complete the big pcluster introduction. Summary: - update documentation to fix the broken illustration due to ReST conversion by accident at that time and complete the big pcluster introduction - fix 1 lcluster-sized pclusters for the big pcluster feature" * tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix 1 lcluster-sized pcluster for big pcluster erofs: update documentation about data compression erofs: fix broken illustration in documentation
2021-05-15Merge tag 'dax-fixes-5.13-rc2' of ↵Linus Torvalds1-12/+23
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fixes from Dan Williams: "A fix for a hang condition due to missed wakeups in the filesystem-dax core when exercised by virtiofs. This bug has been there from the beginning, but the condition has not triggered on other filesystems since they hold a lock over invalidation events" * tag 'dax-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Wake up all waiters after invalidating dax entry dax: Add a wakeup mode parameter to put_unlocked_entry() dax: Add an enum for specifying dax wakup mode
2021-05-15hfsplus: prevent corruption in shrinking truncateJouni Roivas1-3/+4
I believe there are some issues introduced by commit 31651c607151 ("hfsplus: avoid deadlock on file truncation") HFS+ has extent records which always contains 8 extents. In case the first extent record in catalog file gets full, new ones are allocated from extents overflow file. In case shrinking truncate happens to middle of an extent record which locates in extents overflow file, the logic in hfsplus_file_truncate() was changed so that call to hfs_brec_remove() is not guarded any more. Right action would be just freeing the extents that exceed the new size inside extent record by calling hfsplus_free_extents(), and then check if the whole extent record should be removed. However since the guard (blk_cnt > start) is now after the call to hfs_brec_remove(), this has unfortunate effect that the last matching extent record is removed unconditionally. To reproduce this issue, create a file which has at least 10 extents, and then perform shrinking truncate into middle of the last extent record, so that the number of remaining extents is not under or divisible by 8. This causes the last extent record (8 extents) to be removed totally instead of truncating into middle of it. Thus this causes corruption, and lost data. Fix for this is simply checking if the new truncated end is below the start of this extent record, making it safe to remove the full extent record. However call to hfs_brec_remove() can't be moved to it's previous place since we're dropping ->tree_lock and it can cause a race condition and the cached info being invalidated possibly corrupting the node data. Another issue is related to this one. When entering into the block (blk_cnt > start) we are not holding the ->tree_lock. We break out from the loop not holding the lock, but hfs_find_exit() does unlock it. Not sure if it's possible for someone else to take the lock under our feet, but it can cause hard to debug errors and premature unlocking. Even if there's no real risk of it, the locking should still always be kept in balance. Thus taking the lock now just before the check. Link: https://lkml.kernel.org/r/20210429165139.3082828-1-jouni.roivas@tuxera.com Fixes: 31651c607151f ("hfsplus: avoid deadlock on file truncation") Signed-off-by: Jouni Roivas <jouni.roivas@tuxera.com> Reviewed-by: Anton Altaparmakov <anton@tuxera.com> Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-15mm/filemap: fix readahead return typesMatthew Wilcox (Oracle)1-2/+2
A readahead request will not allocate more memory than can be represented by a size_t, even on systems that have HIGHMEM available. Change the length functions from returning an loff_t to a size_t. Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org Fixes: 32c0a6bcaa1f57 ("btrfs: add and use readahead_batch_length") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-15squashfs: fix divide error in calculate_skip()Phillip Lougher1-3/+3
Sysbot has reported a "divide error" which has been identified as being caused by a corrupted file_size value within the file inode. This value has been corrupted to a much larger value than expected. Calculate_skip() is passed i_size_read(inode) >> msblk->block_log. Due to the file_size value corruption this overflows the int argument/variable in that function, leading to the divide error. This patch changes the function to use u64. This will accommodate any unexpectedly large values due to corruption. The value returned from calculate_skip() is clamped to be never more than SQUASHFS_CACHED_BLKS - 1, or 7. So file_size corruption does not lead to an unexpectedly large return result here. Link: https://lkml.kernel.org/r/20210507152618.9447-1-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: <syzbot+e8f781243ce16ac2f962@syzkaller.appspotmail.com> Reported-by: <syzbot+7b98870d4fec9447b951@syzkaller.appspotmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-15mm/hugetlb: fix F_SEAL_FUTURE_WRITEPeter Xu1-0/+5
Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2. Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to hugetlbfs, which I can easily verify using the memfd_test program, which seems that the program is hardly run with hugetlbfs pages (as by default shmem). Meanwhile I found another probably even more severe issue on that hugetlb fork won't wr-protect child cow pages, so child can potentially write to parent private pages. Patch 2 addresses that. After this series applied, "memfd_test hugetlbfs" should start to pass. This patch (of 2): F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day. There is a test program for that and it fails constantly. $ ./memfd_test hugetlbfs memfd-hugetlb: CREATE memfd-hugetlb: BASIC memfd-hugetlb: SEAL-WRITE memfd-hugetlb: SEAL-FUTURE-WRITE mmap() didn't fail as expected Aborted (core dumped) I think it's probably because no one is really running the hugetlbfs test. Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in shmem_mmap(). Generalize a helper for that. Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Hugh Dickins <hughd@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-14f2fs: compress: clean up parameter of __f2fs_cluster_blocks()Chao Yu1-20/+13
Previously, in order to reuse __f2fs_cluster_blocks(), f2fs_is_compressed_cluster() assigned a compress_ctx type variable, which is used to pass few parameters (cc.inode, cc.cluster_size, cc.cluster_idx), it's wasteful to allocate such large space in stack. Let's clean up parameters of __f2fs_cluster_blocks() to avoid that. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: compress: remove unneeded f2fs_put_dnode()Chao Yu1-1/+0
If we don't initialize dn.inode_page for f2fs_get_block(), f2fs_get_block() will call f2fs_put_dnode() itself, so let's remove unneeded f2fs_put_dnode() in f2fs_vm_page_mkwrite(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: atgc: fix to set default age thresholdChao Yu1-0/+1
Default age threshold value is missed to set, fix it. Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Reported-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: Prevent swap file in LFS modeShin'ichiro Kawasaki1-0/+6
The kernel writes to swap files on f2fs directly without the assistance of the filesystem. This direct write by kernel can be non-sequential even when the f2fs is in LFS mode. Such non-sequential write conflicts with the LFS semantics. Especially when f2fs is set up on zoned block devices, the non-sequential write causes unaligned write command errors. To avoid the non-sequential writes to swap files, prevent swap file activation when the filesystem is in LFS mode. Fixes: 4969c06a0d83 ("f2fs: support swap file w/ DIO") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.10+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instancesChao Yu3-10/+23
As syzbot reported, there is an use-after-free issue during f2fs recovery: Use-after-free write at 0xffff88823bc16040 (in kfence-#10): kmem_cache_destroy+0x1f/0x120 mm/slab_common.c:486 f2fs_recover_fsync_data+0x75b0/0x8380 fs/f2fs/recovery.c:869 f2fs_fill_super+0x9393/0xa420 fs/f2fs/super.c:3945 mount_bdev+0x26c/0x3a0 fs/super.c:1367 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1497 do_new_mount fs/namespace.c:2905 [inline] path_mount+0x196f/0x2be0 fs/namespace.c:3235 do_mount fs/namespace.c:3248 [inline] __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433 do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae The root cause is multi f2fs filesystem instances can race on accessing global fsync_entry_slab pointer, result in use-after-free issue of slab cache, fixes to init/destroy this slab cache only once during module init/destroy procedure to avoid this issue. Reported-by: syzbot+9d90dad32dd9727ed084@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: restructure f2fs page.private layoutChao Yu11-109/+146
Restruct f2fs page private layout for below reasons: There are some cases that f2fs wants to set a flag in a page to indicate a specified status of page: a) page is in transaction list for atomic write b) page contains dummy data for aligned write c) page is migrating for GC d) page contains inline data for inline inode flush e) page belongs to merkle tree, and is verified for fsverity f) page is dirty and has filesystem/inode reference count for writeback g) page is temporary and has decompress io context reference for compression There are existed places in page structure we can use to store f2fs private status/data: - page.flags: PG_checked, PG_private - page.private However it was a mess when we using them, which may cause potential confliction: page.private PG_private PG_checked page._refcount (+1 at most) a) -1 set +1 b) -2 set c), d), e) set f) 0 set +1 g) pointer set The other problem is page.flags has no free slot, if we can avoid set zero to page.private and set PG_private flag, then we use non-zero value to indicate PG_private status, so that we may have chance to reclaim PG_private slot for other usage. [1] The other concern is f2fs has bad scalability in aspect of indicating more page status. So in this patch, let's restructure f2fs' page.private as below to solve above issues: Layout A: lowest bit should be 1 | bit0 = 1 | bit1 | bit2 | ... | bit MAX | private data .... | bit 0 PAGE_PRIVATE_NOT_POINTER bit 1 PAGE_PRIVATE_ATOMIC_WRITE bit 2 PAGE_PRIVATE_DUMMY_WRITE bit 3 PAGE_PRIVATE_ONGOING_MIGRATION bit 4 PAGE_PRIVATE_INLINE_INODE bit 5 PAGE_PRIVATE_REF_RESOURCE bit 6- f2fs private data Layout B: lowest bit should be 0 page.private is a wrapped pointer. After the change: page.private PG_private PG_checked page._refcount (+1 at most) a) 11 set +1 b) 101 set +1 c) 1001 set +1 d) 10001 set +1 e) set f) 100001 set +1 g) pointer set +1 [1] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org/T/#u Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: add cp_error check in f2fs_write_compressed_pagesChao Yu1-0/+6
This patch adds cp_error check in f2fs_write_compressed_pages() like we did in f2fs_write_single_data_page() Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14f2fs: compress: rename __cluster_may_compressChao Yu1-4/+4
This patch renames __cluster_may_compress() to cluster_has_invalid_data() for better readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-05-14Merge tag 'f2fs-5.13-rc1-fix' of ↵Linus Torvalds5-47/+56
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "This fixes some critical bugs such as memory leak in compression flows, kernel panic when handling errors, and swapon failure due to newly added condition check" * tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: return EINVAL for hole cases in swap file f2fs: avoid swapon failure by giving a warning first f2fs: compress: fix to assign cc.cluster_idx correctly f2fs: compress: fix race condition of overwrite vs truncate f2fs: compress: fix to free compress page correctly f2fs: support iflag change given the mask f2fs: avoid null pointer access when handling IPU error
2021-05-14io_uring: increase max number of reg buffersPavel Begunkov1-1/+3
Since recent changes instead of storing a large array of struct io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and we should not to so worry about restricting max number of registererd buffer slots, increase the limit 4 times. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d3dee1da37f46da416aa96a16bf9e5094e10584d.1620990371.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14io_uring: further remove sqpoll limits on opcodesPavel Begunkov1-4/+2
There are three types of requests that left disabled for sqpoll, namely epoll ctx, statx, and resources update. Since SQPOLL task is now closely mimics a userspace thread, remove the restrictions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/909b52d70c45636d8d7897582474ea5aab5eed34.1620990306.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14io_uring: fix ltout double free on completion racePavel Begunkov1-3/+4
Always remove linked timeout on io_link_timeout_fn() from the master request link list, otherwise we may get use-after-free when first io_link_timeout_fn() puts linked timeout in the fail path, and then will be found and put on master's free. Cc: stable@vger.kernel.org # 5.10+ Fixes: 90cd7e424969d ("io_uring: track link timeout's master explicitly") Reported-and-tested-by: syzbot+5a864149dd970b546223@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.1620990121.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-14debugfs: only accept read attributes for blobsWolfram Sang1-2/+3
Blobs can only be read. So, keep only 'read' file attributes because the others will not work and only confuse users. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20210504131350.46586-1-wsa+renesas@sang-engineering.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-14cifsd: fix xfstests generic/504 test failureNamjae Jeon1-14/+11
If lock length in smb2 lock request from client is over flock max length size, lock length is changed to flock max length and don't return error response. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>