path: root/fs
Age | Commit message | Author | Files | Lines
2021-06-07 | erofs: clean up file headers & footers | Gao Xiang | 18 | -34/+0
- Remove my outdated, misleading email address;
- Get rid of all the unnecessary trailing newlines added by accident.
Link: https://lore.kernel.org/r/20210602160634.10757-1-xiang@kernel.org Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-06-07 | erofs: remove the occupied parameter from z_erofs_pagevec_enqueue() | Yue Hu | 2 | -7/+2
Nothing acts on the variable 'occupied' in z_erofs_attach_page(), which is the only caller of z_erofs_pagevec_enqueue(). Link: https://lore.kernel.org/r/20210419102623.2015-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Gao Xiang <xiang@kernel.org>
2021-06-07 | erofs: fix error return code in erofs_read_superblock() | Wei Yongjun | 1 | -0/+1
'ret' is overwritten with 0 if erofs_sb_has_sb_chksum() returns true, so 0 is returned in some error handling cases. Fix this by returning the negative error code -EINVAL instead of 0. Link: https://lore.kernel.org/r/20210519141657.3062715-1-weiyongjun1@huawei.com Fixes: b858a4844cfb ("erofs: support superblock checksum") Cc: stable <stable@vger.kernel.org> # 5.5+ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <xiang@kernel.org>
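The failure mode described above can be sketched in plain C. This is only an illustration under stated assumptions: `struct demo_sb`, `demo_verify_chksum()` and `demo_read_superblock()` are hypothetical stand-ins, not the erofs code, and the block-size check is an invented example of a later validation step.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for superblock state; not the erofs on-disk layout. */
struct demo_sb {
	bool has_chksum;	/* plays the role of erofs_sb_has_sb_chksum() */
	bool chksum_ok;
	unsigned int blkszbits;
};

/* Hypothetical checksum helper: 0 on success, negative errno on failure. */
static int demo_verify_chksum(const struct demo_sb *sb)
{
	return sb->chksum_ok ? 0 : -EBADMSG;
}

int demo_read_superblock(const struct demo_sb *sb)
{
	int ret = -EINVAL;

	if (sb->has_chksum) {
		ret = demo_verify_chksum(sb);	/* success overwrites ret with 0 */
		if (ret)
			return ret;
	}

	if (sb->blkszbits != 12) {
		/* Without this assignment the stale 0 from above would be returned. */
		ret = -EINVAL;
		goto out;
	}
	ret = 0;
out:
	return ret;
}
```

Removing the assignment inside the block-size check reproduces the bug class: a superblock with a valid checksum but an invalid field would be reported as success.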
2021-06-07 | quota: Change quotactl_path() syscall to an fd-based one | Jan Kara | 1 | -15/+13
Some users have pointed out that path-based syscalls are problematic in some environments and that at least a directory fd argument, and possibly also resolve flags, are desirable for such syscalls. Rather than reimplementing all details of pathname lookup and following wherever it may eventually evolve, let's go for a fully file-descriptor-based syscall, similar to how ioctl(2) has worked since the beginning. Managing quotas isn't performance sensitive, so the extra overhead of open does not matter, and we are able to consume O_PATH descriptors as well, which makes open cheap anyway. Also, for frequent operations (such as retrieving usage information for all users) we can reuse a single fd and in fact get even better performance, as well as avoid races with possible remounts etc. Tested-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2021-06-07 | cifsd: remove duplicated argument | Wan Jiabing | 1 | -4/+4
Fix the following coccicheck warning:
  ./fs/cifsd/smb2pdu.c:1713:27-41: duplicated argument to & or |
FILE_DELETE_LE is duplicated. Remove one occurrence and reorder the arguments to make the coding style reasonable. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
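A short sketch of why a duplicated operand to `|` is harmless at run time but still worth cleaning up: OR-ing a flag in twice yields the same mask. The `DEMO_*` flag values below are made up for illustration and are not the SMB2 access-mask definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define DEMO_READ	0x01u
#define DEMO_WRITE	0x02u
#define DEMO_DELETE	0x04u

/* The form a checker would flag: DEMO_DELETE appears twice. */
uint32_t demo_mask_with_duplicate(void)
{
	return DEMO_DELETE | DEMO_READ | DEMO_DELETE | DEMO_WRITE;
}

/* The cleaned-up form: each flag once, in a consistent order. */
uint32_t demo_mask_deduplicated(void)
{
	return DEMO_READ | DEMO_WRITE | DEMO_DELETE;
}
```

Both functions return the same value, so a change of this kind is purely a readability fix; the warning mostly matters because a duplicate can hide a flag that was meant to be something else.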
2021-06-07 | cifsd: fix possible compile error for asn1.c | Hyunchul Lee | 1 | -3/+6
spnego_negtokeninit.asn1.h and spnego_negtokentarg.asn1.h have to be generated before asn1.o is compiled. In a parallel build this ordering is not guaranteed, so the dependency must be specified in the Makefile. Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-07 | xfs: merge xfs_buf_allocate_memory | Dave Chinner | 1 | -31/+13
It only has one caller and is now a simple function, so merge it into the caller. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-07 | xfs: cleanup error handling in xfs_buf_get_map | Christoph Hellwig | 1 | -8/+7
Use a single goto label for freeing the buffer and returning an error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com>
2021-06-07 | xfs: get rid of xb_to_gfp() | Dave Chinner | 1 | -4/+6
Only used in one place, so just open code the logic in the macro. Based on a patch from Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-07 | xfs: simplify the b_page_count calculation | Christoph Hellwig | 1 | -11/+3
Ever since we stopped using the Linux page cache to back XFS buffers there is no need to take the start sector into account for calculating the number of pages in a buffer, as the data always start from the beginning of the buffer. Signed-off-by: Christoph Hellwig <hch@lst.de> [dgc: modified to suit this series] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-07 | xfs: remove ->b_offset handling for page backed buffers | Christoph Hellwig | 2 | -6/+5
->b_offset can only be non-zero for _XBF_KMEM backed buffers, so remove all code dealing with it for page backed buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> [dgc: modified to fit this patchset] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-07 | cifsd: set epoch in smb2_lease_break response | Namjae Jeon | 4 | -35/+102
When running generic/591 with smb2 leases enabled, all smb2 lease ack requests fail in ksmbd, because the cifs client seems to support only smb2 v2 leases. cifs therefore doesn't update the lease state in the ack request if the epoch is not set in the smb2 lease break request from ksmbd. The epoch is used for smb2 v2 leases, so this patch adds the smb2 create v2 lease context and sets an incremented epoch in the smb2 lease break response. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-06-07 | cifsd: fix list_add double add BUG_ON trap in setup_async_work() | Namjae Jeon | 1 | -3/+5
A BUG_ON trap is hit when running xfstests generic/591 with "smb2 leases = yes" in smb.conf.
  [ 597.224978] list_add double add: new=ffff9110d292bb20, prev=ffff9110d292bb20, next=ffff9110d6c389e8.
  [ 597.225073] ------------[ cut here ]------------
  [ 597.225077] kernel BUG at lib/list_debug.c:31!
  [ 597.225090] invalid opcode: 0000 [#1] SMP PTI
  [ 597.225095] CPU: 2 PID: 501 Comm: kworker/2:3 Tainted: G OE 5.13.0-rc1+ #2
  [ 597.225099] Hardware name: SAMSUNG ELECTRONICS CO., LTD. Samsung DeskTop System/SAMSUNG_DT1234567890, BIOS P04KBM.022.121023.SK 10/23/2012
  [ 597.225102] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
  [ 597.225125] RIP: 0010:__list_add_valid+0x66/0x70
  [ 597.225132] Code: 0b 48 89 c1 4c 89 c6 48 c7 c7 c8 08 c0 95 e8 fd 54 66 00 0f 0b 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 20 09 c0 95 e8 e6 54 66 00 <0f> 0b 0f 1f 84 00 00 00 00 00 55 48 8b 07 48 b9 00 01 00 00 00 00
  [ 597.225136] RSP: 0018:ffffb9c9408dbac0 EFLAGS: 00010282
  [ 597.225139] RAX: 0000000000000058 RBX: ffff9110d292ba40 RCX: 0000000000000000
  [ 597.225142] RDX: 0000000000000000 RSI: ffff9111da328c30 RDI: ffff9111da328c30
  [ 597.225144] RBP: ffffb9c9408dbac0 R08: 0000000000000001 R09: 0000000000000001
  [ 597.225147] R10: 0000000003dd35ed R11: ffffb9c9408db888 R12: ffff9110d6c38998
  [ 597.225149] R13: ffff9110d6c38800 R14: ffff9110d292bb20 R15: ffff9110d292bb20
  [ 597.225152] FS: 0000000000000000(0000) GS:ffff9111da300000(0000) knlGS:0000000000000000
  [ 597.225155] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 597.225157] CR2: 00007fd1629f84d0 CR3: 00000000c9a12006 CR4: 00000000001706e0
  [ 597.225160] Call Trace:
  [ 597.225163] setup_async_work+0xa2/0x120 [ksmbd]
  [ 597.225191] oplock_break+0x396/0x5d0 [ksmbd]
  [ 597.225206] smb_grant_oplock+0x7a1/0x900 [ksmbd]
  [ 597.225218] ? smb_grant_oplock+0x7a1/0x900 [ksmbd]
  [ 597.225231] smb2_open+0xbbb/0x2960 [ksmbd]
  [ 597.225243] ? smb2_open+0xbbb/0x2960 [ksmbd]
  [ 597.225257] ? find_held_lock+0x35/0xa0
  [ 597.225261] ? xa_load+0xaf/0x160
  [ 597.225268] handle_ksmbd_work+0x2e0/0x420 [ksmbd]
  [ 597.225280] ? handle_ksmbd_work+0x2e0/0x420 [ksmbd]
  [ 597.225292] process_one_work+0x25a/0x5d0
  [ 597.225298] worker_thread+0x3f/0x3a0
  [ 597.225302] ? __kthread_parkme+0x6f/0xa0
  [ 597.225306] ? process_one_work+0x5d0/0x5d0
  [ 597.225309] kthread+0x142/0x160
  [ 597.225313] ? kthread_park+0x90/0x90
  [ 597.225316] ret_from_fork+0x22/0x30
The same work struct can be added to the list in both smb_break_all_write_oplock() and smb_grant_oplock(). If the client sends an invalid lease break ack to the server, both functions can be called and this issue occurs. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Steve French <stfrench@microsoft.com>
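To make the "list_add double add" report more concrete, here is a minimal userspace sketch of a circular doubly linked list whose add routine refuses to insert an entry that is already adjacent to the insertion point. The `demo_*` names are hypothetical; the check is modeled loosely on the condition the log message reports (new equal to prev or next) and is not the kernel's list_debug code.

```c
#include <stdbool.h>

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

void demo_list_init(struct demo_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Insert entry right after head. Returns false, leaving the list
 * untouched, when entry is already linked at this position; linking it
 * again would make the entry point at itself and corrupt the list.
 */
bool demo_list_add(struct demo_list_head *entry, struct demo_list_head *head)
{
	struct demo_list_head *prev = head;
	struct demo_list_head *next = head->next;

	if (entry == prev || entry == next)
		return false;	/* "double add" */

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}
```

In this sketch a caller that might queue the same entry from two code paths gets a `false` return on the second attempt, whereas the kernel's debug check stops with a BUG instead.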
2021-06-07 | Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 | Linus Torvalds | 8 | -126/+135
Pull ext4 fixes from Ted Ts'o: "Miscellaneous ext4 bug fixes"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
  ext4: fix no-key deletion for encrypt+casefold
  ext4: fix memory leak in ext4_fill_super
  ext4: fix fast commit alignment issues
  ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
  ext4: fix accessing uninit percpu counter variable with fast_commit
  ext4: fix memory leak in ext4_mb_init_backend on error path.
2021-06-06 | ext4: Only advertise encrypted_casefold when encryption and unicode are enabled | Daniel Rosenberg | 1 | -0/+4
Encrypted casefolding is only supported when encryption and casefolding are both enabled in the config. Fixes: 471fbbea7ff7 ("ext4: handle casefolding with encryption") Cc: stable@vger.kernel.org # 5.13+ Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20210603094849.314342-1-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 | ext4: fix no-key deletion for encrypt+casefold | Daniel Rosenberg | 1 | -2/+4
commit 471fbbea7ff7 ("ext4: handle casefolding with encryption") is missing a few checks for the encryption key which are needed to support deleting encrypted casefolded files when the key is not present. This bug made it impossible to delete encrypted+casefolded directories without the encryption key, due to errors like:
  W : EXT4-fs warning (device vdc): __ext4fs_dirhash:270: inode #49202: comm Binder:378_4: Siphash requires key
Repro steps in kvm-xfstests test appliance:
  mkfs.ext4 -F -E encoding=utf8 -O encrypt /dev/vdc
  mount /vdc
  mkdir /vdc/dir
  chattr +F /vdc/dir
  keyid=$(head -c 64 /dev/zero | xfs_io -c add_enckey /vdc | awk '{print $NF}')
  xfs_io -c "set_encpolicy $keyid" /vdc/dir
  for i in `seq 1 100`; do
      mkdir /vdc/dir/$i
  done
  xfs_io -c "rm_enckey $keyid" /vdc
  rm -rf /vdc/dir # fails with the bug
Fixes: 471fbbea7ff7 ("ext4: handle casefolding with encryption") Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20210522004132.2142563-1-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-06 | ext4: fix memory leak in ext4_fill_super | Alexey Makhalov | 1 | -2/+9
Buffer head references must be released before calling kill_bdev(); otherwise the buffer head (and its page referenced by b_data) will not be freed by kill_bdev, and subsequently that bh will be leaked. If blocksizes differ, sb_set_blocksize() will kill current buffers and page cache by using kill_bdev(). The super block is then reread, this time using the correct blocksize. sb_set_blocksize() didn't fully free the superblock page and buffer head; being busy, they were not freed and instead leaked. This can easily be reproduced by running an infinite loop of: systemctl start <ext4_on_lvm>.mount, and systemctl stop <ext4_on_lvm>.mount ... since systemd creates a cgroup for each slice which it mounts, and the bh leak gets amplified by a dying memory cgroup that also never gets freed, and memory consumption is much more easily noticed. Fixes: ce40733ce93d ("ext4: Check for return value from sb_set_blocksize") Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3") Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2021-06-06 | ext4: fix fast commit alignment issues | Harshad Shirwadkar | 2 | -99/+90
Fast commit recovery data on disk may not be aligned. So, when the recovery code reads it, this patch makes sure that fast commit info found on-disk is first memcpy-ed into an aligned variable before accessing it. As a consequence, we also remove some macros that could have resulted in unaligned accesses. Cc: stable@kernel.org Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
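The memcpy-into-an-aligned-variable technique mentioned above can be sketched as follows. The `struct demo_tag` layout and `demo_read_tag()` are hypothetical and not the ext4 fast commit format; the sketch also ignores the on-disk endianness conversion that real filesystem code would need.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; not the ext4 fast commit tag layout. */
struct demo_tag {
	uint16_t type;
	uint16_t len;
};

/*
 * Read a header from an arbitrary, possibly misaligned byte offset.
 * Casting p to (struct demo_tag *) and dereferencing it could perform
 * an unaligned load; copying into a properly aligned local avoids that.
 */
struct demo_tag demo_read_tag(const uint8_t *p)
{
	struct demo_tag tag;

	memcpy(&tag, p, sizeof(tag));
	return tag;
}
```

Compilers typically turn a small fixed-size `memcpy` like this into plain loads on architectures that tolerate unaligned access, so the safe form usually costs little.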
2021-06-06 | ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed | Ye Bin | 1 | -20/+23
We hit the following BUG_ON when running fsstress with IO fault injection:
  [130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
  [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
  ......
  [130747.334329] Call trace:
  [130747.334553] ext4_es_cache_extent+0x150/0x168 [ext4]
  [130747.334975] ext4_cache_extents+0x64/0xe8 [ext4]
  [130747.335368] ext4_find_extent+0x300/0x330 [ext4]
  [130747.335759] ext4_ext_map_blocks+0x74/0x1178 [ext4]
  [130747.336179] ext4_map_blocks+0x2f4/0x5f0 [ext4]
  [130747.336567] ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
  [130747.336995] ext4_readpage+0x54/0x100 [ext4]
  [130747.337359] generic_file_buffered_read+0x410/0xae8
  [130747.337767] generic_file_read_iter+0x114/0x190
  [130747.338152] ext4_file_read_iter+0x5c/0x140 [ext4]
  [130747.338556] __vfs_read+0x11c/0x188
  [130747.338851] vfs_read+0x94/0x150
  [130747.339110] ksys_read+0x74/0xf0
This patch follows Jan Kara's suggestion in: https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/
"I see. Now I understand your patch. Honestly, seeing how fragile is trying to fix extent tree after split has failed in the middle, I would probably go even further and make sure we fix the tree properly in case of ENOSPC and EDQUOT (those are easily user triggerable). Anything else indicates a HW problem or fs corruption so I'd rather leave the extent tree as is and don't try to fix it (which also means we will not create overlapping extents)."
Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-06-05 | ocfs2: fix data corruption by fallocate | Junxiao Bi | 1 | -5/+50
When fallocate punches holes beyond the inode size and the original isize is in the middle of the last cluster, the part from isize to the end of the cluster is zeroed with a buffered write. At that point isize has not yet been updated to match the new size, so if writeback kicks in, it invokes ocfs2_writepage()->block_write_full_page(), where the pages beyond the inode size are dropped. That causes file corruption. Fix this by zeroing out the EOF blocks when extending the inode size.
Running the following command with qemu-img 4.2.1 easily produces a corrupted converted image file:
  qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
      -O qcow2 -o compat=1.1 $qcow_image.conv
The usage of fallocate in qemu is like this: it first punches holes beyond the inode size, then extends the inode size.
  fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
  fallocate(11, 0, 2276196352, 65536) = 0
v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/
Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-05 | fscrypt: fix derivation of SipHash keys on big endian CPUs | Eric Biggers | 1 | -8/+32
Typically, the cryptographic APIs that fscrypt uses take keys as byte arrays, which avoids endianness issues. However, siphash_key_t is an exception. It is defined as 'u64 key[2];', i.e. the 128-bit key is expected to be given directly as two 64-bit words in CPU endianness. fscrypt_derive_dirhash_key() and fscrypt_setup_iv_ino_lblk_32_key() forgot to take this into account. Therefore, the SipHash keys used to index encrypted+casefolded directories differ on big endian vs. little endian platforms, as do the SipHash keys used to hash inode numbers for IV_INO_LBLK_32-encrypted directories. This makes such directories non-portable between these platforms. Fix this by always using the little endian order. This is a breaking change for big endian platforms, but this should be fine in practice since these features (encrypt+casefold support, and the IV_INO_LBLK_32 flag) aren't known to actually be used on any big endian platforms yet. Fixes: aa408f835d02 ("fscrypt: derive dirhash key for casefolded directories") Fixes: e3b1078bedd3 ("fscrypt: add support for IV_INO_LBLK_32 policies") Cc: <stable@vger.kernel.org> # v5.6+ Link: https://lore.kernel.org/r/20210605075033.54424-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
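The endianness issue can be illustrated with a small sketch that builds a two-word key from 16 raw bytes in a fixed little-endian order, so the resulting words are the same on every CPU. `demo_siphash_key_t`, `demo_load_le64()` and `demo_key_from_bytes()` are hypothetical names; only the 'u64 key[2]' shape is taken from the message above, and this is not the fscrypt implementation.

```c
#include <stdint.h>

/* Mirrors the 'u64 key[2]' shape described above; hypothetical type name. */
typedef struct {
	uint64_t key[2];
} demo_siphash_key_t;

/* Assemble a 64-bit word from 8 bytes, least significant byte first. */
static uint64_t demo_load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Interpret the derived key bytes as two little-endian words. A plain
 * memcpy() into key[] would instead give different word values on big
 * endian and little endian CPUs for the same input bytes.
 */
demo_siphash_key_t demo_key_from_bytes(const uint8_t raw[16])
{
	demo_siphash_key_t k;

	k.key[0] = demo_load_le64(raw);
	k.key[1] = demo_load_le64(raw + 8);
	return k;
}
```

Because the byte order is fixed in the conversion step, a directory hashed on one platform would index the same way on the other, which is the portability property the commit is after.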
2021-06-05 | fscrypt: don't ignore minor_hash when hash is 0 | Eric Biggers | 1 | -7/+3
When initializing a no-key name, fscrypt_fname_disk_to_usr() sets the minor_hash to 0 if the (major) hash is 0. This doesn't make sense because 0 is a valid hash code, so we shouldn't ignore the filesystem-provided minor_hash in that case. Fix this by removing the special case for 'hash == 0'. This is an old bug that appears to have originated when the encryption code in ext4 and f2fs was moved into fs/crypto/. The original ext4 and f2fs code passed the hash by pointer instead of by value. So 'if (hash)' actually made sense then, as it was checking whether a pointer was NULL. But now the hashes are passed by value, and filesystems just pass 0 for any hashes they don't have. There is no need to handle this any differently from the hashes actually being 0. It is difficult to reproduce this bug, as it only made a difference in the case where a filename's 32-bit major hash happened to be 0. However, it probably had the largest chance of causing problems on ubifs, since ubifs uses minor_hash to do lookups of no-key names, in addition to using it as a readdir cookie. ext4 only uses minor_hash as a readdir cookie, and f2fs doesn't use minor_hash at all. Fixes: 0b81d0779072 ("fs crypto: move per-file encryption from f2fs tree to fs/crypto") Cc: <stable@vger.kernel.org> # v4.6+ Link: https://lore.kernel.org/r/20210527235236.2376556-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2021-06-04 | remove the raw driver | Christoph Hellwig | 1 | -4/+2
The raw driver used to provide direct unbuffered access to block devices before O_DIRECT was invented. It has been obsolete for more than a decade. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/lkml/Pine.LNX.4.64.0703180754060.6605@CPE00045a9c397f-CM001225dbafb6/ Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210531072526.97052-1-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-04 | debugfs: Fix debugfs_read_file_str() | Dietmar Eggemann | 1 | -1/+1
Read the entire size of the buffer, including the trailing new line character. Discovered while reading the sched domain names of CPU0:
before:
  cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
  SMTMCDIE
after:
  cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
  SMT
  MC
  DIE
Fixes: 9af0440ec86eb ("debugfs: Implement debugfs_create_str()") Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20210527091105.258457-1-dietmar.eggemann@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
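A userspace sketch of the off-by-one described above, under the assumption (inferred from the commit message, not from the diff) that the read path builds "string plus newline" in a scratch buffer and then copies out a byte count. `demo_read_str()` and its `include_newline` switch are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build str followed by '\n' in a scratch buffer, then copy n bytes to
 * out. With include_newline == 0 the copy length stops at strlen(str),
 * which drops the newline and makes consecutive values run together.
 * Returns the number of bytes copied, or 0 if str is too long.
 */
size_t demo_read_str(char *out, size_t out_size, const char *str,
		     int include_newline)
{
	char copy[64];
	size_t len = strlen(str) + 1;	/* string plus trailing '\n' */
	size_t n;

	if (len >= sizeof(copy))
		return 0;
	memcpy(copy, str, len - 1);
	copy[len - 1] = '\n';

	n = include_newline ? len : len - 1;
	if (n > out_size)
		n = out_size;
	memcpy(out, copy, n);
	return n;
}
```

Reading three such values back to back without the newline gives one fused string, matching the "before" output quoted in the commit message.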
2021-06-04 | btrfs: promote debugging asserts to full-fledged checks in validate_super | Nikolay Borisov | 1 | -8/+18
Syzbot managed to trigger this assert while performing its fuzzing. It turns out it's better to have those asserts turned into full-fledged checks so that, when buggy btrfs images are mounted, the user gets an error and mounting is stopped. Otherwise, with CONFIG_BTRFS_ASSERT disabled, such an image would have been erroneously allowed to be mounted. Reported-by: syzbot+a6bf271c02e4fe66b4e4@syzkaller.appspotmail.com CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add uuids to the messages ] Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-04 | btrfs: return value from btrfs_mark_extent_written() in case of error | Ritesh Harjani | 1 | -2/+2
We always return 0 even in case of an error in btrfs_mark_extent_written(). Fix it to return proper error value in case of a failure. All callers handle it. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-04 | btrfs: zoned: fix zone number to sector/physical calculation | Naohiro Aota | 1 | -5/+18
In btrfs_get_dev_zone_info(), we have "u32 sb_zone" and calculate "sector_t sector" by shifting it. But, this "sector" is calculated in 32bit, leading it to be 0 for the 2nd superblock copy. Since zone number is u32, shifting it to sector (sector_t) or physical address (u64) can easily trigger a missing cast bug like this. This commit introduces helpers to convert zone number to sector/LBA, so we won't fall into the same pitfall again. Reported-by: Dmitry Fomichev <Dmitry.Fomichev@wdc.com> Fixes: 12659251ca5d ("btrfs: implement log-structured superblock for ZONED mode") CC: stable@vger.kernel.org # 5.11+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
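The missing-cast pitfall can be reproduced in a few lines of standalone C. The helper names and the example zone number and shift are assumptions chosen so that the 32-bit result wraps to exactly 0; they are not the btrfs helpers or real device geometry.

```c
#include <stdint.h>

/*
 * Buggy form: the shift is evaluated in 32 bits because zone is a
 * 32-bit value, and only the already truncated result is widened.
 */
uint64_t demo_zone_to_sector_narrow(uint32_t zone, unsigned int shift)
{
	return zone << shift;
}

/* Fixed form: widen first, then shift, so no high bits are lost. */
uint64_t demo_zone_to_sector_wide(uint32_t zone, unsigned int shift)
{
	return (uint64_t)zone << shift;
}
```

With a hypothetical zone number of 8192 and a shift of 19, the narrow helper yields 0 while the wide one yields 4294967296. Wrapping the cast in a helper, as the commit describes, keeps each call site from having to remember it.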
2021-06-04 | btrfs: do not write supers if we have an fs error | Josef Bacik | 1 | -0/+16
Error injection testing uncovered a pretty severe problem where we could end up committing a super that pointed to the wrong tree roots, resulting in transid mismatch errors. The way we commit the transaction is we update the super copy with the current generations and bytenrs of the important roots, and then copy that into our super_for_commit. Then we allow transactions to continue again, we write out the dirty pages for the transaction, and then we write the super. If the write out fails we'll bail and skip writing the supers. However since we've allowed a new transaction to start, we can have a log attempting to sync at this point, which would be blocked on fs_info->tree_log_mutex. Once the commit fails we're allowed to do the log tree commit, which uses super_for_commit, which now points at fs trees that were not written out. Fix this by checking BTRFS_FS_STATE_ERROR once we acquire the tree_log_mutex. This way if the transaction commit fails we're sure to see this bit set and we can skip writing the super out. This patch fixes this specific transid mismatch error I was seeing with this particular error path. CC: stable@vger.kernel.org # 5.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-04 | xfs: refactor per-AG inode tagging functions | Darrick J. Wong | 4 | -88/+80
In preparation for adding another incore inode tree tag, refactor the code that sets and clears tags from the per-AG inode tree and the tree of per-AG structures, and remove the open-coded versions used by the blockgc code. Note: For reclaim, we now rely on the radix tree tags instead of the reclaimable inode count more heavily than we used to. The conversion should be fine, but the logic isn't 100% identical. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag | Darrick J. Wong | 3 | -115/+53
Merge these two inode walk loops together, since they're pretty similar now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: pass struct xfs_eofblocks to the inode scan callback | Darrick J. Wong | 1 | -19/+15
Pass a pointer to the actual eofb structure around the inode scanner functions instead of a void pointer, now that none of the functions is used as a callback. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: fix radix tree tag signs | Darrick J. Wong | 2 | -3/+3
Radix tree tags are supposed to be unsigned ints, so fix the callers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: make the icwalk processing functions clean up the grab state | Darrick J. Wong | 1 | -9/+11
Soon we're going to be adding two new callers to the incore inode walk code: reclaim of incore inodes, and (later) inactivation of inodes. Both states operate on inodes that no longer have any VFS state, so we need to move the xfs_irele calls into the processing functions. In other words, icwalk processing functions are responsible for cleaning up whatever state changes are made by the corresponding icwalk igrab function that picked the inode for processing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: clean up inode state flag tests in xfs_blockgc_igrab | Darrick J. Wong | 1 | -2/+5
Clean up the definition of which inode states are not eligible for speculative preallocation garbage collecting by creating a private #define. The deferred inactivation patchset will add two new entries to the set of flags-to-ignore, so we want the definition not to end up a cluttered mess. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: remove indirect calls from xfs_inode_walk{,_ag} | Darrick J. Wong | 1 | -24/+36
It turns out that there is a 1:1 mapping between the execute and goal parameters that are passed to xfs_inode_walk_ag:
  xfs_blockgc_scan_inode <=> XFS_ICWALK_BLOCKGC
  xfs_dqrele_inode <=> XFS_ICWALK_DQRELE
Because of this exact correspondence, we don't need the execute function pointer and can replace it with a direct call. For the price of a forward static declaration, we can eliminate the indirect function call. This likely has a negligible impact on performance (since the execute function runs transactions), but it also simplifies the function signature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
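The refactoring pattern described above, replacing an execute function pointer with a goal enum and direct calls, can be sketched generically. All `demo_*` names and the inode fields are hypothetical; this shows the shape of the change, not the XFS code.

```c
/* Hypothetical goal enum standing in for the XFS_ICWALK_* values. */
enum demo_goal {
	DEMO_GOAL_DQRELE,
	DEMO_GOAL_BLOCKGC,
};

/* Hypothetical per-inode state touched by the two processing functions. */
struct demo_inode {
	int dquots_attached;
	int prealloc_blocks;
};

/* Forward static declarations: the price paid for calling directly. */
static int demo_dqrele_inode(struct demo_inode *ip);
static int demo_blockgc_scan_inode(struct demo_inode *ip);

/* The walker dispatches on the goal instead of taking a callback. */
int demo_process_inode(enum demo_goal goal, struct demo_inode *ip)
{
	switch (goal) {
	case DEMO_GOAL_DQRELE:
		return demo_dqrele_inode(ip);
	case DEMO_GOAL_BLOCKGC:
		return demo_blockgc_scan_inode(ip);
	}
	return -1;	/* unknown goal */
}

static int demo_dqrele_inode(struct demo_inode *ip)
{
	ip->dquots_attached = 0;
	return 0;
}

static int demo_blockgc_scan_inode(struct demo_inode *ip)
{
	ip->prealloc_blocks = 0;
	return 0;
}
```

Since each goal maps to exactly one function, callers pass a single enum value rather than a matching pair of callback and tag, which removes the chance of passing an inconsistent pair.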
2021-06-04 | xfs: remove iter_flags parameter from xfs_inode_walk_* | Darrick J. Wong | 2 | -26/+12
The sole iter_flags is XFS_INODE_WALK_INEW_WAIT, and there are no users. Remove the flag, and the parameter, and all the code that used it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: move xfs_inew_wait call into xfs_dqrele_inode | Darrick J. Wong | 1 | -2/+4
Move the INEW wait into xfs_dqrele_inode so that we can drop the iter_flags parameter in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: separate the dqrele_all inode grab logic from xfs_inode_walk_ag_grab | Darrick J. Wong | 1 | -5/+66
Disentangle the dqrele_all inode grab code from the "generic" inode walk grabbing code, and use the opportunity to document why the dqrele grab function does what it does. Since xfs_inode_walk_ag_grab is now only used for blockgc, rename it to reflect that. Ultimately, there will be four reasons to perform a walk of incore inodes: quotaoff dquot releasing (dqrele), garbage collection of speculative preallocations (blockgc), reclamation of incore inodes (reclaim), and deferred inactivation (inodegc). Each of these four has its own slightly different criteria for deciding if it wants to handle an inode, so it makes more sense to have four cohesive igrab functions than one confusing parametric grab function like we do now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: pass the goal of the incore inode walk to xfs_inode_walk() | Darrick J. Wong | 2 | -21/+43
As part of removing the indirect calls and radix tag implementation details from the incore inode walk loop, create an enum to represent the goal of the inode iteration. More immediately, this separation removes the need for the "ICI_NOTAG" define which makes little sense. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: rename xfs_inode_walk functions to xfs_icwalk | Darrick J. Wong | 1 | -11/+11
Shorten the prefix so that all the incore inode cache walk code has "xfs_icwalk" in the name somewhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: move the inode walk functions further down | Darrick J. Wong | 1 | -195/+206
Move the inode walk functions further down in the file to limit the forward declarations to the two walk functions as we add new code that uses the inode walks. We'll clean them out later (i.e. after the deferred inode inactivation series). Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: detach inode dquots at the end of inactivation | Darrick J. Wong | 2 | -12/+12
Once we're done with inactivating an inode, we're finished updating metadata for that inode. This means that we can detach the dquots at the end and not have to wait for reclaim to do it for us. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-04 | xfs: move the quotaoff dqrele inode walk into xfs_icache.c | Darrick J. Wong | 4 | -57/+71
The only external caller of xfs_inode_walk* happens in quotaoff, when we want to walk all the incore inodes to detach the dquots. Move this code to xfs_icache.c so that we can hide xfs_inode_walk as the starting step in more cleanups of inode walks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-06-03 | Merge tag 'io_uring-5.13-2021-06-03' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -0/+1
Pull io_uring fix from Jens Axboe: "Just a single one-liner fix for an accounting regression in this release" * tag 'io_uring-5.13-2021-06-03' of git://git.kernel.dk/linux-block: io_uring: fix misaccounting fix buf pinned pages
2021-06-03 | Merge tag 'for-5.13-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux | Linus Torvalds | 6 | -58/+147
Pull btrfs fixes from David Sterba:
"Error handling improvements, caught by error injection:
  - handle errors during checksum deletion
  - set error on mapping when ordered extent io cannot be finished
  - inode link count fixup in tree-log
  - missing return value checks for inode updates in tree-log
  - abort transaction in rename exchange if adding second reference fails
Fixes:
  - fix fsync failure after writes to prealloc extents
  - fix deadlock when cloning inline extents and low on available space
  - fix compressed writes that cross stripe boundary"
* tag 'for-5.13-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  MAINTAINERS: add btrfs IRC link
  btrfs: fix deadlock when cloning inline extents and low on available space
  btrfs: fix fsync failure and transaction abort after writes to prealloc extents
  btrfs: abort in rename_exchange if we fail to insert the second ref
  btrfs: check error value from btrfs_update_inode in tree log
  btrfs: fixup error handling in fixup_inode_link_counts
  btrfs: mark ordered extent and inode with error if we fail to finish
  btrfs: return errors from btrfs_del_csums in cleanup_ref_head
  btrfs: fix error handling in btrfs_del_csums
  btrfs: fix compressed writes that cross stripe boundary
2021-06-03 | Merge branch 'sched/urgent' into sched/core, to pick up fixes | Ingo Molnar | 63 | -385/+721
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-03 | fuse_fill_write_pages(): don't bother with iov_iter_single_seg_count() | Al Viro | 1 | -1/+0
another rudiment of fault-in originally having been limited to the first segment, same as in generic_perform_write() and friends. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-03 | NFSv4: Fix second deadlock in nfs4_evict_inode() | Trond Myklebust | 1 | -2/+7
If the inode is being evicted but has to return a layout first, then that too can cause a deadlock in the corner case where the server reboots. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-03 | NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode() | Trond Myklebust | 2 | -1/+12
If the inode is being evicted, but has to return a delegation first, then it can cause a deadlock in the corner case where the server reboots before the delegreturn completes, but while the call to iget5_locked() in nfs4_opendata_get_inode() is waiting for the inode free to complete. Since the open call still holds a session slot, the reboot recovery cannot proceed. In order to break the logjam, we can turn the delegation return into a privileged operation for the case where we're evicting the inode. We know that in that case, there can be no other state recovery operation that conflicts. Reported-by: zhangxiaoxu (A) <zhangxiaoxu5@huawei.com> Fixes: 5fcdfacc01f3 ("NFSv4: Return delegations synchronously in evict_inode") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-03 | NFS: FMODE_READ and friends are C macros, not enum types | Chuck Lever | 1 | -4/+0
Address a sparse warning:
  CHECK fs/nfs/nfstrace.c
  fs/nfs/nfstrace.c: note: in included file (through /home/cel/src/linux/rpc-over-tls/include/trace/trace_events.h, /home/cel/src/linux/rpc-over-tls/include/trace/define_trace.h, ...):
  fs/nfs/./nfstrace.h:424:1: warning: incorrect type in initializer (different base types)
  fs/nfs/./nfstrace.h:424:1: expected unsigned long eval_value
  fs/nfs/./nfstrace.h:424:1: got restricted fmode_t [usertype]
  fs/nfs/./nfstrace.h:425:1: warning: incorrect type in initializer (different base types)
  fs/nfs/./nfstrace.h:425:1: expected unsigned long eval_value
  fs/nfs/./nfstrace.h:425:1: got restricted fmode_t [usertype]
  fs/nfs/./nfstrace.h:426:1: warning: incorrect type in initializer (different base types)
  fs/nfs/./nfstrace.h:426:1: expected unsigned long eval_value
  fs/nfs/./nfstrace.h:426:1: got restricted fmode_t [usertype]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>