path: root/fs/xfs
Age | Commit message | Author | Files | Lines
2024-05-27 | xfs: Add cond_resched to block unmap range and reflink remap path | Ritesh Harjani (IBM) | 2 | -0/+2
An async dio write to a sparse file can generate a lot of extents, and when we unlink this file (using rm), the kernel can be busy unmapping and freeing those extents as part of transaction processing. Similarly, the xfs reflink remapping path can iterate over a million extent entries in xfs_reflink_remap_blocks(). Since we can busy loop in these two functions, let's add cond_resched() to avoid softlockup messages like these:
  watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
  CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1
  Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
  NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
  LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
  Call Trace:
   xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
   xfs_alloc_compute_aligned+0x5c/0x144
   xfs_alloc_ag_vextent_size+0x238/0x8d4
   xfs_alloc_fix_freelist+0x540/0x694
   xfs_free_extent_fix_freelist+0x84/0xe0
   __xfs_free_extent+0x74/0x1ec
   xfs_extent_free_finish_item+0xcc/0x214
   xfs_defer_finish_one+0x194/0x388
   xfs_defer_finish_noroll+0x1b4/0x5c8
   xfs_defer_finish+0x2c/0xc4
   xfs_bunmapi_range+0xa4/0x100
   xfs_itruncate_extents_flags+0x1b8/0x2f4
   xfs_inactive_truncate+0xe0/0x124
   xfs_inactive+0x30c/0x3e0
   xfs_inodegc_worker+0x140/0x234
   process_scheduled_works+0x240/0x57c
   worker_thread+0x198/0x468
   kthread+0x138/0x140
   start_kernel_thread+0x14/0x18
  run fstests generic/175 at 2024-02-02 04:40:21
  [ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
  watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
  CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0
  NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
  LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
  Call Trace:
   0xc000000014107c00 (unreliable)
   __xfs_btree_updkeys+0x8c/0x2c0 [xfs]
   xfs_btree_update_keys+0x150/0x170 [xfs]
   xfs_btree_lshift+0x534/0x660 [xfs]
   xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
   xfs_btree_insrec+0x4e4/0x630 [xfs]
   xfs_btree_insert+0x104/0x2d0 [xfs]
   xfs_rmap_insert+0xc4/0x260 [xfs]
   xfs_rmap_map_shared+0x228/0x630 [xfs]
   xfs_rmap_finish_one+0x2d4/0x350 [xfs]
   xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
   xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
   __xfs_trans_commit+0x1f4/0x400 [xfs]
   xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
   xfs_reflink_remap_blocks+0x154/0x320 [xfs]
   xfs_file_remap_range+0x138/0x3a0 [xfs]
   do_clone_file_range+0x11c/0x2f0
   vfs_clone_file_range+0x60/0x1c0
   ioctl_file_clone+0x78/0x140
   sys_ioctl+0x934/0x1270
   system_call_exception+0x158/0x320
   system_call_vectored_common+0x15c/0x2ec
Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Tested-by: Disha Goel <disgoel@linux.ibm.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
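The fix itself is kernel code that cannot run in userspace. As a rough analog only, the sketch below shows the general pattern of yielding the CPU inside a long-running loop; it uses POSIX sched_yield() as a stand-in for cond_resched(), and the helper names, the yield interval and the per-extent work are all hypothetical. Unlike this sketch, cond_resched() only reschedules when the scheduler has flagged that it is needed, which is why the kernel can afford to call it on every loop iteration.

```c
#include <sched.h>
#include <stddef.h>

/*
 * Userspace analog (assumption): sched_yield() stands in for the kernel's
 * cond_resched(). This sketch yields unconditionally every YIELD_INTERVAL
 * items instead of checking whether a reschedule is pending.
 */
#define YIELD_INTERVAL 1024

/* Hypothetical per-extent work item (e.g. unmapping one extent). */
static unsigned long process_extent(size_t i)
{
	return (unsigned long)i & 1;
}

/*
 * Walk a possibly huge number of extents, periodically letting other
 * tasks run so the loop does not monopolize the CPU.
 * Returns the number of voluntary yields performed.
 */
static size_t unmap_all_extents(size_t nextents, unsigned long *work_done)
{
	size_t yields = 0;

	*work_done = 0;
	for (size_t i = 0; i < nextents; i++) {
		*work_done += process_extent(i);
		if ((i + 1) % YIELD_INTERVAL == 0) {
			sched_yield();	/* analog of cond_resched() */
			yields++;
		}
	}
	return yields;
}
```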
2024-05-27 | xfs: don't open-code u64_to_user_ptr | Darrick J. Wong | 2 | -7/+2
Don't open-code what the kernel already provides. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
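For context, a userspace sketch of what u64_to_user_ptr() does: ioctl argument structures commonly carry user pointers in fixed-width u64 fields so that 32-bit and 64-bit callers share one layout, and the value has to pass through uintptr_t on its way back to a pointer. The function and struct names below are hypothetical; the real kernel macro additionally type-checks that its argument is a u64 and yields a `void __user *`.

```c
#include <stdint.h>

/*
 * Hypothetical userspace equivalent of u64_to_user_ptr(): go through
 * uintptr_t so the conversion is well-defined (and warning-free) on both
 * 32-bit and 64-bit builds.
 */
static inline void *u64_to_ptr(uint64_t v)
{
	return (void *)(uintptr_t)v;
}

/* The reverse direction, as userspace would fill in the ioctl argument. */
static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* Hypothetical ioctl argument layout, for illustration only. */
struct example_ioctl_args {
	uint64_t buf;	/* user buffer address stored as a u64 */
	uint64_t len;	/* buffer length in bytes */
};
```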
2024-05-27 | xfs: allow symlinks with short remote targets | Darrick J. Wong | 1 | -4/+24
An internal user complained about log recovery failing on a symlink ("Bad dinode after recovery") with the following (excerpted) format:
  core.magic = 0x494e
  core.mode = 0120777
  core.version = 3
  core.format = 2 (extents)
  core.nlinkv2 = 1
  core.nextents = 1
  core.size = 297
  core.nblocks = 1
  core.naextents = 0
  core.forkoff = 0
  core.aformat = 2 (extents)
  u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
  0:[0,12,1,0]
This is a symbolic link with a 297-byte target stored in a disk block, which is to say this is a symlink with a remote target. The forkoff is 0, which means there are 512 - 176 = 336 bytes available in the inode to store the data fork. Eventually, testing of generic/388 failed with the same inode corruption message during inode recovery. In writing a debugging patch to call xfs_dinode_verify on dirty inode log items when we're committing transactions, I observed that xfs/298 can reproduce the problem quite quickly. xfs/298 creates a symbolic link, adds some extended attributes, then deletes them all. The test failure occurs when the final removexattr also deletes the attr fork, because that does not convert the remote symlink back into a shortform symlink. That is how we trip this test. The only reason xfs/298 triggers only with the debug patch added is that it deletes the symlink, so the final iflush shows the inode as free. I wrote a quick fstest to emulate the behavior of xfs/298, except that it leaves the symlinks on the filesystem after inducing the "corrupt" state. Kernels going back at least as far as 4.18 have written out symlink inodes in this manner, and prior to 1eb70f54c445f they did not object to reading them back in. Because we've been writing out inodes this way for quite some time, the only way to fix this is to relax the check for symbolic links. Directories don't have this problem because di_size is bumped to blocksize during the sf->data conversion.
Fixes: 1eb70f54c445f ("xfs: validate inode fork size against fork format") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
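The size arithmetic in this report can be checked with a small sketch. It assumes, as the message states, a 512-byte inode with a 176-byte v3 core and forkoff 0; the two check functions are hypothetical simplifications of the verifier's rule (a directory or symlink small enough to fit in the inode is expected to be in local format), not the code from 1eb70f54c445f or from this fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Numbers taken from the report: 512-byte inode, 176-byte v3 core. */
#define INODE_SIZE	512u
#define V3_CORE_SIZE	176u

enum inode_kind { KIND_REGULAR, KIND_DIR, KIND_SYMLINK };

/* di_forkoff is in 8-byte units; 0 means no inline attr fork. */
static uint32_t data_fork_space(uint32_t inode_size, uint32_t core_size,
				uint8_t forkoff)
{
	if (forkoff)
		return (uint32_t)forkoff << 3;
	return inode_size - core_size;
}

/*
 * Hypothetical strict rule for an extents-format data fork: a directory
 * or symlink whose size would fit in the inode should be shortform.
 */
static bool extents_fork_size_ok_strict(enum inode_kind kind,
					uint64_t di_size, uint32_t fork_space)
{
	if (kind == KIND_REGULAR)
		return true;
	return di_size > fork_space;
}

/* Relaxed rule: symlinks are exempt, directories still checked. */
static bool extents_fork_size_ok(enum inode_kind kind, uint64_t di_size,
				 uint32_t fork_space)
{
	if (kind != KIND_DIR)
		return true;
	return di_size > fork_space;
}
```

With these numbers the 297-byte remote target is smaller than the 336 bytes of fork space, so the strict rule rejects the inode while the relaxed rule accepts it.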
2024-05-27 | xfs: fix xfs_init_attr_trans not handling explicit operation codes | Darrick J. Wong | 3 | -25/+35
When we were converting the attr code to use an explicit operation code instead of keying off of attr->value being null, we forgot to change the code that initializes the transaction reservation. Split the function into two helpers that handle the !remove and remove cases, then fix both callsites to handle this correctly. Fixes: c27411d4c640 ("xfs: make attr removal an explicit operation") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-27 | xfs: drop xfarray sortinfo folio on error | Darrick J. Wong | 1 | -3/+6
Chandan Babu reports the following livelock in xfs/708:
  run fstests xfs/708 at 2024-05-04 15:35:29
  XFS (loop16): EXPERIMENTAL online scrub feature in use. Use at your own risk!
  XFS (loop5): Mounting V5 Filesystem e96086f0-a2f9-4424-a1d5-c75d53d823be
  XFS (loop5): Ending clean mount
  XFS (loop5): Quotacheck needed: Please wait.
  XFS (loop5): Quotacheck: Done.
  XFS (loop5): EXPERIMENTAL online scrub feature in use. Use at your own risk!
  INFO: task xfs_io:143725 blocked for more than 122 seconds.
        Not tainted 6.9.0-rc4+ #1
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:xfs_io state:D stack:0 pid:143725 tgid:143725 ppid:117661 flags:0x00004006
  Call Trace:
   <TASK>
   __schedule+0x69c/0x17a0
   schedule+0x74/0x1b0
   io_schedule+0xc4/0x140
   folio_wait_bit_common+0x254/0x650
   shmem_undo_range+0x9d5/0xb40
   shmem_evict_inode+0x322/0x8f0
   evict+0x24e/0x560
   __dentry_kill+0x17d/0x4d0
   dput+0x263/0x430
   __fput+0x2fc/0xaa0
   task_work_run+0x132/0x210
   get_signal+0x1a8/0x1910
   arch_do_signal_or_restart+0x7b/0x2f0
   syscall_exit_to_user_mode+0x1c2/0x200
   do_syscall_64+0x72/0x170
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
The shmem code is trying to drop all the folios attached to a shmem file and gets stuck on a locked folio after a bnobt repair. It looks like the process has a signal pending, so I started looking for places where we lock an xfile folio and then deal with a fatal signal. I found a bug in xfarray_sort_scan via code inspection. This function is called to set up the scanning phase of a quicksort operation, which may involve grabbing a locked xfile folio. If we exit the function with an error code, the caller does not call xfarray_sort_scan_done to put the xfile folio. If _sort_scan returns an error code while si->folio is set, we leak the reference and never unlock the folio. Therefore, change xfarray_sort to call _scan_done on exit. This is safe to call multiple times because it sets si->folio to NULL and ignores a NULL si->folio.
Also change _sort_scan to use an intermediate variable so that we never pollute si->folio with an errptr. Fixes: 232ea052775f9 ("xfs: enable sorting of xfile-backed arrays") Reported-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
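The cleanup pattern described here (a release helper that is safe to call repeatedly, called unconditionally by the caller on exit, plus a local variable so the saved pointer is only set after a successful lookup) can be sketched in plain C. All struct and function names are hypothetical stand-ins; a real xfile folio would be unlocked and put with the kernel's folio APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a locked, referenced xfile folio. */
struct fake_folio {
	bool locked;
	int refcount;
};

struct sortinfo {
	struct fake_folio *folio;	/* NULL when nothing is held */
};

/* Idempotent: releases a held folio once, then ignores the NULL. */
static void sort_scan_done(struct sortinfo *si)
{
	if (si->folio) {
		si->folio->locked = false;
		si->folio->refcount--;
	}
	si->folio = NULL;
}

/*
 * Grab and lock the folio, then possibly fail (e.g. fatal signal).
 * The lookup result goes into a local variable first, so si->folio is
 * only ever set to a valid folio.
 */
static int sort_scan(struct sortinfo *si, struct fake_folio *f, bool fail)
{
	struct fake_folio *folio = f;	/* stands in for a lookup that may fail */

	if (!folio)
		return -ENOMEM;
	folio->locked = true;
	folio->refcount++;
	si->folio = folio;

	if (fail)
		return -EINTR;		/* si->folio is still held here */
	return 0;
}

/* Caller pattern after the fix: always call _scan_done on exit. */
static int sort(struct sortinfo *si, struct fake_folio *f, bool fail)
{
	int error = sort_scan(si, f, fail);

	/* ... the quicksort work would run here when error == 0 ... */
	sort_scan_done(si);
	return error;
}
```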
2024-05-27 | xfs: Stop using __maybe_unused in xfs_alloc.c | John Garry | 1 | -4/+2
In both xfs_alloc_cur_finish() and xfs_alloc_ag_vextent_exact(), local variable @agf is tagged as __maybe_unused. Otherwise an unused variable warning would be generated when building with W=1 and CONFIG_XFS_DEBUG unset. In both cases, the variable is unused as it is only referenced in an ASSERT() call, which is compiled out (in this config). It is generally poor programming style to use __maybe_unused for variables. The ASSERT() call is to verify that agbno of the end of the extent is within bounds for both functions. @agf is used as an intermediate variable to find the AG length. However, xfs_verify_agbext() already exists to verify a valid extent range. The arguments for calling xfs_verify_agbext() are already available, so use that instead. An advantage of using xfs_verify_agbext() is that it verifies that both the start and the end of the extent are within the bounds of the AG and catches overflows. Suggested-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
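A simplified sketch of the kind of check xfs_verify_agbext() performs: reject a zero-length or wrapping extent, then require both the first and the last block to lie inside the AG. It works on plain integers under the assumption that an AG spans blocks [0, ag_len); the real helper takes a struct xfs_perag and may apply additional per-AG limits, and the function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t agblock_t;	/* 32-bit AG-relative block number */

/* Assumption: the AG is modelled as blocks [0, ag_len). */
static bool verify_agbno(agblock_t agbno, agblock_t ag_len)
{
	return agbno < ag_len;
}

static bool verify_agbext(agblock_t agbno, agblock_t len, agblock_t ag_len)
{
	/* Zero length, or agbno + len wrapping past 2^32, is invalid. */
	if ((agblock_t)(agbno + len) <= agbno)
		return false;
	/* Both the first and the last block must be inside the AG. */
	return verify_agbno(agbno, ag_len) &&
	       verify_agbno(agbno + len - 1, ag_len);
}
```

Checking only `agbno + len <= ag_len`, as the old ASSERT did, would accept a wrapped sum; the explicit wraparound test is what catches overflow.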
2024-05-27 | xfs: Clear W=1 warning in xfs_iwalk_run_callbacks() | John Garry | 1 | -3/+2
For CONFIG_XFS_DEBUG unset, xfs_iwalk_run_callbacks() generates the following warning when building with W=1:
  fs/xfs/xfs_iwalk.c: In function ‘xfs_iwalk_run_callbacks’:
  fs/xfs/xfs_iwalk.c:354:42: error: variable ‘irec’ set but not used [-Werror=unused-but-set-variable]
      354 | struct xfs_inobt_rec_incore *irec;
          |                              ^~~~
  cc1: all warnings being treated as errors
Drop @irec, as it is only an intermediate variable. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-23 | tracing/treewide: Remove second parameter of __assign_str() | Steven Rostedt (Google) | 2 | -19/+19
With the rework of how __string() handles dynamic strings, where it saves off the source string in a field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with:
  __string(field, mystring)
which used to be assigned with __assign_str(field, mystring), the second parameter is no longer needed and is unused. With this, __assign_str() will now only get a single parameter. There are over 700 users of __assign_str(), and because coccinelle does not handle the TRACE_EVENT() macro, I ended up using the following sed script:
  git grep -l __assign_str | while read a ; do
      sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
      mv /tmp/test-file $a;
  done
I then searched for __assign_str() calls that did not end with ';', as those were multi-line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for:
  __assign_str_len()
  __assign_rel_str()
  __assign_rel_str_len()
I tested this with both an allmodconfig and an allyesconfig (build only for both).
[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-21 | Merge tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds | 1 | -1/+1
Pull vfs blocksize updates from Al Viro: "This gets rid of bogus set_blocksize() uses, switches it over to be based on a 'struct file *' and verifies that the caller has the device opened exclusively"
* tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make set_blocksize() fail unless block device is opened exclusive
  set_blocksize(): switch to passing struct file *
  btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens
  swsusp: don't bother with setting block size
  zram: don't bother with reopening - just use O_EXCL for open
  swapon(2): open swap with O_EXCL
  swapon(2)/swapoff(2): don't bother with block size
  pktcdvd: sort set_blocksize() calls out
  bcache_register(): don't bother with set_blocksize()
2024-05-20 | Merge tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux | Linus Torvalds | 182 | -2803/+24665
Pull xfs updates from Chandan Babu:
 "Online repair feature continues to be expanded. Also, we now support delayed allocation for realtime devices which have an extent size that is equal to filesystem's block size.
  New code:
   - Introduce Parent Pointer extended attribute for inodes
   - Bring back delalloc support for realtime devices which have an extent size that is equal to filesystem's block size
   - Improve performance of log incompat feature handling
  Online Repair:
   - Implement atomic file content exchanges i.e. exchange ranges of bytes between two files atomically
   - Create temporary files to repair file-based metadata. This uses atomic file content exchange facility to swap file fork mappings between the temporary file and the metadata inode
   - Allow callers of directory/xattr code to set an explicit owner number to be written into the header fields of any new blocks that are created. This is required to avoid walking every block of the new structure and modifying their ownership during online repair
   - Repair more data structures:
      - Extended attributes
      - Inode unlinked state
      - Directories
      - Symbolic links
      - AGI's unlinked inode list
      - Parent pointers
   - Move Orphan files to lost and found directory
   - Fixes for Inode repair functionality
   - Introduce a new sub-AG FITRIM implementation to reduce the duration for which the AGF lock is held
   - Updates for the design documentation
   - Use Parent Pointers to assist in checking directories, parent pointers, extended attributes, and link counts
  Fixes:
   - Prevent userspace from reading invalid file data due to incorrect updates of the file size when performing a non-atomic clone operation
   - Minor fixes to online repair
   - Fix confusing return values from xfs_bmapi_write()
   - Fix an out of bounds access due to incorrect h_size during log recovery
   - Defer upgrading the extent counters in xfs_reflink_end_cow_extent() until we know we are going to modify the extent mapping
   - Remove racy access to if_bytes check in xfs_reflink_end_cow_extent()
   - Fix sparse warnings
  Cleanups:
   - Hold inode locks on all files involved in a rename until the completion of the operation. This is in preparation for the parent pointers patchset where parent pointers are applied in a separate chained update from the actual directory update
   - Compile out v4 support when disabled
   - Cleanup xfs_extent_busy_clear()
   - Remove unused flags and fields from struct xfs_da_args
   - Remove definitions of unused functions
   - Improve extended attribute validation
   - Add higher level directory operations helpers to remove duplication of code
   - Cleanup quota (un)reservation interfaces"
* tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (221 commits)
  xfs: simplify iext overflow checking and upgrade
  xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
  xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later
  xfs: xfs_quota_unreserve_blkres can't fail
  xfs: consolidate the xfs_quota_reserve_blkres definitions
  xfs: clean up buffer allocation in xlog_do_recovery_pass
  xfs: fix log recovery buffer allocation for the legacy h_size fixup
  xfs: widen flags argument to the xfs_iflags_* helpers
  xfs: minor cleanups of xfs_attr3_rmt_blocks
  xfs: create a helper to compute the blockcount of a max sized remote value
  xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
  xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
  xfs: do not allocate the entire delalloc extent in xfs_bmapi_write
  xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
  xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate
  xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate
  xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write
  xfs: lift a xfs_valid_startblock into xfs_bmapi_allocate
  xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate
  xfs: fix error returns from xfs_bmapi_write
  ...
2024-05-18 | Merge tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild | Linus Torvalds | 1 | -2/+2
Pull Kbuild updates from Masahiro Yamada:
 - Avoid 'constexpr', which is a keyword in C23
 - Allow 'dtbs_check' and 'dt_compatible_check' to run independently of 'dt_binding_check'
 - Fix weak references to avoid GOT entries in position-independent code generation
 - Convert the last use of 'optional' property in arch/sh/Kconfig
 - Remove support for the 'optional' property in Kconfig
 - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive
 - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream
 - Fix 'make tar-pkg' for RISC-V to produce a consistent package
 - Provide reasonable default coverage for objtool, sanitizers, and profilers
 - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc.
 - Remove the last use of tristate choice in drivers/rapidio/Kconfig
 - Various cleanups and fixes in Kconfig
* tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits)
  kconfig: use sym_get_choice_menu() in sym_check_prop()
  rapidio: remove choice for enumeration
  kconfig: lxdialog: remove initialization with A_NORMAL
  kconfig: m/nconf: merge two item_add_str() calls
  kconfig: m/nconf: remove dead code to display value of bool choice
  kconfig: m/nconf: remove dead code to display children of choice members
  kconfig: gconf: show checkbox for choice correctly
  kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal
  Makefile: remove redundant tool coverage variables
  kbuild: provide reasonable defaults for tool coverage
  modules: Drop the .export_symbol section from the final modules
  kconfig: use menu_list_for_each_sym() in sym_check_choice_deps()
  kconfig: use sym_get_choice_menu() in conf_write_defconfig()
  kconfig: add sym_get_choice_menu() helper
  kconfig: turn defaults and additional prompt for choice members into error
  kconfig: turn missing prompt for choice members into error
  kconfig: turn conf_choice() into void function
  kconfig: use linked list in sym_set_changed()
  kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED
  kconfig: gconf: remove debug code
  ...
2024-05-13 | Merge tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs | Linus Torvalds | 1 | -4/+6
Pull misc vfs updates from Christian Brauner:
 "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual fses.
  Features:
   - Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That means we now have six free FMODE_* bits in total (but bit #6 already got used for FMODE_WRITE_RESTRICTED)
   - Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup)
   - Add fd_raw cleanup class so we can make use of automatic cleanup provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well
   - Optimize seq_puts()
   - Simplify __seq_puts()
   - Add new anon_inode_getfile_fmode() api to allow specifying f_mode instead of open-coding it in multiple places
   - Annotate struct file_handle with __counted_by() and use struct_size()
   - Warn in get_file() whether f_count resurrection from zero is attempted (epoll/drm discussion)
   - Folio-sophize aio
   - Export the subvolume id in statx() for both btrfs and bcachefs
   - Relax linkat(AT_EMPTY_PATH) requirements
   - Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors for dup*() equality replacing kcmp()
  Cleanups:
   - Compile out swapfile inode checks when swap isn't enabled
   - Use (1 << n) notation for FMODE_* bitshifts for clarity
   - Remove redundant variable assignment in fs/direct-io
   - Cleanup uses of strncpy in orangefs
   - Speed up and cleanup writeback
   - Move fsparam_string_empty() helper into header since it's currently open-coded in multiple places
   - Add kernel-doc comments to proc_create_net_data_write()
   - Don't needlessly read dentry->d_flags twice
  Fixes:
   - Fix out-of-range warning in nilfs2
   - Fix ecryptfs overflow due to wrong encryption packet size calculation
   - Fix overly long line in xfs file_operations (follow-up to FMODE_* cleanup)
   - Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up to FMODE_* cleanup)
   - Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_* cleanup)
   - Fix stable offset api to prevent endless loops
   - Fix afs file server rotations
   - Prevent xattr node from overflowing the eraseblock in jffs2
   - Move fdinfo PTRACE_MODE_READ procfs check into the .permission() operation instead of .open() operation since this caused userspace regressions"
* tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
  afs: Fix fileserver rotation getting stuck
  selftests: add F_DUPDFD_QUERY selftests
  fcntl: add F_DUPFD_QUERY fcntl()
  file: add fd_raw cleanup class
  fs: WARN when f_count resurrection is attempted
  seq_file: Simplify __seq_puts()
  seq_file: Optimize seq_puts()
  proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation
  fs: Create anon_inode_getfile_fmode()
  xfs: don't call xfs_file_open from xfs_dir_open
  xfs: drop fop_flags for directories
  xfs: fix overly long line in the file_operations
  shmem: Fix shmem_rename2()
  libfs: Add simple_offset_rename() API
  libfs: Fix simple_offset_rename_exchange()
  jffs2: prevent xattr node from overflowing the eraseblock
  vfs, swap: compile out IS_SWAPFILE() on swapless configs
  vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
  fs/direct-io: remove redundant assignment to variable retval
  fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading
  ...
2024-05-09 | kbuild: use $(src) instead of $(srctree)/$(src) for source directory | Masahiro Yamada | 1 | -2/+2
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for checked-in source files. It is merely a convention without any functional difference. In fact, $(obj) and $(src) are exactly the same, as defined in scripts/Makefile.build:
  src := $(obj)
When the kernel is built in a separate output directory, $(src) does not accurately reflect the source directory location. While Kbuild resolves this discrepancy by specifying VPATH=$(srctree) to search for source files, it does not cover all cases. For example, when adding a header search path for local headers, -I$(srctree)/$(src) is typically passed to the compiler. This introduces inconsistency between upstream and downstream Makefiles because $(src) is used instead of $(srctree)/$(src) for the latter. To address this inconsistency, this commit changes the semantics of $(src) so that it always points to the directory in the source tree. Going forward, the variables used in Makefiles will have the following meanings:
  $(obj)     - directory in the object tree
  $(src)     - directory in the source tree (changed by this commit)
  $(objtree) - the top of the kernel object tree
  $(srctree) - the top of the kernel source tree
Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced with $(src). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-05-03 | xfs: simplify iext overflow checking and upgrade | Christoph Hellwig | 10 | -87/+41
Currently the calls to xfs_iext_count_may_overflow and xfs_iext_count_upgrade are always paired. Merge them into a single function to simplify the callers and the actual check and upgrade logic itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-03 | xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent | Christoph Hellwig | 1 | -6/+0
Accessing if_bytes without the ilock is racy. Remove the initial if_bytes == 0 check in xfs_reflink_end_cow_extent and let xfs_iext_lookup_extent fail for this case after we've taken the ilock. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-03 | xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later | Christoph Hellwig | 1 | -8/+8
Defer the extent counter size upgrade until we know we're going to modify the extent mapping. This also defers dirtying the transaction, which will allow later changes to safely back out further down in the function. Fixes: 4f86bb4b66c9 ("xfs: Conditionally upgrade existing inodes to use large extent counters") Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-03 | xfs: xfs_quota_unreserve_blkres can't fail | Christoph Hellwig | 8 | -37/+20
Unreserving quotas can't fail due to quota limits, and we'll notice a shut down file system a bit later in all the callers anyway. Return void and remove the error checking and propagation in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-03 | xfs: consolidate the xfs_quota_reserve_blkres definitions | Christoph Hellwig | 1 | -12/+6
xfs_trans_reserve_quota_nblks is already stubbed out if quota support is disabled, no need for an extra xfs_quota_reserve_blkres stub. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-03 | xfs: clean up buffer allocation in xlog_do_recovery_pass | Christoph Hellwig | 1 | -7/+6
Merge the initial xlog_alloc_buffer calls, and pass the variable designating the length that is initialized to 1 above instead of passing the open coded 1 directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-05-03 | xfs: fix log recovery buffer allocation for the legacy h_size fixup | Christoph Hellwig | 1 | -6/+14
Commit a70f9fe52daa ("xfs: detect and handle invalid iclog size set by mkfs") added a fixup for incorrect h_size values used for the initial umount record in old xfsprogs versions. Later commit 0c771b99d6c9 ("xfs: clean up calculation of LR header blocks") cleaned up the log recovery buffer calculation, but stopped using the fixed-up h_size value to size the log recovery buffer, which can lead to an out of bounds access when the incorrect h_size does not come from the old mkfs tool, but from a fuzzer. Fix this by open coding xlog_logrec_hblks and taking the fixed h_size into account for this calculation. Fixes: 0c771b99d6c9 ("xfs: clean up calculation of LR header blocks") Reported-by: Sam Sun <samsun1006219@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
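A hedged sketch of the header-block calculation involved. It assumes v2 log record headers are split into 32 KiB cycles and rounds h_size up to a whole number of them; the function name and parameters are hypothetical. The point it illustrates is only that the h_size passed in should be the value after any legacy fixup, so the recovery buffer is sized from the same value the recovery code later relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: v2 log record headers cover 32 KiB per header block. */
#define LOG_HEADER_CYCLE_SIZE	(32u * 1024u)

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/*
 * Hypothetical: number of header blocks for a log record, computed from
 * the (already fixed-up) h_size. v1 logs, v1 records, and records of at
 * most one cycle need a single header block.
 */
static uint32_t logrec_hblks(bool logv2, bool rec_is_v2, uint32_t h_size)
{
	if (logv2 && rec_is_v2 && h_size > LOG_HEADER_CYCLE_SIZE)
		return DIV_ROUND_UP(h_size, LOG_HEADER_CYCLE_SIZE);
	return 1;
}
```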
2024-05-03 | set_blocksize(): switch to passing struct file * | Al Viro | 1 | -1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-05-02 | xfs: widen flags argument to the xfs_iflags_* helpers | Darrick J. Wong | 2 | -10/+8
xfs_inode.i_flags is an unsigned long, so make these helpers take that as the flags argument instead of unsigned short. This is needed for the next patch. While we're at it, remove the iflags variable from xfs_iget_cache_miss because we no longer need it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
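Why the argument width matters can be shown in a few lines: a flag above bit 15 of an unsigned long is silently converted to zero when it passes through an unsigned short parameter, so a test against it can never succeed. The flag value and helper names are hypothetical, not the actual XFS inode flags.

```c
#include <stdbool.h>

/* Hypothetical flag living above bit 15 of an unsigned long i_flags. */
#define IFLAG_HIGH	(1UL << 16)

/* Old-style helper: the narrow parameter drops the high bits. */
static bool iflags_test_short(unsigned long i_flags, unsigned short flags)
{
	return (i_flags & flags) != 0;
}

/* Widened helper: parameter type matches the field being tested. */
static bool iflags_test_long(unsigned long i_flags, unsigned long flags)
{
	return (i_flags & flags) != 0;
}
```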
2024-05-02 | xfs: minor cleanups of xfs_attr3_rmt_blocks | Darrick J. Wong | 1 | -8/+8
Clean up the type signature of this function since we don't have negative attr lengths or block counts. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-05-02 | xfs: create a helper to compute the blockcount of a max sized remote value | Darrick J. Wong | 3 | -3/+9
Create a helper function to compute the number of fsblocks needed to store a maximally-sized extended attribute value. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
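A sketch of the computation such a helper performs, combined with the per-block usable space that XFS_ATTR3_RMT_BUF_SPACE describes. It assumes a 64 KiB maximum value size and a 56-byte remote block header on CRC-enabled (v5) filesystems; the names are hypothetical and the real helpers take a struct xfs_mount. With 4 KiB blocks on v5, each block holds 4040 bytes of value data, so a 65536-byte value needs 17 blocks.

```c
#include <stdbool.h>

/* Assumptions for illustration: 64 KiB max value, 56-byte v5 header. */
#define MAX_XATTR_VALUE		65536u
#define ATTR3_RMT_HDR_SIZE	56u

/* Usable value bytes per remote block; v5 blocks carry a header. */
static unsigned int rmt_buf_space(unsigned int blocksize, bool has_crc)
{
	return has_crc ? blocksize - ATTR3_RMT_HDR_SIZE : blocksize;
}

/* Blocks needed to store valuelen bytes, rounding up. */
static unsigned int rmt_blocks(unsigned int blocksize, bool has_crc,
			       unsigned int valuelen)
{
	unsigned int space = rmt_buf_space(blocksize, has_crc);

	return (valuelen + space - 1) / space;
}

/* Blockcount of a maximally-sized remote value. */
static unsigned int max_rmt_blocks(unsigned int blocksize, bool has_crc)
{
	return rmt_blocks(blocksize, has_crc, MAX_XATTR_VALUE);
}
```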
2024-05-02 | xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function | Darrick J. Wong | 2 | -6/+17
Turn this into a properly typechecked function, and actually use the correct blocksize for extended attributes. The function cannot be static inline because xfsprogs userspace uses it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-05-02 | xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c | Darrick J. Wong | 2 | -32/+31
In the next few patches we're going to refactor the attr remote code so that we can support headerless remote xattr values for storing merkle tree blocks. For now, let's change the code to use unsigned int to describe quantities of bytes and blocks that cannot be negative. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-30 | xfs: do not allocate the entire delalloc extent in xfs_bmapi_write | Christoph Hellwig | 1 | -2/+3
While trying to convert the entire delalloc extent is a good decision for regular writeback, as it leads to larger contiguous on-disk extents, it is rather questionable for other callers of xfs_bmapi_write, as it forces them to loop creating new transactions just in case there is no large enough contiguous extent to cover the whole delalloc reservation. Change xfs_bmapi_write to only allocate the passed-in range instead, while the writeback path through xfs_bmapi_convert_delalloc and xfs_bmapi_allocate still always converts the full extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
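The policy change can be summarized by how many blocks each kind of caller converts. This is a hypothetical helper on plain integers, assuming the requested start offset lies inside the delalloc extent; it is not the xfs_bmapi_write code and ignores allocation failures and extent size limits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t fileoff_t;	/* file offset in blocks */
typedef uint64_t filblks_t;	/* block count */

/* A delalloc reservation covering [startoff, startoff + blockcount). */
struct del_extent {
	fileoff_t startoff;
	filblks_t blockcount;
};

/*
 * Hypothetical: writeback converts the whole reservation to get large
 * contiguous extents; other callers convert only the overlap of their
 * requested range [bno, bno + len) with the reservation.
 * Assumes del->startoff <= bno < end of the delalloc extent.
 */
static filblks_t blocks_to_convert(const struct del_extent *del,
				   fileoff_t bno, filblks_t len,
				   bool whole_extent)
{
	fileoff_t del_end = del->startoff + del->blockcount;
	fileoff_t req_end = bno + len;

	if (whole_extent)
		return del->blockcount;
	return (req_end < del_end ? req_end : del_end) - bno;
}
```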
2024-04-30 | xfs: fix xfs_bmap_add_extent_delay_real for partial conversions | Christoph Hellwig | 1 | -5/+10
xfs_bmap_add_extent_delay_real takes part or all of a delalloc extent and converts it to a real extent. It is written to deal with any potential overlap of the to-be-converted range with the delalloc extent, but it turns out that currently only converting the entire extent, or a part starting at the beginning, is actually exercised, as the only caller always tries to convert the entire delalloc extent, and either succeeds or at least progresses partially from the start. If it only converts a tiny part of a delalloc extent, the indirect block calculation for the new delalloc extent (da_new) might be equivalent to that of the existing delalloc extent (da_old). If this extent conversion now requires allocating an indirect block that gets accounted into da_new, this triggers the assert that da_new must be smaller than or equal to da_old unless we split the extent. Except for the assert, that case is actually handled by just trying to allocate more space, as is already done for the split case (which currently can't be reached at all), so just reusing it should be fine. Except that without dipping into the reserved block pool, that would make it a bit too easy to trigger a fs shutdown due to ENOSPC. So in addition to adjusting the assert, also dip into the reserved block pool. Note that I could only reproduce the assert with a change to only convert the actually requested range instead of the full delalloc extent from xfs_bmapi_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30 | xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate | Christoph Hellwig | 1 file, -5/+0
Both callers of xfs_bmapi_allocate already initialize bma->prev, so don't redo that in xfs_bmapi_allocate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30 | xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate | Christoph Hellwig | 1 file, -14/+18
xfs_bmapi_allocate currently overwrites offset and len when converting delayed allocations, and duplicates the length cap done for non-delalloc allocations. Move all that logic into the callers to avoid duplication and to make the calling conventions more obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30 | xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write | Christoph Hellwig | 1 file, -6/+3
XFS_FILBLKS_MIN uses min_t and thus does the comparison using the correct xfs_filblks_t type. Use it in xfs_bmapi_write and slightly adjust the comment documenting the potential pitfall to take account of this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
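The commit does not restate the pitfall here. Assuming it is the usual mismatch between a 64-bit block count (xfs_filblks_t is 64-bit) and a 32-bit extent length field, a minimal userspace sketch shows why the comparison has to happen in the wide type. All names below (filblks_t, extlen_t, MIN_T, MAX_EXTLEN, cap_len_wrong, cap_len_typed) are hypothetical stand-ins, not kernel definitions; MIN_T only mimics the casting behaviour of the kernel's min_t, not its single-evaluation guarantee.

```c
#include <stdint.h>

typedef uint64_t filblks_t;   /* stand-in for the 64-bit xfs_filblks_t */
typedef uint32_t extlen_t;    /* stand-in for a 32-bit extent length */

/* Hypothetical cap; assumed to fit in 32 bits. */
#define MAX_EXTLEN ((extlen_t)0x1fffff)

/*
 * Userspace approximation of min_t(): cast both operands to 'type' before
 * comparing. Unlike the kernel macro, it evaluates its arguments twice.
 */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Pitfall: narrowing first lets a large length wrap before it is capped. */
static extlen_t cap_len_wrong(filblks_t len)
{
	extlen_t narrowed = (extlen_t)len;

	return narrowed < MAX_EXTLEN ? narrowed : MAX_EXTLEN;
}

/* Safe: compare in the wide type; the result is <= MAX_EXTLEN, so it fits. */
static extlen_t cap_len_typed(filblks_t len)
{
	return (extlen_t)MIN_T(filblks_t, len, MAX_EXTLEN);
}
```

With len = 2^32 + 5, cap_len_wrong() returns 5 because the narrowing cast wraps before the cap is applied, while cap_len_typed() correctly returns MAX_EXTLEN.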
2024-04-30 | xfs: lift a xfs_valid_startblock into xfs_bmapi_allocate | Christoph Hellwig | 1 file, -6/+5
xfs_bmapi_convert_delalloc has an xfs_valid_startblock check on the block allocated by xfs_bmapi_allocate. Lift it into xfs_bmapi_allocate, as we should assert the same for xfs_bmapi_write.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30 | xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate | Christoph Hellwig | 1 file, -3/+0
tmp_logflags is initialized to 0 and then ORed into bma->logflags, which isn't actually doing anything.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-30 | xfs: fix error returns from xfs_bmapi_write | Christoph Hellwig | 10 files, -74/+57
xfs_bmapi_write can return 0 without actually returning a mapping in mval in two different cases:

 1) when there is absolutely no space available to do an allocation;
 2) when converting delalloc space, and the allocation is so small that it only covers parts of the delalloc extent before the range requested by the caller.

Callers at best can handle one of these cases, but in many cases can't cope with either one. Switch xfs_bmapi_write to always return a mapping or an error code instead. For case 1) above, ENOSPC is the obvious choice, which is very much what the callers expect anyway. For case 2) there is no really good error code, so pick a funky one from the SysV streams portfolio.

This fixes the reproducer here:

https://lore.kernel.org/linux-xfs/CAEJPjCvT3Uag-pMTYuigEjWZHn1sGMZ0GCjVVCv29tNHK76Cgg@mail.gmail.com0/

which uses reserved blocks to create file systems that are gravely out of space and thus cause at least xfs_alloc_file_space to hang and trigger the lack of ENOSPC handling in xfs_dquot_disk_alloc.

Note that this patch does not actually make any caller but xfs_alloc_file_space deal intelligently with case 2) above.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: 刘通 <lyutoon@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
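A toy model may help picture the new contract: the allocator either fills in a mapping that covers the requested offset or returns a negative error, so a caller can loop on the partial-progress case and give up on ENOSPC. Everything here is hypothetical: the toy structures, the front-only conversion rule, and SKETCH_EPARTIAL, which stands in for the STREAMS error code the commit mentions but does not name. It is a sketch of the calling convention, not of xfs_bmapi_write itself.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical error code for "made progress, but only in front of the
 * requested offset; call again". The real patch uses a SysV STREAMS
 * errno that this sketch does not try to name.
 */
#define SKETCH_EPARTIAL 10000

struct toy_map {
	uint64_t off;   /* first block of the mapping */
	uint64_t len;   /* number of blocks mapped */
};

/*
 * Toy delalloc reservation [res_off, res_end) that can only be converted
 * from its front, at most 'chunk' blocks per call, while 'free' blocks of
 * free space remain.
 */
struct toy_res {
	uint64_t res_off, res_end, chunk, free;
};

/* Either fill *m with a mapping covering 'off' and return 0, or return a
 * negative error. It never returns 0 without a usable mapping. */
static int toy_bmapi_write(struct toy_res *r, uint64_t off, struct toy_map *m)
{
	uint64_t n;

	if (off < r->res_off || off >= r->res_end)
		return -EINVAL;                 /* outside the toy reservation */
	if (r->free == 0)
		return -ENOSPC;                 /* case 1: no space at all */

	n = r->chunk < r->free ? r->chunk : r->free;
	if (n > r->res_end - r->res_off)
		n = r->res_end - r->res_off;

	m->off = r->res_off;                /* convert from the front */
	m->len = n;
	r->res_off += n;
	r->free -= n;

	if (m->off + m->len <= off)
		return -SKETCH_EPARTIAL;        /* case 2: progress before 'off' */
	return 0;
}

/* A caller that handles both cases: retry on partial progress, and
 * propagate ENOSPC (or any other error) to its own caller. */
static int toy_alloc_at(struct toy_res *r, uint64_t off, struct toy_map *m)
{
	int error;

	do {
		error = toy_bmapi_write(r, off, m);
	} while (error == -SKETCH_EPARTIAL);
	return error;
}
```

For a reservation [0, 100) converted 10 blocks at a time with plenty of free space, toy_alloc_at() for offset 35 goes through three partial conversions and then returns the mapping [30, 40). With only 25 free blocks and a target of offset 50, it runs out of space first and returns -ENOSPC.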
2024-04-29 | xfs: convert delayed extents to unwritten when zeroing post eof blocks | Zhang Yi | 1 file, -0/+29
The current clone operation can be non-atomic if the destination range of a file is beyond EOF; the user could get a file with corrupted (zeroed) data on crash.

The problem is about preallocations. If you write some data into a file:

    [A...B)

and XFS decides to preallocate some post-EOF blocks, then it can create a delayed allocation reservation:

    [A.........D)

The writeback path tries to convert delayed extents to real ones by allocating blocks. If there isn't enough contiguous free space, we can end up with two extents, the first real and the second still delalloc:

    [A....C)[C.D)

After that, both the in-memory and the on-disk file sizes are still B. If we clone into the range [E...F) from another file:

    [A....C)[C.D)      [E...F)

then xfs_reflink_zero_posteof() calls iomap_zero_range() to zero out the range [B, E) beyond EOF and flush it. Since [C, D) is still a delalloc extent, its pagecache will be zeroed and both the in-memory and on-disk size will be updated to D after flushing but before cloning. This is wrong, because the user can see the size change and read the zeroes while the clone operation is ongoing.

We need to keep the in-memory and on-disk size before the clone operation starts, so instead of writing zeroes through the page cache for delayed ranges beyond EOF, we convert these ranges to unwritten and invalidate any cached data over that range beyond EOF.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-29 | xfs: make xfs_bmapi_convert_delalloc() to allocate the target offset | Zhang Yi | 2 files, -42/+46
xfs_bmapi_convert_delalloc() only attempts to allocate the entire delalloc extent and may require multiple invocations to allocate the target offset, so xfs_convert_blocks() adds a loop to do this job, and we call it in the writeback path. However, xfs_convert_blocks() isn't a common helper. Let's do the looping in xfs_bmapi_convert_delalloc() and drop xfs_convert_blocks(), in preparation for converting post-EOF delalloc blocks in the buffered write begin path.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-29 | xfs: make the seq argument to xfs_bmapi_convert_delalloc() optional | Zhang Yi | 1 file, -2/+4
Allow callers to pass a NULL seq argument if they don't care about the fork sequence number.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
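An optional out-parameter like this is usually handled in C by writing through the pointer only when it is non-NULL. A minimal sketch of that pattern, using a hypothetical toy_fork structure and toy_convert() function rather than the actual xfs_bmapi_convert_delalloc() signature:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fork state; 'seq' changes whenever the extent map does. */
struct toy_fork {
	uint32_t seq;
	uint64_t nextents;
};

/*
 * Pretend to convert an extent, bump the sequence number, and report the
 * new value only if the caller passed somewhere to store it.
 */
static int toy_convert(struct toy_fork *f, uint32_t *seq)
{
	f->nextents++;
	f->seq++;
	if (seq)            /* NULL means "I don't care about the sequence" */
		*seq = f->seq;
	return 0;
}
```

A caller that only needs the conversion can pass NULL, while one that later wants to check whether a cached mapping is still valid can keep the reported value and compare it against the fork's current sequence.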
2024-04-29 | xfs: match lock mode in xfs_buffered_write_iomap_begin() | Zhang Yi | 1 file, -5/+5
Commit 1aa91d9c9933 ("xfs: Add async buffered write support") replaced xfs_ilock(XFS_ILOCK_EXCL) with xfs_ilock_for_iomap() when locking the inode being written, and a new variable, lockmode, is used to indicate the lock mode. Although the lockmode should always be XFS_ILOCK_EXCL, it's still better to use this variable instead of using XFS_ILOCK_EXCL directly when unlocking the inode.

Fixes: 1aa91d9c9933 ("xfs: Add async buffered write support")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-26 | xfs: refactor dir format helpers | Christoph Hellwig | 6 files, -150/+105
Add a new enum and an xfs_dir2_format helper that returns it, to allow the code to switch on the format of a directory in a single operation, and switch all callers of xfs_dir2_isblock and xfs_dir2_isleaf to it.

This also removes the explicit xfs_iread_extents call in a few of the call sites, given that xfs_bmap_last_offset already takes care of it underneath.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
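The idea of computing the directory format once and then switching on it, used by this commit and the *_args helpers that follow, can be sketched in userspace. All names are hypothetical (the commit does not spell out the enum here), and the classification rule in toy_dir_format() is a deliberate simplification, not XFS's on-disk logic.

```c
#include <stdint.h>

/* Hypothetical directory formats, modelled on shortform/block/leaf/node. */
enum toy_dir_fmt {
	TOY_FMT_SF,     /* entries stored inline in the inode */
	TOY_FMT_BLOCK,  /* a single directory block */
	TOY_FMT_LEAF,   /* data blocks plus one leaf block */
	TOY_FMT_NODE,   /* data blocks plus a multi-level index */
};

/* Toy on-disk summary; the fields and rules are assumptions for this sketch. */
struct toy_dir {
	int inline_data;
	uint64_t data_blocks;
	uint64_t leaf_blocks;
};

/* Classify once, in one place, instead of chaining isblock/isleaf checks. */
static enum toy_dir_fmt toy_dir_format(const struct toy_dir *d)
{
	if (d->inline_data)
		return TOY_FMT_SF;
	if (d->data_blocks == 1 && d->leaf_blocks == 0)
		return TOY_FMT_BLOCK;
	if (d->leaf_blocks == 1)
		return TOY_FMT_LEAF;
	return TOY_FMT_NODE;
}

/* Stub per-format handlers; they just report which path ran. */
static int toy_sf_lookup(void)    { return 1; }
static int toy_block_lookup(void) { return 2; }
static int toy_leaf_lookup(void)  { return 3; }
static int toy_node_lookup(void)  { return 4; }

/* Dispatcher in the style of an *_args helper: one switch per operation. */
static int toy_dir_lookup_args(const struct toy_dir *d)
{
	switch (toy_dir_format(d)) {
	case TOY_FMT_SF:
		return toy_sf_lookup();
	case TOY_FMT_BLOCK:
		return toy_block_lookup();
	case TOY_FMT_LEAF:
		return toy_leaf_lookup();
	case TOY_FMT_NODE:
		return toy_node_lookup();
	}
	return -1; /* not reached for valid enum values */
}
```

Because the switch covers every enumerator, a compiler warning such as -Wswitch can flag any handler that is forgotten when a new format is added, which is harder to get from a chain of boolean predicates.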
2024-04-26 | xfs: factor out a xfs_dir_replace_args helper | Christoph Hellwig | 3 files, -41/+28
Add a helper to switch between the different directory formats for replacing a directory entry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-26 | xfs: factor out a xfs_dir_removename_args helper | Christoph Hellwig | 3 files, -42/+27
Add a helper to switch between the different directory formats for removing a directory entry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-26 | xfs: factor out a xfs_dir_createname_args helper | Christoph Hellwig | 3 files, -43/+30
Add a helper to switch between the different directory formats for creating a directory entry and to handle the XFS_DA_OP_JUSTCHECK flag based on the passed-in ino number field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-26 | xfs: factor out a xfs_dir_lookup_args helper | Christoph Hellwig | 3 files, -60/+43
Add a helper to switch between the different directory formats for lookup and to handle the -EEXIST return for a successful lookup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-24 | xfs: don't call xfs_file_open from xfs_dir_open | Christoph Hellwig | 1 file, -1/+3
Directories do not support direct I/O and thus no non-blocking direct I/O either. Open code the shutdown check and call to generic_file_open instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240423124608.537794-4-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-24 | xfs: drop fop_flags for directories | Christoph Hellwig | 1 file, -2/+0
Directories have none of the capabilities, so drop the flags. Note that the current state is harmless, as no one actually checks for the flags either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240423124608.537794-3-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-24 | xfs: fix overly long line in the file_operations | Christoph Hellwig | 1 file, -4/+4
Re-wrap the newly added fop_flags fields to not go over 80 characters.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240423124608.537794-2-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-24 | xfs: Remove unused function xrep_dir_self_parent | Jiapeng Chong | 1 file, -21/+0
The function is defined in the dir_repair.c file but not called anywhere, so delete the unused function.

fs/xfs/scrub/dir_repair.c:186:1: warning: unused function 'xrep_dir_self_parent'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8867
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-24 | xfs: invalidate dentries for a file before moving it to the orphanage | Darrick J. Wong | 2 files, -29/+20
Invalidate the cached dentries that point to the file that we're moving to lost+found before we actually move it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-24 | xfs: exchange-range for repairs is no longer dynamic | Darrick J. Wong | 10 files, -45/+25
The atomic file exchange-range functionality is now a permanent filesystem feature instead of a dynamic log-incompat feature. It cannot be turned on at runtime, so we no longer need the XCHK_FSGATES flags and whatnot that supported it. Remove the flag and the enable function, and move the xfs_has_exchange_range checks to the start of the repair functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-24 | xfs: fix iunlock calls in xrep_adoption_trans_alloc | Darrick J. Wong | 1 file, -1/+1
If the transaction allocation in xrep_adoption_trans_alloc fails, we should drop only the locks that we took. In this case this is ILOCK_EXCL of both the orphanage and the file being repaired. Dropping any IOLOCK here is incorrect.

Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>