kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2023-08-10	xfs: free the xfs_mount in ->kill_sb	Christoph Hellwig	1	-9/+11
	As a rule of thumb everything allocated to the fs_context and moved into the super_block should be freed by ->kill_sb so that the teardown handling doesn't need to be duplicated between the fill_super error path and put_super. Implement a XFS-specific kill_sb method to do that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230809220545.1308228-4-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10	xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super	Christoph Hellwig	1	-4/+0
	->put_super is only called when sb->s_root is set, and thus when fill_super succeeds. Thus drop the NULL check that can't happen in xfs_fs_put_super. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230809220545.1308228-3-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10	xfs: reformat the xfs_fs_free prototype	Christoph Hellwig	1	-1/+2
	The xfs_fs_free prototype formatting is a weird mix of the classic XFS style and the Linux style. Fix it up to be consistent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Message-Id: <20230809220545.1308228-2-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10	Merge tag '6.5-rc5-ksmbd-server' of git://git.samba.org/ksmbd	Linus Torvalds	2	-6/+13
	Pull smb server fixes from Steve French: "Two ksmbd server fixes, both also for stable: - improve buffer validation when multiple EAs returned - missing check for command payload size" * tag '6.5-rc5-ksmbd-server' of git://git.samba.org/ksmbd: ksmbd: fix wrong next length validation of ea buffer in smb2_set_ea() ksmbd: validate command request size
2023-08-10	zonefs: fix synchronous direct writes to sequential files	Damien Le Moal	3	-118/+4
	Commit 16d7fd3cfa72 ("zonefs: use iomap for synchronous direct writes") changes zonefs code from a self-built zone append BIO to using iomap for synchronous direct writes. This change relies on iomap submit BIO callback to change the write BIO built by iomap to a zone append BIO. However, this change overlooked the fact that a write BIO may be very large as it is split when issued. The change from a regular write to a zone append operation for the built BIO can result in a block layer warning as zone append BIO are not allowed to be split. WARNING: CPU: 18 PID: 202210 at block/bio.c:1644 bio_split+0x288/0x350 Call Trace: ? __warn+0xc9/0x2b0 ? bio_split+0x288/0x350 ? report_bug+0x2e6/0x390 ? handle_bug+0x41/0x80 ? exc_invalid_op+0x13/0x40 ? asm_exc_invalid_op+0x16/0x20 ? bio_split+0x288/0x350 bio_split_rw+0x4bc/0x810 ? __pfx_bio_split_rw+0x10/0x10 ? lockdep_unlock+0xf2/0x250 __bio_split_to_limits+0x1d8/0x900 blk_mq_submit_bio+0x1cf/0x18a0 ? __pfx_iov_iter_extract_pages+0x10/0x10 ? __pfx_blk_mq_submit_bio+0x10/0x10 ? find_held_lock+0x2d/0x110 ? lock_release+0x362/0x620 ? mark_held_locks+0x9e/0xe0 __submit_bio+0x1ea/0x290 ? __pfx___submit_bio+0x10/0x10 ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 submit_bio_noacct_nocheck+0x675/0xa20 ? __pfx_bio_iov_iter_get_pages+0x10/0x10 ? __pfx_submit_bio_noacct_nocheck+0x10/0x10 iomap_dio_bio_iter+0x624/0x1280 __iomap_dio_rw+0xa22/0x18a0 ? lock_is_held_type+0xe3/0x140 ? __pfx___iomap_dio_rw+0x10/0x10 ? lock_release+0x362/0x620 ? zonefs_file_write_iter+0x74c/0xc80 [zonefs] ? down_write+0x13d/0x1e0 iomap_dio_rw+0xe/0x40 zonefs_file_write_iter+0x5ea/0xc80 [zonefs] do_iter_readv_writev+0x18b/0x2c0 ? __pfx_do_iter_readv_writev+0x10/0x10 ? inode_security+0x54/0xf0 do_iter_write+0x13b/0x7c0 ? lock_is_held_type+0xe3/0x140 vfs_writev+0x185/0x550 ? __pfx_vfs_writev+0x10/0x10 ? __handle_mm_fault+0x9bd/0x1c90 ? find_held_lock+0x2d/0x110 ? lock_release+0x362/0x620 ? find_held_lock+0x2d/0x110 ? lock_release+0x362/0x620 ? __up_read+0x1ea/0x720 ? do_pwritev+0x136/0x1f0 do_pwritev+0x136/0x1f0 ? __pfx_do_pwritev+0x10/0x10 ? syscall_enter_from_user_mode+0x22/0x90 ? lockdep_hardirqs_on+0x7d/0x100 do_syscall_64+0x58/0x80 This error depends on the hardware used, specifically on the max zone append bytes and max_[hw_]sectors limits. Tests using AMD Epyc machines that have low limits did not reveal this issue while runs on Intel Xeon machines with larger limits trigger it. Manually splitting the zone append BIO using bio_split_rw() can solve this issue but also requires issuing the fragment BIOs synchronously with submit_bio_wait(), to avoid potential reordering of the zone append BIO fragments, which would lead to data corruption. That is, this solution is not better than using regular write BIOs which are subject to serialization using zone write locking at the IO scheduler level. Given this, fix the issue by removing zone append support and using regular write BIOs for synchronous direct writes. This allows preseving the use of iomap and having identical synchronous and asynchronous sequential file write path. Zone append support will be reintroduced later through io_uring commands to ensure that the needed special handling is done correctly. Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 16d7fd3cfa72 ("zonefs: use iomap for synchronous direct writes") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-08-09	xattr: simple_xattr_set() return old_xattr to be freed	Hugh Dickins	2	-46/+51
	tmpfs wants to support limited user extended attributes, but kernfs (or cgroupfs, the only kernfs with KERNFS_ROOT_SUPPORT_USER_XATTR) already supports user extended attributes through simple xattrs: but limited by a policy (128KiB per inode) too liberal to be used on tmpfs. To allow a different limiting policy for tmpfs, without affecting the policy for kernfs, change simple_xattr_set() to return the replaced or removed xattr (if any), leaving the caller to update their accounting then free the xattr (by simple_xattr_free(), renamed from the static free_simple_xattr()). Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <158c6585-2aa7-d4aa-90ff-f7c3f8fe407c@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	libfs: Remove parent dentry locking in offset_iterate_dir()	Chuck Lever	1	-7/+4
	Since offset_iterate_dir() does not walk the parent's d_subdir list nor does it manipulate the parent's d_child, there doesn't seem to be a reason to hold the parent's d_lock. The offset_ctx's xarray can be sufficiently protected with just the RCU read lock. Flame graph data captured during the git regression run shows a 20% reduction in CPU cycles consumed in offset_find_next(). Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202307171640.e299f8d5-oliver.sang@intel.com Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Message-Id: <169030957098.157536.9938425508695693348.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	libfs: Add a lock class for the offset map's xa_lock	Chuck Lever	1	-0/+3
	Tie the dynamically-allocated xarray locks into a single class so contention on the directory offset xarrays can be observed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Message-Id: <169020933088.160441.9405180953116076087.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	libfs: Add directory operations for stable offsets	Chuck Lever	1	-0/+248
	Create a vector of directory operations in fs/libfs.c that handles directory seeks and readdir via stable offsets instead of the current cursor-based mechanism. For the moment these are unused. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Message-Id: <168814732984.530310.11190772066786107220.stgit@manet.1015granger.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	shmem: prepare shmem quota infrastructure	Carlos Maiolino	1	-0/+12
	Add new shmem quota format, its quota_format_ops together with dquot_operations Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230725144510.253763-5-cem@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	quota: Check presence of quota operation structures instead of ->quota_read ↵	Jan Kara	1	-1/+1
	and ->quota_write callbacks Currently we check whether superblock has ->quota_read and ->quota_write operations to check whether filesystem supports quotas. However for example for shmfs we will not read or write dquots so check whether quota operations are set in the superblock instead. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <20230725144510.253763-4-cem@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	btrfs: have it use inode_update_timestamps	Jeff Layton	1	-8/+1
	In later patches, we're going to drop the "now" argument from the update_time operation. Have btrfs_update_time use the new inode_update_timestamps helper to fetch a new timestamp and update it properly. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230807-mgctime-v7-4-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fs: drop the timespec64 arg from generic_update_time	Jeff Layton	5	-24/+78
	In future patches we're going to change how the ctime is updated to keep track of when it has been queried. The way that the update_time operation works (and a lot of its callers) make this difficult, since they grab a timestamp early and then pass it down to eventually be copied into the inode. All of the existing update_time callers pass in the result of current_time() in some fashion. Drop the "time" parameter from generic_update_time, and rework it to fetch its own timestamp. This change means that an update_time could fetch a different timestamp than was seen in inode_needs_update_time. update_time is only ever called with one of two flag combinations: Either S_ATIME is set, or S_MTIME\|S_CTIME\|S_VERSION are set. With this change we now treat the flags argument as an indicator that some value needed to be updated when last checked, rather than an indication to update specific timestamps. Rework the logic for updating the timestamps and put it in a new inode_update_timestamps helper that other update_time routines can use. S_ATIME is as treated as we always have, but if any of the other three are set, then we attempt to update all three. Also, some callers of generic_update_time need to know what timestamps were actually updated. Change it to return an S_* flag mask to indicate that and rework the callers to expect it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230807-mgctime-v7-3-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fs: pass the request_mask to generic_fillattr	Jeff Layton	38	-63/+71
	generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fs: remove silly warning from current_time	Jeff Layton	1	-6/+0
	An inode with no superblock? Unpossible! Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230807-mgctime-v7-1-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fs, block: remove bdev->bd_super	Christoph Hellwig	2	-4/+0
	bdev->bd_super is unused now, remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230807112625.652089-5-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	ocfs2: stop using bdev->bd_super for journal error logging	Christoph Hellwig	1	-3/+3
	All ocfs2 journal error handling and logging is based on buffer_heads, and the owning inode and thus super_block can be retrieved through bh->b_assoc_map->host. Switch to using that to remove the last users of bdev->bd_super. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Message-Id: <20230807112625.652089-4-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	ext4: don't use bdev->bd_super in __ext4_journal_get_write_access	Christoph Hellwig	1	-2/+1
	__ext4_journal_get_write_access already has a super_block available, and there is no need to go from that to the bdev to go back to the owning super_block. Signed-off-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230807112625.652089-3-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fs: stop using bdev->bd_super in mark_buffer_write_io_error	Christoph Hellwig	1	-8/+3
	bdev->bd_super is a somewhat awkward backpointer from a block device to an owning file system with unclear rules. For the buffer_head code we already have a good backpointer for the inode that the buffer_head is associated with, even if it lives on the block device mapping: b_assoc_map. It is used track dirty buffers associated with an inode but living on the block device mapping like directory buffers in ext4. mark_buffer_write_io_error already uses it for the call to mapping_set_error, and should be doing the same for the per-sb error sequence. Signed-off-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230807112625.652089-2-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	fs: unexport d_genocide	Christoph Hellwig	1	-2/+0
	d_genocide is only used by built-in code. Signed-off-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230808161704.1099680-1-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09	Merge tag 'hardening-v6.5-rc6' of ↵	Linus Torvalds	1	-3/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - Replace remaining open-coded struct_size_t() instance (Gustavo A. R. Silva) - Adjust vboxsf's trailing arrays to be proper flexible arrays * tag 'hardening-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: media: venus: Use struct_size_t() helper in pkt_session_unset_buffers() vboxsf: Use flexible arrays for trailing string member
2023-08-08	fs: use __fput_sync in close(2)	Linus Torvalds	2	-7/+25
	close(2) is a special case which guarantees a shallow kernel stack, making delegation to task_work machinery unnecessary. Said delegation is problematic as it involves atomic ops and interrupt masking trips, none of which are cheap on x86-64. Forcing close(2) to do it looks like an oversight in the original work. Moreover presence of CONFIG_RSEQ adds an additional overhead as fput() -> task_work_add(..., TWA_RESUME) -> set_notify_resume() makes the thread returning to userspace land in resume_user_mode_work(), where rseq_handle_notify_resume takes a SMAP round-trip if rseq is enabled for the thread (and it is by default with contemporary glibc). Sample result when benchmarking open1_processes -t 1 from will-it-scale (that's an open + close loop) + tmpfs on /tmp, running on the Sapphire Rapid CPU (ops/s): stock+RSEQ: 1329857 stock-RSEQ: 1421667 (+7%) patched: 1523521 (+14.5% / +7%) (with / without rseq) Patched result is the same regardless of rseq as the codepath is avoided. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-08	Merge tag 'gfs2-v6.4-fixes' of ↵	Linus Torvalds	2	-6/+12
	git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: - Fix a freeze consistency check in gfs2_trans_add_meta() - Don't use filemap_splice_read as it can cause deadlocks on gfs2 * tag 'gfs2-v6.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Don't use filemap_splice_read gfs2: Fix freeze consistency check in gfs2_trans_add_meta
2023-08-07	gfs2: Don't use filemap_splice_read	Bob Peterson	1	-2/+2
	Starting with patch 2cb1e08985, gfs2 started using the new function filemap_splice_read rather than the old (and subsequently deleted) function generic_file_splice_read. filemap_splice_read works by taking references to a number of folios in the page cache and splicing those folios into a pipe. The folios are then read from the pipe and the folio references are dropped. This can take an arbitrary amount of time. We cannot allow that in gfs2 because those folio references will pin the inode glock to the node and prevent it from being demoted, which can lead to cluster-wide deadlocks. Instead, use copy_splice_read. (In addition, the old generic_file_splice_read called into ->read_iter, which called gfs2_file_read_iter, which took the inode glock during the operation. The new filemap_splice_read interface does not take the inode glock anymore. This is fixable, but it still wouldn't prevent cluster-wide deadlocks.) Fixes: 2cb1e08985e3 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-08-07	gfs2: Fix freeze consistency check in gfs2_trans_add_meta	Andreas Gruenbacher	1	-4/+10
	Function gfs2_trans_add_meta() checks for the SDF_FROZEN flag to make sure that no buffers are added to a transaction while the filesystem is frozen. With the recent freeze/thaw rework, the SDF_FROZEN flag is cleared after thaw_super() is called, which is sufficient for serializing freeze/thaw. However, other filesystem operations started after thaw_super() may now be calling gfs2_trans_add_meta() before the SDF_FROZEN flag is cleared, which will trigger the SDF_FROZEN check in gfs2_trans_add_meta(). Fix that by checking the s_writers.frozen state instead. In addition, make sure not to call gfs2_assert_withdraw() with the sd_log_lock spin lock held. Check for a withdrawn filesystem before checking for a frozen filesystem, and don't pin/add buffers to the current transaction in case of a failure in either case. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2023-08-06	Merge tag 'v6.5-rc5.vfs.fixes' of ↵	Linus Torvalds	13	-42/+79
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a wrong check for O_TMPFILE during RESOLVE_CACHED lookup - Clean up directory iterators and clarify file_needs_f_pos_lock() * tag 'v6.5-rc5.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: rely on ->iterate_shared to determine f_pos locking vfs: get rid of old '->iterate' directory operation proc: fix missing conversion to 'iterate_shared' open: make RESOLVE_CACHED correctly test for O_TMPFILE
2023-08-06	fs: rely on ->iterate_shared to determine f_pos locking	Christian Brauner	1	-1/+1
	Now that we removed ->iterate we don't need to check for either ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check for ->iterate_shared instead. This will tell us whether we need to unconditionally take the lock. Not just does it allow us to avoid checking f_inode's mode it also actually clearly shows that we're locking because of readdir. Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06	vfs: get rid of old '->iterate' directory operation	Linus Torvalds	10	-39/+76
	All users now just use '->iterate_shared()', which only takes the directory inode lock for reading. Filesystems that never got convered to shared mode now instead use a wrapper that drops the lock, re-takes it in write mode, calls the old function, and then downgrades the lock back to read mode. This way the VFS layer and other callers no longer need to care about filesystems that never got converted to the modern era. The filesystems that use the new wrapper are ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf. Honestly, several of them look like they really could just iterate their directories in shared mode and skip the wrapper entirely, but the point of this change is to not change semantics or fix filesystems that haven't been fixed in the last 7+ years, but to finally get rid of the dual iterators. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06	proc: fix missing conversion to 'iterate_shared'	Linus Torvalds	1	-1/+1
	I'm looking at the directory handling due to the discussion about f_pos locking (see commit 797964253d35: "file: reinstate f_pos locking optimization for regular files"), and wanting to clean that up. And one source of ugliness is how we were supposed to move filesystems over to the '->iterate_shared()' function that only takes the inode lock for reading many many years ago, but several filesystems still use the bad old '->iterate()' that takes the inode lock for exclusive access. See commit 6192269444eb ("introduce a parallel variant of ->iterate()") that also added some documentation stating Old method is only used if the new one is absent; eventually it will be removed. Switch while you still can; the old one won't stay. and that was back in April 2016. Here we are, many years later, and the old version is still clearly sadly alive and well. Now, some of those old style iterators are probably just because the filesystem may end up having per-inode mutable data that it uses for iterating a directory, but at least one case is just a mistake. Al switched over most filesystems to use '->iterate_shared()' back when it was introduced. In particular, the /proc filesystem was converted as one of the first ones in commit f50752eaa0b0 ("switch all procfs directories ->iterate_shared()"). But then later one new user of '->iterate()' was then re-introduced by commit 6d9c939dbe4d ("procfs: add smack subdir to attrs"). And that's clearly not what we wanted, since that new case just uses the same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions that other /proc pident directories use, and they are most definitely safe to use with the inode lock held shared. So just fix it. This still leaves a fair number of oddball filesystems using the old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf), but at least we don't have any remaining in the core filesystems. I'm going to add a wrapper function that just drops the read-lock and takes it as a write lock, so that we can clean up the core vfs layer and make all the ugly 'this filesystem needs exclusive inode locking' be just filesystem-internal warts. I just didn't want to make that conversion when we still had a core user left. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06	open: make RESOLVE_CACHED correctly test for O_TMPFILE	Aleksa Sarai	1	-1/+1
	O_TMPFILE is actually __O_TMPFILE\|O_DIRECTORY. This means that the old fast-path check for RESOLVE_CACHED would reject all users passing O_DIRECTORY with -EAGAIN, when in fact the intended test was to check for __O_TMPFILE. Cc: stable@vger.kernel.org # v5.12+ Fixes: 99668f618062 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06	ksmbd: fix wrong next length validation of ea buffer in smb2_set_ea()	Namjae Jeon	1	-1/+8
	There are multiple smb2_ea_info buffers in FILE_FULL_EA_INFORMATION request from client. ksmbd find next smb2_ea_info using ->NextEntryOffset of current smb2_ea_info. ksmbd need to validate buffer length Before accessing the next ea. ksmbd should check buffer length using buf_len, not next variable. next is the start offset of current ea that got from previous ea. Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21598 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-06	ksmbd: validate command request size	Long Li	1	-5/+5
	In commit 2b9b8f3b68ed ("ksmbd: validate command payload size"), except for SMB2_OPLOCK_BREAK_HE command, the request size of other commands is not checked, it's not expected. Fix it by add check for request size of other commands. Cc: stable@vger.kernel.org Fixes: 2b9b8f3b68ed ("ksmbd: validate command payload size") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Long Li <leo.lilong@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-05	Merge tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6	Linus Torvalds	1	-1/+5
	Pull smb client fix from Steve French: - Fix DFS interlink problem (different namespace) * tag '6.5-rc4-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix dfs link mount against w2k8
2023-08-04	nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput	Ryusuke Konishi	3	-0/+12
	During unmount process of nilfs2, nothing holds nilfs_root structure after nilfs2 detaches its writer in nilfs_detach_log_writer(). Previously, nilfs_evict_inode() could cause use-after-free read for nilfs_root if inodes are left in "garbage_list" and released by nilfs_dispose_list at the end of nilfs_detach_log_writer(), and this bug was fixed by commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()"). However, it turned out that there is another possibility of UAF in the call path where mark_inode_dirty_sync() is called from iput(): nilfs_detach_log_writer() nilfs_dispose_list() iput() mark_inode_dirty_sync() __mark_inode_dirty() nilfs_dirty_inode() __nilfs_mark_inode_dirty() nilfs_load_inode_block() --> causes UAF of nilfs_root struct This can happen after commit 0ae45f63d4ef ("vfs: add support for a lazytime mount option"), which changed iput() to call mark_inode_dirty_sync() on its final reference if i_state has I_DIRTY_TIME flag and i_nlink is non-zero. This issue appears after commit 28a65b49eb53 ("nilfs2: do not write dirty data after degenerating to read-only") when using the syzbot reproducer, but the issue has potentially existed before. Fix this issue by adding a "purging flag" to the nilfs structure, setting that flag while disposing the "garbage_list" and checking it in __nilfs_mark_inode_dirty(). Unlike commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()"), this patch does not rely on ns_writer to determine whether to skip operations, so as not to break recovery on mount. The nilfs_salvage_orphan_logs routine dirties the buffer of salvaged data before attaching the log writer, so changing __nilfs_mark_inode_dirty() to skip the operation when ns_writer is NULL will cause recovery write to fail. The purpose of using the cleanup-only flag is to allow for narrowing of such conditions. Link: https://lkml.kernel.org/r/20230728191318.33047-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+74db8b3087f293d3a13a@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/000000000000b4e906060113fd63@google.com Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option") Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> # 4.0+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04	fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions	Lorenzo Stoakes	1	-3/+27
	Some architectures do not populate the entire range categorised by KCORE_TEXT, so we must ensure that the kernel address we read from is valid. Unfortunately there is no solution currently available to do so with a purely iterator solution so reinstate the bounce buffer in this instance so we can use copy_from_kernel_nofault() in order to avoid page faults when regions are unmapped. This change partly reverts commit 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data"), reinstating the bounce buffer, but adapts the code to continue to use an iterator. [lstoakes@gmail.com: correct comment to be strictly correct about reasoning] Link: https://lkml.kernel.org/r/525a3f14-74fa-4c22-9fca-9dab4de8a0c3@lucifer.local Link: https://lkml.kernel.org/r/20230731215021.70911-1-lstoakes@gmail.com Fixes: 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data") Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reported-by: Jiri Olsa <olsajiri@gmail.com> Closes: https://lore.kernel.org/all/ZHc2fm+9daF6cgCE@krava Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Will Deacon <will@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-04	Merge tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client	Linus Torvalds	3	-2/+17
	Pull ceph fixes from Ilya Dryomov: "Two patches to improve RBD exclusive lock interaction with osd_request_timeout option and another fix to reduce the potential for erroneous blocklisting -- this time in CephFS. All going to stable" * tag 'ceph-for-6.5-rc5' of https://github.com/ceph/ceph-client: libceph: fix potential hang in ceph_osdc_notify() rbd: prevent busy loop when requesting exclusive lock ceph: defer stopping mdsc delayed_work
2023-08-04	file: reinstate f_pos locking optimization for regular files	Linus Torvalds	1	-1/+17
	In commit 20ea1e7d13c1 ("file: always lock position for FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because pidfd_getfd() could get a reference to the file even when it didn't have an elevated file count due to threading of other sharing cases. But Mateusz Guzik reports that the extra locking is actually measurable, so let's re-introduce the optimization, and only force the locking for directory traversal. Directories need the lock for correctness reasons, while regular files only need it for "POSIX semantics". Since pidfd_getfd() is about debuggers etc special things that are _way_ outside of POSIX, we can relax the rules for that case. Reported-by: Mateusz Guzik <mjguzik@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-04	fs/Kconfig: Fix compile error for romfs	Li Zetao	1	-0/+1
	There are some compile errors reported by kernel test robot: arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_read': storage.c:(.text+0x64): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x9c): undefined reference to `__bread_gfp' arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_strnlen': storage.c:(.text+0x128): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x16c): undefined reference to `__bread_gfp' arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_strcmp': storage.c:(.text+0x22c): undefined reference to `__bread_gfp' arm-linux-gnueabi-ld: storage.c:(.text+0x27c): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x2a8): undefined reference to `__bread_gfp' arm-linux-gnueabi-ld: storage.c:(.text+0x2bc): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x2d4): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x2f4): undefined reference to `__brelse' arm-linux-gnueabi-ld: storage.c:(.text+0x304): undefined reference to `__brelse' The reason for the problem is that the commit "925c86a19bac" ("fs: add CONFIG_BUFFER_HEAD") has added a new config "CONFIG_BUFFER_HEAD" that controls building the buffer_head code, and romfs needs to use the buffer_head API, but no corresponding config has beed added. Select the config "CONFIG_BUFFER_HEAD" in romfs Kconfig to resolve the problem. Fixes: 925c86a19bac ("fs: add CONFIG_BUFFER_HEAD") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308031810.pQzGmR1v-lkp@intel.com/ Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Li Zetao <lizetao1@huawei.com> [axboe: fold in Christoph's incremental] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-04	pstore/ram: Check start of empty przs during init	Enlin Mu	1	-1/+1
	After commit 30696378f68a ("pstore/ram: Do not treat empty buffers as valid"), initialization would assume a prz was valid after seeing that the buffer_size is zero (regardless of the buffer start position). This unchecked start value means it could be outside the bounds of the buffer, leading to future access panics when written to: sysdump_panic_event+0x3b4/0x5b8 atomic_notifier_call_chain+0x54/0x90 panic+0x1c8/0x42c die+0x29c/0x2a8 die_kernel_fault+0x68/0x78 __do_kernel_fault+0x1c4/0x1e0 do_bad_area+0x40/0x100 do_translation_fault+0x68/0x80 do_mem_abort+0x68/0xf8 el1_da+0x1c/0xc0 __raw_writeb+0x38/0x174 __memcpy_toio+0x40/0xac persistent_ram_update+0x44/0x12c persistent_ram_write+0x1a8/0x1b8 ramoops_pstore_write+0x198/0x1e8 pstore_console_write+0x94/0xe0 ... To avoid this, also check if the prz start is 0 during the initialization phase. If not, the next prz sanity check case will discover it (start > size) and zap the buffer back to a sane state. Fixes: 30696378f68a ("pstore/ram: Do not treat empty buffers as valid") Cc: Yunlong Xing <yunlong.xing@unisoc.com> Cc: stable@vger.kernel.org Signed-off-by: Enlin Mu <enlin.mu@unisoc.com> Link: https://lore.kernel.org/r/20230801060432.1307717-1-yunlong.xing@unisoc.com [kees: update commit log with backtrace and clarifications] Signed-off-by: Kees Cook <keescook@chromium.org>
2023-08-04	file: mostly eliminate spurious relocking in __range_close	Mateusz Guzik	1	-13/+14
	Stock code takes a lock trip for every fd in range, but this can be trivially avoided and real-world consumers do have plenty of already closed cases. Just booting Debian 12 with a debug printk shows: (sh) min 3 max 17 closed 15 empty 0 (sh) min 19 max 63 closed 31 empty 14 (sh) min 4 max 63 closed 0 empty 60 (spawn) min 3 max 63 closed 13 empty 48 (spawn) min 3 max 63 closed 13 empty 48 (mount) min 3 max 17 closed 15 empty 0 (mount) min 19 max 63 closed 32 empty 13 and so on. While here use more idiomatic naming. An avoidable relock is left in place to avoid uglifying the code. The code was not switched to bitmap traversal for the same reason. Tested with ltp kernel/syscalls/close_range Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Message-Id: <20230727113809.800067-1-mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-04	nfsd: Fix race to FREE_STATEID and cl_revoked	Benjamin Coddington	1	-1/+1
	We have some reports of linux NFS clients that cannot satisfy a linux knfsd server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though those clients repeatedly walk all their known state using TEST_STATEID and receive NFS4_OK for all. Its possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then nfsd4_free_stateid() finds the delegation and returns NFS4_OK to FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to cl_revoked. This would produce the observed client/server effect. Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID and move to cl_revoked happens within the same cl_lock. This will allow nfsd4_free_stateid() to properly remove the delegation from cl_revoked. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575 Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # v4.17+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-04	xfs: stabilize fs summary counters for online fsck	Darrick J. Wong	4	-38/+183
	If the fscounters scrubber notices incorrect summary counters, it's entirely possible that scrub is simply racing with other threads that are updating the incore counters. There isn't a good way to stabilize percpu counters or set ourselves up to observe live updates with hooks like we do for the quotacheck or nlinks scanners, so we instead choose to freeze the filesystem long enough to walk the incore per-AG structures. Past me thought that it was going to be commonplace to have to freeze the filesystem to perform some kind of repair and set up a whole separate infrastructure to freeze the filesystem in such a way that userspace could not unfreeze while we were running. This involved adding a mutex and freeze_super/thaw_super functions and dealing with the fact that the VFS freeze/thaw functions can free the VFS superblock references on return. This was all very overwrought, since fscounters turned out to be the only user of scrub freezes, and it doesn't require the log to quiesce, only the incore superblock counters. We prevent other threads from changing the freeze level by calling freeze_super_excl with a custom freeze cookie to keep everyone else out of the filesystem. The end result is that fscounters should be much more efficient. When we're checking a busy system and we can't stabilize the counters, the custom freeze will do less work, which should result in less downtime. Repair should be similarly speedy, but that's in a later patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-08-04	autofs: use wake_up() instead of wake_up_interruptible(()	Ian Kent	1	-1/+1
	In autofs_wait_release() wake_up() is used to wake up processes waiting on a mount callback to complete which matches the wait_event_killable() in autofs_wait(). But in autofs_catatonic_mode() the wake_up_interruptible() was not also changed at the time autofs_wait_release() was changed. Signed-off-by: Ian Kent <raven@themaw.net> Message-Id: <169112719813.7590.4971499386839952992.stgit@donald.themaw.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-04	autofs: fix memory leak of waitqueues in autofs_catatonic_mode	Fedor Pchelkin	1	-1/+2
	Syzkaller reports a memory leak: BUG: memory leak unreferenced object 0xffff88810b279e00 (size 96): comm "syz-executor399", pid 3631, jiffies 4294964921 (age 23.870s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 08 9e 27 0b 81 88 ff ff ..........'..... 08 9e 27 0b 81 88 ff ff 00 00 00 00 00 00 00 00 ..'............. backtrace: [<ffffffff814cfc90>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046 [<ffffffff81bb75ca>] kmalloc include/linux/slab.h:576 [inline] [<ffffffff81bb75ca>] autofs_wait+0x3fa/0x9a0 fs/autofs/waitq.c:378 [<ffffffff81bb88a7>] autofs_do_expire_multi+0xa7/0x3e0 fs/autofs/expire.c:593 [<ffffffff81bb8c33>] autofs_expire_multi+0x53/0x80 fs/autofs/expire.c:619 [<ffffffff81bb6972>] autofs_root_ioctl_unlocked+0x322/0x3b0 fs/autofs/root.c:897 [<ffffffff81bb6a95>] autofs_root_ioctl+0x25/0x30 fs/autofs/root.c:910 [<ffffffff81602a9c>] vfs_ioctl fs/ioctl.c:51 [inline] [<ffffffff81602a9c>] __do_sys_ioctl fs/ioctl.c:870 [inline] [<ffffffff81602a9c>] __se_sys_ioctl fs/ioctl.c:856 [inline] [<ffffffff81602a9c>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:856 [<ffffffff84608225>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff84608225>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd autofs_wait_queue structs should be freed if their wait_ctr becomes zero. Otherwise they will be lost. In this case an AUTOFS_IOC_EXPIRE_MULTI ioctl is done, then a new waitqueue struct is allocated in autofs_wait(), its initial wait_ctr equals 2. After that wait_event_killable() is interrupted (it returns -ERESTARTSYS), so that 'wq->name.name == NULL' condition may be not satisfied. Actually, this condition can be satisfied when autofs_wait_release() or autofs_catatonic_mode() is called and, what is also important, wait_ctr is decremented in those places. Upon the exit of autofs_wait(), wait_ctr is decremented to 1. Then the unmounting process begins: kill_sb calls autofs_catatonic_mode(), which should have freed the waitqueues, but it only decrements its usage counter to zero which is not a correct behaviour. edit:imk This description is of course not correct. The umount performed as a result of an expire is a umount of a mount that has been automounted, it's not the autofs mount itself. They happen independently, usually after everything mounted within the autofs file system has been expired away. If everything hasn't been expired away the automount daemon can still exit leaving mounts in place. But expires done in both cases will result in a notification that calls autofs_wait_release() with a result status. The problem case is the summary execution of of the automount daemon. In this case any waiting processes won't be woken up until either they are terminated or the mount is umounted. end edit: imk So in catatonic mode we should free waitqueues which counter becomes zero. edit: imk Initially I was concerned that the calling of autofs_wait_release() and autofs_catatonic_mode() was not mutually exclusive but that can't be the case (obviously) because the queue entry (or entries) is removed from the list when either of these two functions are called. Consequently the wait entry will be freed by only one of these functions or by the woken process in autofs_wait() depending on the order of the calls. end edit: imk Reported-by: syzbot+5e53f70e69ff0c0a1c0c@syzkaller.appspotmail.com Suggested-by: Takeshi Misawa <jeliantsurux@gmail.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Ian Kent <raven@themaw.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrei Vagin <avagin@gmail.com> Cc: autofs@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: <169112719161.7590.6700123246297365841.stgit@donald.themaw.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-03	Merge tag 'nfsd-6.5-3' of ↵	Linus Torvalds	1	-3/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix tmpfs splice read support * tag 'nfsd-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: Fix reading via splice
2023-08-03	Merge tag 'erofs-for-6.5-rc5-fixes' of ↵	Linus Torvalds	2	-5/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Fix data corruption caused by insufficient decompression on deduplicated compressed extents - Drop a useless s_magic checking in erofs_kill_sb() * tag 'erofs-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: drop unnecessary WARN_ON() in erofs_kill_sb() erofs: fix wrong primary bvec selection on deduplicated extents
2023-08-02	Merge tag 'exfat-for-6.5-rc5' of ↵	Linus Torvalds	2	-20/+22
	git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fixes from Namjae Jeon: - Fix page allocation failure from allocation bitmap by using kvmalloc_array/kvfree - Add the check to validate if filename entries exceeds max filename length - Fix potential deadlock condition from dir_emit() tag 'exfat-for-6.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: release s_lock before calling dir_emit() exfat: check if filename entries exceeds max filename length exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfree
2023-08-02	smb: client: fix dfs link mount against w2k8	Paulo Alcantara	1	-1/+5
	Customer reported that they couldn't mount their DFS link that was seen by the client as a DFS interlink -- special form of DFS link where its single target may point to a different DFS namespace -- and it turned out that it was just a regular DFS link where its referral header flags missed the StorageServers bit thus making the client think it couldn't tree connect to target directly without requiring further referrals. When the DFS link referral header flags misses the StoraServers bit and its target doesn't respond to any referrals, then tree connect to it. Fixes: a1c0d00572fc ("cifs: share dfs connections and supers") Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-02	fs: add CONFIG_BUFFER_HEAD	Christoph Hellwig	31	-1/+34
	Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it. For the block device nodes and alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist. Otherwise this is just Kconfig and ifdef changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-02	fs: rename and move block_page_mkwrite_return	Christoph Hellwig	6	-13/+13
	block_page_mkwrite_return is neither block nor mkwrite specific, and should not be under CONFIG_BLOCK. Move it to mm.h and rename it to vmf_fs_error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230801172201.1923299-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>