Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (ufs, qla2xxx,
target, smartpqi, lpfc, mpt3sas).
The core change causing the most churn was replacing the command
request field request with a macro, allowing us to offset map to it
and remove the redundant field; the same was also done for the tag
field.
The most impactful change is the final removal of scsi_ioctl, which
has been deprecated for over a decade"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits)
scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1
scsi: ufs: ufs-exynos: Fix static checker warning
scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI
scsi: lpfc: Use the proper SCSI midlayer interfaces for PI
scsi: lpfc: Copyright updates for 14.0.0.1 patches
scsi: lpfc: Update lpfc version to 14.0.0.1
scsi: lpfc: Add bsg support for retrieving adapter cmf data
scsi: lpfc: Add cmf_info sysfs entry
scsi: lpfc: Add debugfs support for cm framework buffers
scsi: lpfc: Add support for maintaining the cm statistics buffer
scsi: lpfc: Add rx monitoring statistics
scsi: lpfc: Add support for the CM framework
scsi: lpfc: Add cmfsync WQE support
scsi: lpfc: Add support for cm enablement buffer
scsi: lpfc: Add cm statistics buffer support
scsi: lpfc: Add EDC ELS support
scsi: lpfc: Expand FPIN and RDF receive logging
scsi: lpfc: Add MIB feature enablement support
scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware
scsi: fc: Add EDC ELS definition
...
|
|
These are very chatty, racy, and not terribly useful. Just remove them.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
For example in the case of a forced umount, we'll remove all the session
caps even if they are dirty. Move the warning to a wrapper function and
make most of the callers use it. Call the core function when removing
caps due to a forced umount.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Force umount will try to close the sessions by setting the session
state to _CLOSING. We don't want to WARN in this situation, since it's
expected.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
capsnaps will take inode references via ihold when queueing to flush.
When force unmounting, the client will just close the sessions and
may never get a flush reply, causing a leak and inode ref leak.
Fix this by removing the capsnaps for an inode when removing the caps.
URL: https://tracker.ceph.com/issues/52295
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The current code will update the mtime and then try to get caps to
handle the write. If we end up having to request caps from the MDS, then
the mtime in the cap grant will clobber the updated mtime and it'll be
lost.
This is most noticable when two clients are alternately writing to the
same file. Fw caps are continually being granted and revoked, and the
mtime ends up stuck because the updated mtimes are always being
overwritten with the old one.
Fix this by changing the order of operations in ceph_write_iter to get
the caps before updating the times. Also, make sure we check the pool
full conditions before even getting any caps or uninlining.
URL: https://tracker.ceph.com/issues/46574
Reported-by: Jozef Kováč <kovac@firma.zoznam.sk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In the case where the export MDS has crashed just after the EImportStart
journal is flushed, a standby MDS takes over for it and when replaying
the EImportStart journal the MDS will wait the client to reconnect. That
may never happen because the client may not have registered or opened
the sessions yet.
When receiving a new map, ensure we reconnect to valid export targets as
well if their sessions don't exist yet.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Print a bit more information when we can't find the realm during
ceph_add_cap. Show both the inode number and the old realm inode
number.
Suggested-by: Sage Weil <sage@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Consolidate some fiddly code for changing an inode's snap_realm
into a new helper function, and change the callers to use it.
While we're in here, nothing uses the i_snap_realm_counter field, so
remove that from the inode.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The ceph_mds_client and ceph_mds_session structures are kzalloc'ed so
there's no need to explicitly initialize either of their fields to 0.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The first thing metric_delayed_work does is check mdsc->stopping,
and then return immediately if it's set. That's good since we would
have already torn down the metric structures at this point, otherwise,
but there is no locking around mdsc->stopping.
It's possible that the ceph_metric_destroy call could race with the
delayed_work, in which case we could end up with the delayed_work
accessing destroyed percpu variables.
At this point in the mdsc teardown, the "stopping" flag has already been
set, so there's no benefit to flushing the work. Move the work
cancellation in ceph_metric_destroy ahead of the percpu variable
destruction, and eliminate the flush_delayed_work call in
ceph_mdsc_destroy.
Fixes: 18f473b384a6 ("ceph: periodically send perf metrics to MDSes")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Add a new vxattr that shows what MDS is authoritative for an inode (if
we happen to have auth caps). If we don't have an auth cap for the inode
then just return -1.
URL: https://tracker.ceph.com/issues/1276
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
We missed these in the recent fscache rework.
Reported-by: Steve French <smfrench@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
For the client requests who will have unsafe and safe replies from
MDS daemons, in the MDS side the MDS daemons won't flush the mdlog
(journal log) immediatelly, because they think it's unnecessary.
That's true for most cases but not all, likes the fsync request.
The fsync will wait until all the unsafe replied requests to be
safely replied.
Normally if there have multiple threads or clients are running, the
whole mdlog in MDS daemons could be flushed in time if any request
will trigger the mdlog submit thread. So usually we won't experience
the normal operations will stuck for a long time. But in case there
has only one client with only thread is running, the stuck phenomenon
maybe obvious and the worst case it must wait at most 5 seconds to
wait the mdlog to be flushed by the MDS's tick thread periodically.
This patch will trigger to flush the mdlog in the relevant and auth
MDSes to which the in-flight requests are sent just before waiting
the unsafe requests to finish.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
If we hit a decoding error late in the frame, then we might exit the
function without putting the pool_ns string. Ensure that we always put
that reference on the way out of the function.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
"This KUnit update for Linux 5.15-rc1 adds new features and tests:
Tool:
- support for '--kernel_args' to allow setting module params
- support for '--raw_output' option to show just the kunit output
during make
Tests:
- new KUnit tests for checksums and timestamps
- Print test statistics on failure
- Integrates UBSAN into the KUnit testing framework. It fails KUnit
tests whenever it reports undefined behavior"
* tag 'linux-kselftest-kunit-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Print test statistics on failure
kunit: tool: make --raw_output support only showing kunit output
kunit: tool: add --kernel_args to allow setting module params
kunit: ubsan integration
fat: Add KUnit tests for checksums and timestamps
|
|
Pull configfs updates from Christoph Hellwig:
- fix a race in configfs_lookup (Sishuai Gong)
- minor cleanups (me)
* tag 'configfs-5.15' of git://git.infradead.org/users/hch/configfs:
configfs: fix a race in configfs_lookup()
configfs: fold configfs_attach_attr into configfs_lookup
configfs: simplify the configfs_dirent_is_ready
configfs: return -ENAMETOOLONG earlier in configfs_lookup
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set includes a number of minor fixes and cleanups related to the
networking changes in the last release.
A patch to delay ack messages reduces network traffic significantly"
* tag 'dlm-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
fs: dlm: avoid comms shutdown delay in release_lockspace
fs: dlm: fix return -EINTR on recovery stopped
fs: dlm: implement delayed ack handling
fs: dlm: move receive loop into receive handler
fs: dlm: fix multiple empty writequeue alloc
fs: dlm: generic connect func
fs: dlm: auto load sctp module
fs: dlm: introduce generic listen
fs: dlm: move to static proto ops
fs: dlm: introduce con_next_wq helper
fs: dlm: cleanup and remove _send_rcom
fs: dlm: clear CF_APP_LIMITED on close
fs: dlm: fix typo in tlv prefix
fs: dlm: use READ_ONCE for config var
fs: dlm: use sk->sk_socket instead of con->sock
|
|
If a task is queueing async work and also handling signals, then we can
run into the case where create_io_thread() is interrupted and returns
failure because of that. If this happens for creating the first worker
in a group, then that worker will never get created and we can hang the
ring.
If we do get a fork failure, retry from task_work. With signals we have
to be a bit careful as we cannot simply queue as task_work, as we'll
still have signals pending at that point. Punt over a normal workqueue
first and then create from task_work after that.
Lastly, ensure that we handle fatal worker creations. Worker creation
failures are normally not fatal, only if we fail to create one in an empty
worker group can we not make progress. Right now that is ignored, ensure
that we handle that and run cancel on the work item.
There are two paths that create new workers - one is the "existing worker
going to sleep", and the other is "no workers found for this work, create
one". The former is never fatal, as workers do exist in the group. Only
the latter needs to be carefully handled.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It makes the logic easier to follow if we just get rid of the fixed worker
flag, and simply ensure that we never exit the last worker in the group.
This also means that no particular worker is special.
Just track the last timeout state, and if we have hit it and no work
is pending, check if there are other workers. If yes, then we can exit
this one safely.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of just marking the buffer as having overflowed, fill it up as
much as we can. That will allow the overflow case to then return
whatever truncated result if it wants to.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"In addition to some ext4 bug fixes and cleanups, this cycle we add the
orphan_file feature, which eliminates bottlenecks when doing a large
number of parallel truncates and file deletions, and move the discard
operation out of the jbd2 commit thread when using the discard mount
option, to better support devices with slow discard operations"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
ext4: make the updating inode data procedure atomic
ext4: remove an unnecessary if statement in __ext4_get_inode_loc()
ext4: move inode eio simulation behind io completeion
ext4: Improve scalability of ext4 orphan file handling
ext4: Orphan file documentation
ext4: Speedup ext4 orphan inode handling
ext4: Move orphan inode handling into a separate file
ext4: Support for checksumming from journal triggers
ext4: fix race writing to an inline_data file while its xattrs are changing
jbd2: add sparse annotations for add_transaction_credits()
ext4: fix sparse warnings
ext4: Make sure quota files are not grabbed accidentally
ext4: fix e2fsprogs checksum failure for mounted filesystem
ext4: if zeroout fails fall back to splitting the extent node
ext4: reduce arguments of ext4_fc_add_dentry_tlv
ext4: flush background discard kwork when retry allocation
ext4: get discard out of jbd2 commit kthread contex
ext4: remove the repeated comment of ext4_trim_all_free
ext4: add new helper interface ext4_try_to_trim_range()
ext4: remove the 'group' parameter of ext4_trim_extent
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
- Copy up immutable/append/sync/noatime attributes (Amir Goldstein)
- Improve performance by enabling RCU lookup.
- Misc fixes and improvements
The reason this touches so many files is that the ->get_acl() method now
gets a "bool rcu" argument. The ->get_acl() API was updated based on
comments from Al and Linus:
Link: https://lore.kernel.org/linux-fsdevel/CAJfpeguQxpd6Wgc0Jd3ks77zcsAv_bn0q17L3VNnnmPKu11t8A@mail.gmail.com/
* tag 'ovl-update-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: enable RCU'd ->get_acl()
vfs: add rcu argument to ->get_acl() callback
ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
ovl: use kvalloc in xattr copy-up
ovl: update ctime when changing fileattr
ovl: skip checking lower file's i_writecount on truncate
ovl: relax lookup error on mismatch origin ftype
ovl: do not set overlay.opaque for new directories
ovl: add ovl_allow_offline_changes() helper
ovl: disable decoding null uuid with redirect_dir
ovl: consistent behavior for immutable/append-only inodes
ovl: copy up sync/noatime fileattr flags
ovl: pass ovl_fs to ovl_check_setxattr()
fs: add generic helper for filling statx attribute flags
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, direct I/O and fsdax support for uncompressed files are
now added in order to avoid double-caching for loop device and VM
container use cases. All uncompressed cases are now turned into iomap
infrastructure, which looks much simpler and cleaner.
In addition, fiemap support is added for both (un)compressed files by
using iomap infrastructure as well so end users can easily get file
distribution. We've also added chunk-based uncompressed files support
for data deduplication as the next step of VM container use cases.
Summary:
- support direct I/O for all uncompressed files
- support fsdax for non-tailpacking regular files
- use iomap infrastructure for all uncompressed cases
- support fiemap for both (un)compressed files
- introduce chunk-based files for chunk deduplication
- some cleanups"
* tag 'erofs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix double free of 'copied'
erofs: support reading chunk-based uncompressed files
erofs: introduce chunk-based file on-disk format
erofs: add fiemap support with iomap
erofs: add support for the full decompressed length
erofs: remove the mapping parameter from erofs_try_to_free_cached_page()
erofs: directly use wrapper erofs_page_is_managed() when shrinking
erofs: convert all uncompressed cases to iomap
erofs: dax support for non-tailpacking regular file
erofs: iomap support for non-tailpacking DIO
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache updates from David Howells:
"Preparatory work for the fscache rewrite that's being worked on and
fix some bugs. These include:
- Always select netfs stats when enabling fscache stats since they're
displayed through the same procfile.
- Add a cookie debug ID that can be used in tracepoints instead of a
pointer and cache it in the netfs_cache_resources struct rather
than in the netfs_read_request struct to make it more available.
- Use file_inode() in cachefiles rather than dereferencing
file->f_inode directly.
- Provide a procfile to display fscache cookies.
- Remove the fscache and cachefiles histogram procfiles.
- Remove the fscache object list procfile.
- Avoid using %p in fscache and cachefiles as the value is hashed and
not comparable to the register dump in an oops trace.
- Fix the cookie hash function to actually achieve useful dispersion.
- Fix fscache_cookie_put() so that it doesn't dereference the cookie
pointer in the tracepoint after the refcount has been decremented
(we're only allowed to do that if we decremented it to zero).
- Use refcount_t rather than atomic_t for the fscache_cookie
refcount"
* tag 'fscache-next-20210829' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
fscache: Use refcount_t for the cookie refcount instead of atomic_t
fscache: Fix fscache_cookie_put() to not deref after dec
fscache: Fix cookie key hashing
cachefiles: Change %p in format strings to something else
fscache: Change %p in format strings to something else
fscache: Remove the object list procfile
fscache, cachefiles: Remove the histogram stuff
fscache: Procfile to display cookies
fscache: Add a cookie debug ID and use that in traces
cachefiles: Use file_inode() rather than accessing ->f_inode
netfs: Move cookie debug ID to struct netfs_cache_resources
fscache: Select netfs stats if fscache stats are enabled
|
|
Usually in file system init() messages are only displayed in info level.
Change level from notice to info, but keep CONFIG_NTFS3_64BIT_CLUSTER in
notice level. Also this need even more attention so let's put big
warning here so that nobody will not try accidentally use it.
There is also no good reason to display internal stuff like binary tree
search. This is always on option which can only disabled for debugging
purposes by developer. Also this message does not even check if
developer has disabled it or not so it is useless info.
Signed-off-by: Kari Argillander <kari.argillander@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
Files already have SDPX identifier so no reason to keep boilerplates in
these files anymore.
Signed-off-by: Kari Argillander <kari.argillander@gmail.com>
Acked-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
This check will be also performed in generic_file_read_iter() so we do
not want to check this two times in a row.
This was founded with Smatch
fs/ntfs3/file.c:803 ntfs_file_read_iter()
warn: unused return: count = iov_iter_count()
Signed-off-by: Kari Argillander <kari.argillander@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
Use fiemap_prep() to check valid flags. It also shrink request scope
(@len) to what the fs can actually handle.
This address following Smatch static checker warning:
fs/ntfs3/frecord.c:1894 ni_fiemap()
warn: potential integer overflow from user 'vbo + len'
Because fiemap_prep() shrinks @len this cannot happened anymore.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: lore.kernel.org/ntfs3/20210825080440.GA17407@kili/
Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation")
Signed-off-by: Kari Argillander <kari.argillander@gmail.com>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
If the call to ext4_map_blocks() fails due to an corrupted file
system, ext4_ext_replay_set_iblocks() can get stuck in an infinite
loop. This could be reproduced by running generic/526 with a file
system that has inline_data and fast_commit enabled. The system will
repeatedly log to the console:
EXT4-fs warning (device dm-3): ext4_block_to_path:105: block 1074800922 > max in inode 131076
and the stack that it gets stuck in is:
ext4_block_to_path+0xe3/0x130
ext4_ind_map_blocks+0x93/0x690
ext4_map_blocks+0x100/0x660
skip_hole+0x47/0x70
ext4_ext_replay_set_iblocks+0x223/0x440
ext4_fc_replay_inode+0x29e/0x3b0
ext4_fc_replay+0x278/0x550
do_one_pass+0x646/0xc10
jbd2_journal_recover+0x14a/0x270
jbd2_journal_load+0xc4/0x150
ext4_load_journal+0x1f3/0x490
ext4_fill_super+0x22d4/0x2c00
With this patch, generic/526 still fails, but system is no longer
locking up in a tight loop. It's likely the root casue is that
fast_commit replay is corrupting file systems with inline_data, and we
probably need to add better error handling in the fast commit replay
code path beyond what is done here, which essentially just breaks the
infinite loop without reporting the to the higher levels of the code.
Fixes: 8016E29F4362 ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Cc: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Pull xfs updates from Darrick Wong:
"There's a lot in this cycle.
Starting with bug fixes: To avoid livelocks between the logging code
and the quota code, we've disabled the ability of quotaoff to turn off
quota accounting. (Admins can still disable quota enforcement, but
truly turning off accounting requires a remount.) We've tried to do
this in a careful enough way that there shouldn't be any user visible
effects aside from quotaoff no longer randomly hanging the system.
We've also fixed some bugs in runtime log behavior that could trip up
log recovery if (otherwise unrelated) transactions manage to start and
commit concurrently; some bugs in the GETFSMAP ioctl where we would
incorrectly restrict the range of records output if the two xfs
devices are of different sizes; a bug that resulted in fallocate
funshare failing unnecessarily; and broken behavior in the xfs inode
cache when DONTCACHE is in play.
As for new features: we now batch inode inactivations in percpu
background threads, which sharply decreases frontend thread wait time
when performing file deletions and should improve overall directory
tree deletion times. This eliminates both the problem where closing an
unlinked file (especially on a frozen fs) can stall for a long time,
and should also ease complaints about direct reclaim bogging down on
unlinked file cleanup.
Starting with this release, we've enabled pipelining of the XFS log.
On workloads with high rates of metadata updates to different shards
of the filesystem, multiple threads can be used to format committed
log updates into log checkpoints.
Lastly, with this release, two new features have graduated to
supported status: inode btree counters (for faster mounts), and
support for dates beyond Y2038. Expect these to be enabled by default
in a future release of xfsprogs.
Summary:
- Fix a potential log livelock on busy filesystems when there's so
much work going on that we can't finish a quotaoff before filling
up the log by removing the ability to disable quota accounting.
- Introduce the ability to use per-CPU data structures in XFS so that
we can do a better job of maintaining CPU locality for certain
operations.
- Defer inode inactivation work to per-CPU lists, which will help us
batch that processing. Deletions of large sparse files will
*appear* to run faster, but all that means is that we've moved the
work to the backend.
- Drop the EXPERIMENTAL warnings from the y2038+ support and the
inode btree counters, since it's been nearly a year and no
complaints have come in.
- Remove more of our bespoke kmem* variants in favor of using the
standard Linux calls.
- Prepare for the addition of log incompat features in upcoming
cycles by actually adding code to support this.
- Small cleanups of the xattr code in preparation for landing support
for full logging of extended attribute updates in a future cycle.
- Replace the various log shutdown state and flag code all over xfs
with a single atomic bit flag.
- Fix a serious log recovery bug where log item replay can be skipped
based on the start lsn of a transaction even though the transaction
commit lsn is the key data point for that by enforcing start lsns
to appear in the log in the same order as commit lsns.
- Enable pipelining in the code that pushes log items to disk.
- Drop ->writepage.
- Fix some bugs in GETFSMAP where the last fsmap record reported for
a device could extend beyond the end of the device, and a separate
bug where query keys for one device could be applied to another.
- Don't let GETFSMAP query functions edit their input parameters.
- Small cleanups to the scrub code's handling of perag structures.
- Small cleanups to the incore inode tree walk code.
- Constify btree function parameters that aren't changed, so that
there will never again be confusion about range query functions
changing their input parameters.
- Standardize the format and names of tracepoint data attributes.
- Clean up all the mount state and feature flags to use wrapped
bitset functions instead of inconsistently open-coded flag checks.
- Fix some confusion between xfs_buf hash table key variable vs.
block number.
- Fix a mis-interaction with iomap where we reported shared delalloc
cow fork extents to iomap, which would cause the iomap unshare
operation to return IO errors unnecessarily.
- Fix DONTCACHE behavior"
* tag 'xfs-5.15-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (103 commits)
xfs: fix I_DONTCACHE
xfs: only set IOMAP_F_SHARED when providing a srcmap to a write
xfs: fix perag structure refcounting error when scrub fails
xfs: rename buffer cache index variable b_bn
xfs: convert bp->b_bn references to xfs_buf_daddr()
xfs: introduce xfs_buf_daddr()
xfs: kill xfs_sb_version_has_v3inode()
xfs: introduce xfs_sb_is_v5 helper
xfs: remove unused xfs_sb_version_has wrappers
xfs: convert xfs_sb_version_has checks to use mount features
xfs: convert scrub to use mount-based feature checks
xfs: open code sb verifier feature checks
xfs: convert xfs_fs_geometry to use mount feature checks
xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown
xfs: convert remaining mount flags to state flags
xfs: convert mount flags to features
xfs: consolidate mount option features in m_features
xfs: replace xfs_sb_version checks with feature flag checks
xfs: reflect sb features in xfs_mount
xfs: rework attr2 feature and mount options
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exit cleanups from Eric Biederman:
"In preparation of doing something about PTRACE_EVENT_EXIT I have
started cleaning up various pieces of code related to do_exit. Most of
that code I did not manage to get tested and reviewed before the merge
window opened but a handful of very useful cleanups are ready to be
merged.
The first change is simply the removal of the bdflush system call. The
code has now been disabled long enough that even the oldest userspace
working userspace setups anyone can find to test are fine with the
bdflush system call being removed.
Changing m68k fsp040_die to use force_sigsegv(SIGSEGV) instead of
calling do_exit directly is interesting only in that it is nearly the
most difficult of the incorrect uses of do_exit to remove.
The change to the seccomp code to simply send a signal instead of
calling do_coredump directly is a very nice little cleanup made
possible by realizing the existing signal sending helpers were missing
a little bit of functionality that is easy to provide"
* 'exit-cleanups-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal/seccomp: Dump core when there is only one live thread
signal/seccomp: Refactor seccomp signal and coredump generation
signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die
exit/bdflush: Remove the deprecated bdflush system call
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo si_trapno updates from Eric Biederman:
"The full set of si_trapno changes was not appropriate as a fix for the
newly added SIGTRAP TRAP_PERF, and so I postponed the rest of the
related cleanups.
This is the rest of the cleanups for si_trapno that reduces it from
being a really weird arch special case that is expect to be always
present (but isn't) on the architectures that support it to being yet
another field in the _sigfault union of struct siginfo.
The changes have been reviewed and marinated in linux-next. With the
removal of this awkward special case new code (like SIGTRAP TRAP_PERF)
that works across architectures should be easier to write and
maintain"
* 'siginfo-si_trapno-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
signal: Rename SIL_PERF_EVENT SIL_FAULT_PERF_EVENT for consistency
signal: Verify the alignment and size of siginfo_t
signal: Remove the generic __ARCH_SI_TRAPNO support
signal/alpha: si_trapno is only used with SIGFPE and SIGTRAP TRAP_UNK
signal/sparc: si_trapno is only used with SIGILL ILL_ILLTRP
arm64: Add compile-time asserts for siginfo_t offsets
arm: Add compile-time asserts for siginfo_t offsets
sparc64: Add compile-time asserts for siginfo_t offsets
|
|
If the application uses io_uring and also relies heavily on signals
for communication, that can cause io-wq workers to spuriously exit
just because the parent has a signal pending. Just ignore signals
unless they are fatal.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We've got a few issues that all boil down to the fact that we have one
list of pending work items, yet two different types of workers to
serve them. This causes some oddities around workers switching type and
even hashed work vs regular work on the same bounded list.
Just separate them out cleanly, similarly to how we already do
accounting of what is running. That provides a clean separation and
removes some corner cases that can cause stalls when handling IO
that is punted to io-wq.
Fixes: ecc53c48c13d ("io-wq: check max_worker limits if a worker transitions bound state")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When dlm_release_lockspace does active shutdown on connections to
other nodes, the active shutdown will wait for any exisitng passive
shutdowns to be resolved. But, the sequence of operations during
dlm_release_lockspace can prevent the normal resolution of passive
shutdowns (processed normally by way of lockspace recovery.)
This disruption of passive shutdown handling can cause the active
shutdown to wait for a full timeout period, delaying the completion
of dlm_release_lockspace.
To fix this, make dlm_release_lockspace resolve existing passive
shutdowns (by calling dlm_clear_members earlier), before it does
active shutdowns. The active shutdowns will not find any passive
shutdowns to wait for, and will not be delayed.
Reported-by: Chris Mackowski <cmackows@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big set of driver core patches for 5.15-rc1.
These do change a number of different things across different
subsystems, and because of that, there were 2 stable tags created that
might have already come into your tree from different pulls that did
the following
- changed the bus remove callback to return void
- sysfs iomem_get_mapping rework
Other than those two things, there's only a few small things in here:
- kernfs performance improvements for huge numbers of sysfs users at
once
- tiny api cleanups
- other minor changes
All of these have been in linux-next for a while with no reported
problems, other than the before-mentioned merge issue"
* tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits)
MAINTAINERS: Add dri-devel for component.[hc]
driver core: platform: Remove platform_device_add_properties()
ARM: tegra: paz00: Handle device properties with software node API
bitmap: extend comment to bitmap_print_bitmask/list_to_buf
drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI
topology: use bin_attribute to break the size limitation of cpumap ABI
lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases
cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list
sysfs: Rename struct bin_attribute member to f_mapping
sysfs: Invoke iomem_get_mapping() from the sysfs open callback
debugfs: Return error during {full/open}_proxy_open() on rmmod
zorro: Drop useless (and hardly used) .driver member in struct zorro_dev
zorro: Simplify remove callback
sh: superhyway: Simplify check in remove callback
nubus: Simplify check in remove callback
nubus: Make struct nubus_driver::remove return void
kernfs: dont call d_splice_alias() under kernfs node lock
kernfs: use i_lock to protect concurrent inode updates
kernfs: switch kernfs to use an rwsem
kernfs: use VFS negative dentry caching
...
|
|
The prepare_compress_overwrite() gets/locks a page to prepare a read, and calls
f2fs_read_multi_pages() which checks EOF first. If there's any page beyond EOF,
we unlock the page and set cc->rpages[i] = NULL, which we can't put the page
anymore. This makes page leak, so let's fix by putting that page.
Fixes: a949dc5f2c5c ("f2fs: compress: fix race condition of overwrite vs truncate")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_write_multi_pages(), f2fs_compress_pages() allocates pages for
compression work in cc->cpages[]. Then, f2fs_write_compressed_pages() initiates
bio submission. But, if there's any error before submitting the IOs like early
f2fs_cp_error(), previously it didn't free cpages by f2fs_compress_free_page().
Let's fix memory leak by putting that just before deallocating cc->cpages.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
We need to set the stalled bit early, before we drop the lock for adding
us to the stall hash queue. If not, then we can race with new work being
queued between adding us to the stall hash and io_worker_handle_work()
marking us stalled.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull close_range() cleanup from Christian Brauner:
"This is a cleanup for close_range() which was sent as part of a bugfix
we did some time ago in commit 9b5b872215fe ("file: fix close_range()
for unshare+cloexec").
We used to share more code between some helpers for close_range()
which made retrieving the maximum number of open fds before calling
into the helpers sensible. But with the introduction of
CLOSE_RANGE_CLOEXEC and the need to retrieve the number of maximum fds
once more for CLOSE_RANGE_CLOEXEC that stopped making sense. So the
code was in a dumb in-limbo state.
Fix this by simplifying the code a bit.
The original idea was to only fix the bug itself and make backporting
easy. And since the cleanup wasn't very pressing I left it in
linux-next for a very long time. I didn't pull the patches from the
list again back then which is why they don't have lore-links. So I'm
listing them below explicitly"
Commit 03ba0fe4d09f ("file: simplify logic in __close_range()")
Link: https://lore.kernel.org/linux-fsdevel/20210402123548.108372-3-brauner@kernel.org
Commit f49fd6d3c070 ("file: let pick_file() tell caller it's done")
Link: https://lore.kernel.org/linux-fsdevel/20210402123548.108372-4-brauner@kernel.org
* tag 'fs.close_range.v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
file: simplify logic in __close_range()
file: let pick_file() tell caller it's done
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull move_mount updates from Christian Brauner:
"This contains an extension to the move_mount() syscall making it
possible to add a single private mount into an existing propagation
tree.
The use-case comes from the criu folks which have been struggling with
restoring complex mount trees for a long time. Variations of this work
have been discussed at Plumbers before, e.g.
https://www.linuxplumbersconf.org/event/7/contributions/640/
The extension to move_mount() enables criu to restore any set of mount
namespaces, mount trees and sharing group trees without introducing
yet more complexity into mount propagation itself.
The changes required to criu to make use of this and restore complex
propagation trees are available at
https://github.com/Snorch/criu/commits/mount-v2-poc
A cleaned-up version of this will go up for merging into the main criu
repo after this lands"
* tag 'fs.move_mount.move_mount_set_group.v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: add move_mount(MOVE_MOUNT_SET_GROUP) selftest
move_mount: allow to add a mount into an existing group
|
|
Pull iomap updates from Darrick Wong:
"The most notable externally visible change for this cycle is the
addition of support for reads to inline tail fragments of files, which
was requested by the erofs developers; and a correction for a kernel
memory corruption bug if the sysadmin tries to activate a swapfile
with more pages than the swapfile header suggests.
We also now report writeback completion errors to the file mapping
correctly, instead of munging all errors into EIO.
Internally, the bulk of the changes are Christoph's patchset to reduce
the indirect function call count by a third to a half by converting
iomap iteration from a loop pattern to a generator/consumer pattern.
As an added bonus, fsdax no longer open-codes iomap apply loops.
Summary:
- Simplify the bio_end_page usage in the buffered IO code.
- Support reading inline data at nonzero offsets for erofs.
- Fix some typos and bad grammar.
- Convert kmap_atomic usage in the inline data read path.
- Add some extra inline data input checking.
- Fix a memory corruption bug stemming from iomap_swapfile_activate
trying to activate more pages than mm was expecting.
- Pass errnos through the page writeback code so that writeback
errors are reported correctly instead of being munged to EIO.
- Replace iomap_apply with a open-coded iterator loops to reduce the
number of indirect calls by a third to a half.
- Refactor the fsdax code to use iomap iterators instead of the
open-coded iomap_apply code that it had before.
- Format file range iomap tracepoint data in hexadecimal and
standardize the names used in the pretty-print string"
* tag 'iomap-5.15-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (41 commits)
iomap: standardize tracepoint formatting and storage
mm/swap: consider max pages in iomap_swapfile_add_extent
iomap: move loop control code to iter.c
iomap: constify iomap_iter_srcmap
fsdax: switch the fault handlers to use iomap_iter
fsdax: factor out a dax_fault_actor() helper
fsdax: factor out helpers to simplify the dax fault code
iomap: rework unshare flag
iomap: pass an iomap_iter to various buffered I/O helpers
iomap: remove iomap_apply
fsdax: switch dax_iomap_rw to use iomap_iter
iomap: switch iomap_swapfile_activate to use iomap_iter
iomap: switch iomap_seek_data to use iomap_iter
iomap: switch iomap_seek_hole to use iomap_iter
iomap: switch iomap_bmap to use iomap_iter
iomap: switch iomap_fiemap to use iomap_iter
iomap: switch __iomap_dio_rw to use iomap_iter
iomap: switch iomap_page_mkwrite to use iomap_iter
iomap: switch iomap_zero_range to use iomap_iter
iomap: switch iomap_file_unshare to use iomap_iter
...
|
|
Pull project quota update from Darrick Wong:
"A single VFS patch that prevents userspace from setting project quota
ids on files that the VFS considers invalid"
* tag 'vfs-5.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: forbid invalid project ID
|
|
Pull nfsd updates from Chuck Lever:
"New features:
- Support for server-side disconnect injection via debugfs
- Protocol definitions for new RPC_AUTH_TLS authentication flavor
Performance improvements:
- Reduce page allocator traffic in the NFSD splice read actor
- Reduce CPU utilization in svcrdma's Send completion handler
Notable bug fixes:
- Stabilize lockd operation when re-exporting NFS mounts
- Fix the use of %.*s in NFSD tracepoints
- Fix /proc/sys/fs/nfs/nsm_use_hostnames"
* tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits)
nfsd: fix crash on LOCKT on reexported NFSv3
nfs: don't allow reexport reclaims
lockd: don't attempt blocking locks on nfs reexports
nfs: don't atempt blocking locks on nfs reexports
Keep read and write fds with each nlm_file
lockd: update nlm_lookup_file reexport comment
nlm: minor refactoring
nlm: minor nlm_lookup_file argument change
lockd: lockd server-side shouldn't set fl_ops
SUNRPC: Add documentation for the fail_sunrpc/ directory
SUNRPC: Server-side disconnect injection
SUNRPC: Move client-side disconnect injection
SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory
svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free()
nfsd4: Fix forced-expiry locking
rpc: fix gss_svc_init cleanup on failure
SUNRPC: Add RPC_AUTH_TLS protocol numbers
lockd: change the proc_handler for nsm_use_hostnames
sysctl: introduce new proc handler proc_dobool
SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency()
...
|