Age | Commit message (Collapse) | Author | Files | Lines |
|
It is only used by the CIL checkpoints, and is the counterpart to
start record formatting and writing that is already local to
xfs_log_cil.c.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
I'm seeing assert failures from xlog_space_left() after a shutdown
has begun that look like:
XFS (dm-0): log I/O error -5
XFS (dm-0): xfs_do_force_shutdown(0x2) called from line 1338 of file fs/xfs/xfs_log.c. Return address = xlog_ioend_work+0x64/0xc0
XFS (dm-0): Log I/O Error Detected.
XFS (dm-0): Shutting down filesystem. Please unmount the filesystem and rectify the problem(s)
XFS (dm-0): xlog_space_left: head behind tail
XFS (dm-0): tail_cycle = 6, tail_bytes = 2706944
XFS (dm-0): GH cycle = 6, GH bytes = 1633867
XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 1310
------------[ cut here ]------------
Call Trace:
xlog_space_left+0xc3/0x110
xlog_grant_push_threshold+0x3f/0xf0
xlog_grant_push_ail+0x12/0x40
xfs_log_reserve+0xd2/0x270
? __might_sleep+0x4b/0x80
xfs_trans_reserve+0x18b/0x260
.....
There are two things here. Firstly, after a shutdown, the log head
and tail can be out of whack as things abort and release (or don't
release) resources, so checking them for sanity doesn't make much
sense. Secondly, xfs_log_reserve() can race with shutdown and so it
can still fail like this even though it has already checked for a
log shutdown before calling xlog_grant_push_ail().
So, before ASSERT failing in xlog_space_left(), make sure we haven't
already shut down....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
When the log is shutdown, it currently walks all the iclogs and runs
callbacks that are attached to the iclogs, regardless of whether the
iclog is queued for IO completion or not. This creates a problem for
contexts attaching callbacks to iclogs in that a racing shutdown can
run the callbacks even before the attaching context has finished
processing the iclog and releasing it for IO submission.
If the callback processing of the iclog frees the structure that is
attached to the iclog, then this leads to an UAF scenario that can
only be protected against by holding the icloglock from the point
callbacks are attached through to the release of the iclog. While we
currently do this, it is not practical or sustainable.
Hence we need to make shutdown processing the responsibility of the
context that holds active references to the iclog. We know that the
contexts attaching callbacks to the iclog must have active
references to the iclog, and that means they must be in either
ACTIVE or WANT_SYNC states. xlog_state_do_callback() will skip over
iclogs in these states -except- when the log is shut down.
xlog_state_do_callback() checks the state of the iclogs while
holding the icloglock, therefore the reference count/state change
that occurs in xlog_state_release_iclog() after the callbacks are
atomic w.r.t. shutdown processing.
We can't push the responsibility of callback cleanup onto the CIL
context because we can have ACTIVE iclogs that have callbacks
attached that have already been released. Hence we really need to
internalise the cleanup of callbacks into xlog_state_release_iclog()
processing.
Indeed, we already have that internalisation via:
xlog_state_release_iclog
drop last reference
->SYNCING
xlog_sync
xlog_write_iclog
if (log_is_shutdown)
xlog_state_done_syncing()
xlog_state_do_callback()
<process shutdown on iclog that is now in SYNCING state>
The problem is that xlog_state_release_iclog() aborts before doing
anything if the log is already shut down. It assumes that the
callbacks have already been cleaned up, and it doesn't need to do
any cleanup.
Hence the fix is to remove the xlog_is_shutdown() check from
xlog_state_release_iclog() so that reference counts are correctly
released from the iclogs, and when the reference count is zero we
always transition to SYNCING if the log is shut down. Hence we'll
always enter the xlog_sync() path in a shutdown and eventually end
up erroring out the iclog IO and running xlog_state_do_callback() to
process the callbacks attached to the iclog.
This allows us to stop processing referenced ACTIVE/WANT_SYNC iclogs
directly in the shutdown code, and in doing so gets rid of the UAF
vector that currently exists. This then decouples the adding of
callbacks to the iclogs from xlog_state_release_iclog() as we
guarantee that xlog_state_release_iclog() will process the callbacks
if the log has been shut down before xlog_state_release_iclog() has
been called.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The iclog callback processing done during a forced log shutdown has
different logic to normal runtime IO completion callback processing.
Separate out the shutdown callbacks into their own function and call
that from the shutdown code instead.
We don't need this shutdown specific logic in the normal runtime
completion code - we'll always run the shutdown version on shutdown,
and it will do what shutdown needs regardless of whether there are
racing IO completion callbacks scheduled or in progress. Hence we
can also simplify the normal IO completion callpath and only abort
if shutdown occurred while we actively were processing callbacks.
Further, separating out the IO completion logic from the shutdown
logic avoids callback race conditions from being triggered by log IO
completion after a shutdown. IO completion will now only run
callbacks on iclogs that are in the correct state for a callback to
be run, avoiding the possibility of running callbacks on a
referenced iclog that hasn't yet been submitted for IO.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Clean it up a bit by factoring and rearranging some of the code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The running of a forced shutdown is a bit of a mess. It does racy
checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then
does more racy checks in xfs_log_force_unmount() before finally
setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the
log->icloglock.
Move the checking and setting of XFS_MOUNT_SHUTDOWN into
xfs_do_force_shutdown() so we only process a shutdown once and once
only. Serialise this with the mp->m_sb_lock spinlock so that the
state change is atomic and won't race. Move all the mount specific
shutdown state changes from xfs_log_force_unmount() to
xfs_do_force_shutdown() so they are done atomically with setting
XFS_MOUNT_SHUTDOWN.
Then get rid of the racy xlog_is_shutdown() check from
xlog_force_shutdown(), and gate the log shutdown on the
test_and_set_bit(XLOG_IO_ERROR) test under the icloglock. This
means that the log is shutdown once and once only, and code that
needs to prevent races with shutdown can do so by holding the
icloglock and checking the return value of xlog_is_shutdown().
This results in a predictable shutdown execution process - we set the
shutdown flags once and process the shutdown once rather than the
current "as many concurrent shutdowns as can race to the flag
setting" situation we have now.
Also, now that shutdown is atomic, alway emit a stack trace when the
error level for the filesystem is high enough. This means that we
always get a stack trace when trying to diagnose the cause of
shutdowns in the field, rather than just for SHUTDOWN_CORRUPT_INCORE
cases.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
log->l_flags doesn't actually contain "flags" as such, it contains
operational state information that can change at runtime. For the
shutdown state, this at least should be an atomic bit because
it is read without holding locks in many places and so using atomic
bitops for the state field modifications makes sense.
This allows us to use things like test_and_set_bit() on state
changes (e.g. setting XLOG_TAIL_WARN) to avoid races in setting the
state when we aren't holding locks.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
xfs_log_mount_finish() needs to know if recovery is needed or not to
make decisions on whether to flush the log and AIL. Move the
handling of the NEED_RECOVERY state out to this function rather than
needing a temporary variable to store this state over the call to
xlog_recover_finish().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
We don't need an iclog state field to tell us the log has been shut
down. We can just check the xlog_is_shutdown() instead. The avoids
the need to have shutdown overwrite the current iclog state while
being active used by the log code and so having to ensure that every
iclog state check handles XLOG_STATE_IOERROR appropriately.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Make it less shouty and a static inline before adding more calls
through the log code.
Also convert internal log code that uses XFS_FORCED_SHUTDOWN(mount)
to use xlog_is_shutdown(log) as well.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Use bvec_virt instead of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210804095634.460779-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
inode_detach_wb references the "main" bdi of the inode. With the
recent change to move the bdi from the request_queue to the gendisk
this causes a guaranteed use after free when using certain cgroup
configurations. The big itself is older through as any non-default
inode reference (e.g. an open file descriptor) could have injected
this use after free even before that.
Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Reported-by: syzbot <syzbot+1fb38bb7d3ce0fa3e1c4@syzkaller.appspotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210816122614.601358-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The dev_t is used as the inode hash, so we should only released it
once then block device inode is gone from the inode cache. Move it
to bdev_free_inode to ensure that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210816122614.601358-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
different parents
Cross-rename lacks a check when that would prevent exchanging a
directory and subvolume from different parent subvolume. This causes
data inconsistencies and is caught before commit by tree-checker,
turning the filesystem to read-only.
Calling the renameat2 with RENAME_EXCHANGE flags like
renameat2(AT_FDCWD, namesrc, AT_FDCWD, namedest, (1 << 1))
on two paths:
namesrc = dir1/subvol1/dir2
namedest = subvol2/subvol3
will cause key order problem with following write time tree-checker
report:
[1194842.307890] BTRFS critical (device loop1): corrupt leaf: root=5 block=27574272 slot=10 ino=258, invalid previous key objectid, have 257 expect 258
[1194842.322221] BTRFS info (device loop1): leaf 27574272 gen 8 total ptrs 11 free space 15444 owner 5
[1194842.331562] BTRFS info (device loop1): refs 2 lock_owner 0 current 26561
[1194842.338772] item 0 key (256 1 0) itemoff 16123 itemsize 160
[1194842.338793] inode generation 3 size 16 mode 40755
[1194842.338801] item 1 key (256 12 256) itemoff 16111 itemsize 12
[1194842.338809] item 2 key (256 84 2248503653) itemoff 16077 itemsize 34
[1194842.338817] dir oid 258 type 2
[1194842.338823] item 3 key (256 84 2363071922) itemoff 16043 itemsize 34
[1194842.338830] dir oid 257 type 2
[1194842.338836] item 4 key (256 96 2) itemoff 16009 itemsize 34
[1194842.338843] item 5 key (256 96 3) itemoff 15975 itemsize 34
[1194842.338852] item 6 key (257 1 0) itemoff 15815 itemsize 160
[1194842.338863] inode generation 6 size 8 mode 40755
[1194842.338869] item 7 key (257 12 256) itemoff 15801 itemsize 14
[1194842.338876] item 8 key (257 84 2505409169) itemoff 15767 itemsize 34
[1194842.338883] dir oid 256 type 2
[1194842.338888] item 9 key (257 96 2) itemoff 15733 itemsize 34
[1194842.338895] item 10 key (258 12 256) itemoff 15719 itemsize 14
[1194842.339163] BTRFS error (device loop1): block=27574272 write time tree block corruption detected
[1194842.339245] ------------[ cut here ]------------
[1194842.443422] WARNING: CPU: 6 PID: 26561 at fs/btrfs/disk-io.c:449 csum_one_extent_buffer+0xed/0x100 [btrfs]
[1194842.511863] CPU: 6 PID: 26561 Comm: kworker/u17:2 Not tainted 5.14.0-rc3-git+ #793
[1194842.511870] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[1194842.511876] Workqueue: btrfs-worker-high btrfs_work_helper [btrfs]
[1194842.511976] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs]
[1194842.512068] RSP: 0018:ffffa2c284d77da0 EFLAGS: 00010282
[1194842.512074] RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffff928867bd9978
[1194842.512078] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff928867bd9970
[1194842.512081] RBP: ffff92876b958000 R08: 0000000000000001 R09: 00000000000c0003
[1194842.512085] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[1194842.512088] R13: ffff92875f989f98 R14: 0000000000000000 R15: 0000000000000000
[1194842.512092] FS: 0000000000000000(0000) GS:ffff928867a00000(0000) knlGS:0000000000000000
[1194842.512095] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1194842.512099] CR2: 000055f5384da1f0 CR3: 0000000102fe4000 CR4: 00000000000006e0
[1194842.512103] Call Trace:
[1194842.512128] ? run_one_async_free+0x10/0x10 [btrfs]
[1194842.631729] btree_csum_one_bio+0x1ac/0x1d0 [btrfs]
[1194842.631837] run_one_async_start+0x18/0x30 [btrfs]
[1194842.631938] btrfs_work_helper+0xd5/0x1d0 [btrfs]
[1194842.647482] process_one_work+0x262/0x5e0
[1194842.647520] worker_thread+0x4c/0x320
[1194842.655935] ? process_one_work+0x5e0/0x5e0
[1194842.655946] kthread+0x135/0x160
[1194842.655953] ? set_kthread_struct+0x40/0x40
[1194842.655965] ret_from_fork+0x1f/0x30
[1194842.672465] irq event stamp: 1729
[1194842.672469] hardirqs last enabled at (1735): [<ffffffffbd1104f5>] console_trylock_spinning+0x185/0x1a0
[1194842.672477] hardirqs last disabled at (1740): [<ffffffffbd1104cc>] console_trylock_spinning+0x15c/0x1a0
[1194842.672482] softirqs last enabled at (1666): [<ffffffffbdc002e1>] __do_softirq+0x2e1/0x50a
[1194842.672491] softirqs last disabled at (1651): [<ffffffffbd08aab7>] __irq_exit_rcu+0xa7/0xd0
The corrupted data will not be written, and filesystem can be unmounted
and mounted again (all changes since the last commit will be lost).
Add the missing check for new_ino so that all non-subvolumes must reside
under the same parent subvolume. There's an exception allowing to
exchange two subvolumes from any parents as the directory representing a
subvolume is only a logical link and does not have any other structures
related to the parent subvolume, unlike files, directories etc, that
are always in the inode namespace of the parent subvolume.
Fixes: cdd1fedf8261 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: stable@vger.kernel.org # 4.7+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A couple of fixes for long standing bugs, a warning fixup, and some
miscellaneous dax cleanups.
The bugs were recently found due to new platforms looking to use the
ACPI NFIT "virtual" device definition, and new error injection
capabilities to trigger error responses to label area requests. Ira's
cleanups have been long pending, I neglected to send them earlier, and
see no harm in including them now. This has all appeared in -next with
no reported issues.
Summary:
- Fix support for NFIT "virtual" ranges (BIOS-defined memory disks)
- Fix recovery from failed label storage areas on NVDIMM devices
- Miscellaneous cleanups from Ira's investigation of
dax_direct_access paths preparing for stray-write protection"
* tag 'libnvdimm-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
tools/testing/nvdimm: Fix missing 'fallthrough' warning
libnvdimm/region: Fix label activation vs errors
ACPI: NFIT: Fix support for virtual SPA ranges
dax: Ensure errno is returned from dax_direct_access
fs/dax: Clarify nr_pages to dax_direct_access()
fs/fuse: Remove unneeded kaddr parameter
|
|
If an SQPOLL based ring is newly created and an application issues an
io_uring_enter(2) system call on it, then we can return a spurious
-EOWNERDEAD error. This happens because there's nothing to submit, and
if the caller doesn't specify any other action, the initial error
assignment of -EOWNERDEAD never gets overwritten. This causes us to
return it directly, even if it isn't valid.
Move the error assignment into the actual failure case instead.
Cc: stable@vger.kernel.org
Fixes: d9d05217cb69 ("io_uring: stop SQPOLL submit on creator's death")
Reported-by: Sherlock Holo sherlockya@gmail.com
Link: https://github.com/axboe/liburing/issues/413
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull configfs fix from Christoph Hellwig:
- fix to revert to the historic write behavior (Bart Van Assche)
* tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs:
configfs: restore the kernel v5.13 text attribute write behavior
|
|
Pull cifs fixes from Steve French:
"Four CIFS/SMB3 Fixes, all for stable, two relating to deferred close,
and one for the 'modefromsid' mount option (when 'idsfromsid' not
specified)"
* tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Call close synchronously during unlink/rename/lease break.
cifs: Handle race conditions during rename
cifs: use the correct max-length for dentry_path_raw()
cifs: create sd context must be a multiple of 8
|
|
Pull io_uring fixes from Jens Axboe:
"A bit bigger than the previous weeks, but mostly just a few stable
bound fixes. In detail:
- Followup fixes to patches from last week for io-wq, turns out they
weren't complete (Hao)
- Two lockdep reported fixes out of the RT camp (me)
- Sync the io_uring-cp example with liburing, as a few bug fixes
never made it to the kernel carried version (me)
- SQPOLL related TIF_NOTIFY_SIGNAL fix (Nadav)
- Use WRITE_ONCE() when writing sq flags (Nadav)
- io_rsrc_put_work() deadlock fix (Pavel)"
* tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
tools/io_uring/io_uring-cp: sync with liburing example
io_uring: fix ctx-exit io_rsrc_put_work() deadlock
io_uring: drop ctx->uring_lock before flushing work item
io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker()
io-wq: fix bug of creating io-wokers unconditionally
io_uring: rsrc ref lock needs to be IRQ safe
io_uring: Use WRITE_ONCE() when writing to sq_flags
io_uring: clear TIF_NOTIFY_SIGNAL when running task work
|
|
Add some basic sanity-check tests for the fat_checksum() function and
the fat_time_unix2fat() and fat_time_fat2unix() functions. These unit
tests verify these functions return correct output for a number of test
inputs.
These tests were inspired by -- and serve a similar purpose to -- the
timestamp parsing KUnit tests in ext4[1].
Note that, unlike fat_time_unix2fat, fat_time_fat2unix wasn't previously
exported, so this patch exports it as well. This is required for the
case where we're building the fat and fat_test as modules.
Fixed minor checkpatch coding style errors and typos in commit log:
Shuah Khan <skhan@linuxfoundation.org>
[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/ext4/inode-test.c
Signed-off-by: David Gow <davidgow@google.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Daniel Latypov <dlatypov@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
s/two/three
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Do not use numbers but strings to improve readability when flag is set.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This adds NTFS3 in fs/Kconfig and fs/Makefile
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
This adds Kconfig, Makefile and doc
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
This adds NTFS journal
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
This patch adds different types of NTFS-applicable compressions:
- lznt
- lzx
- xpress
Latter two (lzx, xpress) implement Windows Compact OS feature and
were taken from ntfs-3g system comression plugin authored by Eric Biggers
(https://github.com/ebiggers/ntfs-3g-system-compression)
which were ported to ntfs3 and adapted to Linux Kernel environment.
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
This adds attrib operations
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
This adds file operations and implementation
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
This adds bitmap
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
This adds initialization of super block
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
This adds headers and misc files
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
|
|
The permission check in gfs2_setattr is an old and outdated version of
may_setattr(). Switch to the updated version.
Fixes fstest generic/079.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Move the permission checks in notify_change into a separate function to
make them available to filesystems.
When notify_change is called, the vfs performs those checks before
calling into iop->setattr. However, a filesystem like gfs2 can only
lock and revalidate the inode inside ->setattr, and it must then repeat
those checks to err on the safe side.
It would be nice to get rid of the double checking, but moving the
permission check into iop->setattr altogether isn't really an option.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull ceph fixes from Ilya Dryomov:
"A patch to avoid a soft lockup in ceph_check_delayed_caps() from Luis
and a reference handling fix from Jeff that should address some memory
corruption reports in the snaprealm area.
Both marked for stable"
* tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client:
ceph: take snap_empty_lock atomically with snaprealm refcount change
ceph: reduce contention in ceph_check_delayed_caps()
|
|
if server shutdown happens in the situation that
there are connections, workqueue could be destroyed
before queueing disconnect work.
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
ksmbd is forcing to turn on FS_POSIX_ACL in Kconfig to use vfs acl
functions(posix_acl_alloc, get_acl, set_posix_acl). OpenWRT and other
platform doesn't use acl and this config is disable by default in
kernel. This patch use IS_ENABLED() to know acl config is enable and use
acl function if it is enable.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Use proper errno instead of -1 in smb2_get_ksmbd_tcon().
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Update the comment for smb2_get_ksmbd_tcon().
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Change data type of function that return only 0 or 1 to boolean.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
To negotiate either the SMB2 protocol or SMB protocol, a client must
send a SMB_COM_NEGOTIATE message containing the list of dialects it
supports, to which the server will respond with either a
SMB_COM_NEGOTIATE or a SMB2_NEGOTIATE response.
The current implementation responds with the highest common dialect,
rather than looking explicitly for "SMB 2.???" and "SMB 2.002", as
indicated in [MS-SMB2]:
[MS-SMB2] 3.3.5.3.1:
If the server does not implement the SMB 2.1 or 3.x dialect family,
processing MUST continue as specified in 3.3.5.3.2.
Otherwise, the server MUST scan the dialects provided for the dialect
string "SMB 2.???". If the string is not present, continue to section
3.3.5.3.2. If the string is present, the server MUST respond with an
SMB2 NEGOTIATE Response as specified in 2.2.4.
[MS-SMB2] 3.3.5.3.2:
The server MUST scan the dialects provided for the dialect string "SMB
2.002". If the string is present, the client understands SMB2, and the
server MUST respond with an SMB2 NEGOTIATE Response.
This is an issue if a client attempts to negotiate SMB3.1.1 using
a SMB_COM_NEGOTIATE, as it will trigger the following NULL pointer
dereference:
8<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 1917455e
[00000000] *pgd=00000000
Internal error: Oops: 17 [#1] ARM
CPU: 0 PID: 60 Comm: kworker/0:1 Not tainted 5.4.60-00027-g0518c02b5c5b #35
Hardware name: Marvell Kirkwood (Flattened Device Tree)
Workqueue: ksmbd-io handle_ksmbd_work
PC is at ksmbd_gen_preauth_integrity_hash+0x24/0x190
LR is at smb3_preauth_hash_rsp+0x50/0xa0
pc : [<802b7044>] lr : [<802d6ac0>] psr: 40000013
sp : bf199ed8 ip : 00000000 fp : 80d1edb0
r10: 80a3471b r9 : 8091af16 r8 : 80d70640
r7 : 00000072 r6 : be95e198 r5 : ca000000 r4 : b97fee00
r3 : 00000000 r2 : 00000002 r1 : b97fea00 r0 : b97fee00
Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 0005317f Table: 3e7f4000 DAC: 00000055
Process kworker/0:1 (pid: 60, stack limit = 0x3dd1fdb4)
Stack: (0xbf199ed8 to 0xbf19a000)
9ec0: b97fee00 00000000
9ee0: be95e198 00000072 80d70640 802d6ac0 b3da2680 b97fea00 424d53ff be95e140
9f00: b97fee00 802bd7b0 bf10fa58 80128a78 00000000 000001c8 b6220000 bf0b7720
9f20: be95e198 80d0c410 bf7e2a00 00000000 00000000 be95e19c 80d0c370 80123b90
9f40: bf0b7720 be95e198 bf0b7720 bf0b7734 80d0c410 bf198000 80d0c424 80d116e0
9f60: bf10fa58 801240c0 00000000 bf10fa40 bf1463a0 bf198000 bf0b7720 80123ed0
9f80: bf077ee4 bf10fa58 00000000 80127f80 bf1463a0 80127e88 00000000 00000000
9fa0: 00000000 00000000 00000000 801010d0 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<802b7044>] (ksmbd_gen_preauth_integrity_hash) from [<802d6ac0>] (smb3_preauth_hash_rsp+0x50/0xa0)
[<802d6ac0>] (smb3_preauth_hash_rsp) from [<802bd7b0>] (handle_ksmbd_work+0x348/0x3f8)
[<802bd7b0>] (handle_ksmbd_work) from [<80123b90>] (process_one_work+0x160/0x200)
[<80123b90>] (process_one_work) from [<801240c0>] (worker_thread+0x1f0/0x2e4)
[<801240c0>] (worker_thread) from [<80127f80>] (kthread+0xf8/0x10c)
[<80127f80>] (kthread) from [<801010d0>] (ret_from_fork+0x14/0x24)
Exception stack(0xbf199fb0 to 0xbf199ff8)
9fa0: 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Code: e1855803 e5d13003 e1855c03 e5903094 (e1d330b0)
---[ end trace 8d03be3ed09e5699 ]---
Kernel panic - not syncing: Fatal exception
smb3_preauth_hash_rsp() panics because conn->preauth_info is only allocated
when processing a SMB2 NEGOTIATE request.
Fix this by splitting the smb_protos array into two, each containing
only SMB1 and SMB2 dialects respectively.
While here, make ksmbd_negotiate_smb_dialect() static as it not
called from anywhere else.
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
compressed file
Since cluster is basic unit of compression, one cluster is compressed or
not, so we can calculate valid blocks only for first page in cluster,
the other pages just skip.
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fix from Eric Biederman:
"This fixes the ucount sysctls on big endian architectures.
The counts were expanded to be longs instead of ints, and the sysctl
code was overlooked, so only the low 32bit were being processed. On
litte endian just processing the low 32bits is fine, but on 64bit big
endian processing just the low 32bits results in the high order bits
instead of the low order bits being processed and nothing works
proper.
This change took a little bit to mature as we have the SYSCTL_ZERO,
and SYSCTL_INT_MAX macros that are only usable for sysctls operating
on ints, but unfortunately are not obviously broken. Which resulted in
the versions of this change working on big endian and not on little
endian, because the int SYSCTL_ZERO when extended 64bit wound up being
0x100000000. So we only allowed values greater than 0x100000000 and
less than 0faff. Which unfortunately broken everything that tried to
set the sysctls. (First reported with the windows subsystem for
linux).
I have tested this on x86_64 64bit after first reproducing the
problems with the earlier version of this change, and then verifying
the problems do not exist when we use appropriate long min and max
values for extra1 and extra2"
* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: add missing data type changes
|
|
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210724074124.25731-4-jianchao.wan9@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
There is no functional change in this patch but just split the
codes, which serachs free block and does trim, into a new function
ext4_try_to_trim_range. This is preparing for the following async
backgroup discard.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210724074124.25731-3-jianchao.wan9@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Get rid of the 'group' parameter of ext4_trim_extent as we can get
it from the 'e4b'.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Jianchao <wangjianchao@kuaishou.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210724074124.25731-2-jianchao.wan9@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
During unlink/rename/lease break, deferred work for close is
scheduled immediately but in an asynchronous manner which might
lead to race with actual(unlink/rename) commands.
This change will schedule close synchronously which will avoid
the race conditions with other commands.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Cc: stable@vger.kernel.org # 5.13
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When rename is executed on directory which has files for which
close is deferred, then rename will fail with EACCES.
This patch will try to close all deferred files when EACCES is received
and retry rename on a directory.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Cc: stable@vger.kernel.org # 5.13
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Just check inode_unhashed on the whole device bdev inode instead,
and provide a helper to check for that information.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently iocharset=utf8 mount option is broken. To use UTF-8 as iocharset,
it is required to use utf8 mount option.
Fix iocharset=utf8 mount option to use be equivalent to the utf8 mount
option.
If UTF-8 as iocharset is used then s_nls_iocharset is set to NULL. So
simplify code around, remove s_utf8 field as to distinguish between UTF-8
and non-UTF-8 it is needed just to check if s_nls_iocharset is set to NULL
or not.
Link: https://lore.kernel.org/r/20210808162453.1653-5-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Currently iocharset=utf8 mount option is broken. To use UTF-8 as iocharset,
it is required to use utf8 mount option.
Fix iocharset=utf8 mount option to use be equivalent to the utf8 mount
option.
If UTF-8 as iocharset is used then s_nls_map is set to NULL. So simplify
code around, remove UDF_FLAG_NLS_MAP and UDF_FLAG_UTF8 flags as to
distinguish between UTF-8 and non-UTF-8 it is needed just to check if
s_nls_map set to NULL or not.
Link: https://lore.kernel.org/r/20210808162453.1653-4-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|