Age | Commit message (Collapse) | Author | Files | Lines |
|
fscache_create_volume_work() uses wake_up_bit() to wake up the processes
which are waiting for the completion of volume creation. According to
comments in wake_up_bit() and waitqueue_active(), an extra smp_mb() is
needed to guarantee the memory order between FSCACHE_VOLUME_CREATING
flag and waitqueue_active() before invoking wake_up_bit().
Fixing it by using clear_and_wake_up_bit() to add the missing memory
barrier.
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20230113115211.2895845-3-houtao@huaweicloud.com/ # v3
|
|
The freeing of relinquished volume will wake up the pending volume
acquisition by using wake_up_bit(), however it is mismatched with
wait_var_event() used in fscache_wait_on_volume_collision() and it will
never wake up the waiter in the wait-queue because these two functions
operate on different wait-queues.
According to the implementation in fscache_wait_on_volume_collision(),
if the wake-up of pending acquisition is delayed longer than 20 seconds
(e.g., due to the delay of on-demand fd closing), the first
wait_var_event_timeout() will timeout and the following wait_var_event()
will hang forever as shown below:
FS-Cache: Potential volume collision new=00000024 old=00000022
......
INFO: task mount:1148 blocked for more than 122 seconds.
Not tainted 6.1.0-rc6+ #1
task:mount state:D stack:0 pid:1148 ppid:1
Call Trace:
<TASK>
__schedule+0x2f6/0xb80
schedule+0x67/0xe0
fscache_wait_on_volume_collision.cold+0x80/0x82
__fscache_acquire_volume+0x40d/0x4e0
erofs_fscache_register_volume+0x51/0xe0 [erofs]
erofs_fscache_register_fs+0x19c/0x240 [erofs]
erofs_fc_fill_super+0x746/0xaf0 [erofs]
vfs_get_super+0x7d/0x100
get_tree_nodev+0x16/0x20
erofs_fc_get_tree+0x20/0x30 [erofs]
vfs_get_tree+0x24/0xb0
path_mount+0x2fa/0xa90
do_mount+0x7c/0xa0
__x64_sys_mount+0x8b/0xe0
do_syscall_64+0x30/0x60
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Considering that wake_up_bit() is more selective, so fix it by using
wait_on_bit() instead of wait_var_event() to wait for the freeing of
relinquished volume. In addition because waitqueue_active() is used in
wake_up_bit() and clear_bit() doesn't imply any memory barrier, use
clear_and_wake_up_bit() to add the missing memory barrier between
cursor->flags and waitqueue_active().
Fixes: 62ab63352350 ("fscache: Implement volume registration")
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20230113115211.2895845-2-houtao@huaweicloud.com/ # v3
|
|
Pull ksmbd server fixes from Steve French:
"Four smb3 server fixes, all also for stable:
- fix for signing bug
- fix to more strictly check packet length
- add a max connections parm to limit simultaneous connections
- fix error message flood that can occur with newer Samba xattr
format"
* tag '6.2-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: downgrade ndr version error message to debug
ksmbd: limit pdu length size according to connection status
ksmbd: do not sign response to session request for guest login
ksmbd: add max connections parameter
|
|
Pull cifs fix from Steve French:
"Fix for reconnect oops in smbdirect (RDMA), also is marked for stable"
* tag '6.2-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix oops due to uncleared server->smbd_conn in reconnect
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Fix two bugs, a recent one introduced in the last cycle, and an older
one from v5.11"
* tag 'ovl-fixes-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fail on invalid uid/gid mapping at copy up
ovl: fix tmpfile leak
|
|
If st_uid/st_gid doesn't have a mapping in the mounter's user_ns, then
copy-up should fail, just like it would fail if the mounter task was doing
the copy using "cp -a".
There's a corner case where the "cp -a" would succeed but copy up fail: if
there's a mapping of the invalid uid/gid (65534 by default) in the user
namespace. This is because stat(2) will return this value if the mapping
doesn't exist in the current user_ns and "cp -a" will in turn be able to
create a file with this uid/gid.
This behavior would be inconsistent with POSIX ACL's, which return -1 for
invalid uid/gid which result in a failed copy.
For consistency and simplicity fail the copy of the st_uid/st_gid are
invalid.
Fixes: 459c7c565ac3 ("ovl: unprivieged mounts")
Cc: <stable@vger.kernel.org> # v5.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Seth Forshee <sforshee@kernel.org>
|
|
Missed an error cleanup.
Reported-by: syzbot+fd749a7ea127a84e0ffd@syzkaller.appspotmail.com
Fixes: 2b1a77461f16 ("ovl: use vfs_tmpfile_open() helper")
Cc: <stable@vger.kernel.org> # v6.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
When user switch samba to ksmbd, The following message flood is coming
when accessing files. Samba seems to changs dos attribute version to v5.
This patch downgrade ndr version error message to debug.
$ dmesg
...
[68971.766914] ksmbd: v5 version is not supported
[68971.779808] ksmbd: v5 version is not supported
[68971.871544] ksmbd: v5 version is not supported
[68971.910135] ksmbd: v5 version is not supported
...
Cc: stable@vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Stream protocol length will never be larger than 16KB until session setup.
After session setup, the size of requests will not be larger than
16KB + SMB2 MAX WRITE size. This patch limits these invalidly oversized
requests and closes the connection immediately.
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18259
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull fuse ACL fix from Christian Brauner:
"The new posix acl API doesn't depend on the xattr handler
infrastructure anymore and instead only relies on the posix acl inode
operations. As a result daemons without FUSE_POSIX_ACL are unable to
use posix acls like they used to.
Fix this by copying what we did for overlayfs during the posix acl api
conversion. Make fuse implement a dedicated ->get_inode_acl() method
as does overlayfs. Fuse can then also uses this to express different
needs for vfs permission checking during lookup and acl based
retrieval via the regular system call path.
This allows fuse to continue to refuse retrieving posix acls for
daemons that don't set FUSE_POSXI_ACL for permission checking while
also allowing a fuse server to retrieve it via the usual system calls"
* tag 'fs.fuse.acl.v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fuse: fixes after adapting to new posix acl api
|
|
In smbd_destroy(), clear the server->smbd_conn pointer after freeing the
smbd_connection struct that it points to so that reconnection doesn't get
confused.
Fixes: 8ef130f9ec27 ("CIFS: SMBD: Implement function to destroy a SMB Direct connection")
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Long Li <longli@microsoft.com>
Cc: Pavel Shilovsky <piastryyy@gmail.com>
Cc: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Nail another UAF in NFSD's filecache
* tag 'nfsd-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't free files unconditionally in __nfsd_file_cache_purge
|
|
Commit f3bbac32475b ("ext4: deal with legacy signed xattr name hash
values") added a hashing function for the legacy case of having the
xattr hash calculated using a signed 'char' type. It left the unsigned
case alone, since it's all implicitly handled by the '-funsigned-char'
compiler option.
However, there's been some noise about back-porting it all into stable
kernels that lack the '-funsigned-char', so let's just make that at
least possible by making the whole 'this uses unsigned char' very
explicit in the code itself. Whether such a back-port is really
warranted or not, I'll leave to others, but at least together with this
change it is technically sensible.
Also, add a 'pr_warn_once()' for reporting the "hey, signedness for this
hash calculation has changed" issue. Hopefully it never triggers except
for that xfstests generic/454 test-case, but even if it does it's just
good information to have.
If for no other reason than "we can remove the legacy signed hash code
entirely if nobody ever sees the message any more".
Cc: Sasha Levin <sashal@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>,
Cc: Jason Donenfeld <Jason@zx2c4.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This cycle we ported all filesystems to the new posix acl api. While
looking at further simplifications in this area to remove the last
remnants of the generic dummy posix acl handlers we realized that we
regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use
of posix acls.
With the change to a dedicated posix acl api interacting with posix acls
doesn't go through the old xattr codepaths anymore and instead only
relies the get acl and set acl inode operations.
Before this change fuse daemons that don't set FUSE_POSIX_ACL were able
to get and set posix acl albeit with two caveats. First, that posix acls
aren't cached. And second, that they aren't used for permission checking
in the vfs.
We regressed that use-case as we currently refuse to retrieve any posix
acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons
would see a change in behavior.
We can restore the old behavior in multiple ways. We could change the
new posix acl api and look for a dedicated xattr handler and if we find
one prefer that over the dedicated posix acl api. That would break the
consistency of the new posix acl api so we would very much prefer not to
do that.
We could introduce a new ACL_*_CACHE sentinel that would instruct the
vfs permission checking codepath to not call into the filesystem and
ignore acls.
But a more straightforward fix for v6.2 is to do the same thing that
Overlayfs does and give fuse a separate get acl method for permission
checking. Overlayfs uses this to express different needs for vfs
permission lookup and acl based retrieval via the regular system call
path as well. Let fuse do the same for now. This way fuse can continue
to refuse to retrieve posix acls for daemons that don't set
FUSE_POSXI_ACL for permission checking while allowing a fuse server to
retrieve it via the usual system calls.
In the future, we could extend the get acl inode operation to not just
pass a simple boolean to indicate rcu lookup but instead make it a flag
argument. Then in addition to passing the information that this is an
rcu lookup to the filesystem we could also introduce a flag that tells
the filesystem that this is a request from the vfs to use these acls for
permission checking. Then fuse could refuse the get acl request for
permission checking when the daemon doesn't have FUSE_POSIX_ACL set in
the same get acl method. This would also help Overlayfs and allow us to
remove the second method for it as well.
But since that change is more invasive as we need to update the get acl
inode operation for multiple filesystems we should not do this as a fix
for v6.2. Instead we will do this for the v6.3 merge window.
Fwiw, since posix acls are now always correctly translated in the new
posix acl api we could also allow them to be used for daemons without
FUSE_POSIX_ACL that are not mounted on the host. But this is behavioral
change and again if dones should be done for v6.3. For now, let's just
restore the original behavior.
A nice side-effect of this change is that for fuse daemons with and
without FUSE_POSIX_ACL the same code is used for posix acls in a
backwards compatible way. This also means we can remove the legacy xattr
handlers completely. We've also added comments to explain the expected
behavior for daemons without FUSE_POSIX_ACL into the code.
Fixes: 318e66856dde ("xattr: use posix acl api")
Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
nfsd_file_cache_purge is called when the server is shutting down, in
which case, tearing things down is generally fine, but it also gets
called when the exports cache is flushed.
Instead of walking the cache and freeing everything unconditionally,
handle it the same as when we have a notification of conflicting access.
Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache")
Reported-by: Ruben Vestergaard <rubenv@drcmr.dk>
Reported-by: Torkil Svensgaard <torkil@drcmr.dk>
Reported-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 writepage fix from Andreas Gruenbacher:
- Fix a regression introduced by commit "gfs2: stop using
generic_writepages in gfs2_ail1_start_one".
* tag 'gfs2-v6.2-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"
|
|
Commit b2b0a5e97855 switched from generic_writepages() to
filemap_fdatawrite_wbc() in gfs2_ail1_start_one() on the path to
replacing ->writepage() with ->writepages() and eventually eliminating
the former. Function gfs2_ail1_start_one() is called from
gfs2_log_flush(), our main function for flushing the filesystem log.
Unfortunately, at least as implemented today, ->writepage() and
->writepages() are entirely different operations for journaled data
inodes: while the former creates and submits transactions covering the
data to be written, the latter flushes dirty buffers out to disk.
With gfs2_ail1_start_one() now calling ->writepages(), we end up
creating filesystem transactions while we are in the course of a log
flush, which immediately deadlocks on the sdp->sd_log_flush_lock
semaphore.
Work around that by going back to how things used to work before commit
b2b0a5e97855 for now; figuring out a superior solution will take time we
don't have available right now. However ...
Since the removal of generic_writepages() is imminent, open-code it
here. We're already inside a blk_start_plug() ... blk_finish_plug()
section here, so skip that part of the original generic_writepages().
This reverts commit b2b0a5e978552e348f85ad9c7568b630a5ede659.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
|
|
We potentially have old hashes of the xattr names generated on systems
with signed 'char' types. Now that everybody uses '-funsigned-char',
those hashes will no longer match.
This only happens if you use xattrs names that have the high bit set,
which probably doesn't happen in practice, but the xfstest generic/454
shows it.
Instead of adding a new "signed xattr hash filesystem" bit and having to
deal with all the possible combinations, just calculate the hash both
ways if the first one fails, and always generate new hashes with the
proper unsigned char version.
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212291509.704a11c9-oliver.sang@intel.com
Link: https://lore.kernel.org/all/CAHk-=whUNjwqZXa-MH9KMmc_CpQpoFKFjAB9ZKHuu=TbsouT4A@mail.gmail.com/
Exposed-by: 3bc753c06dd0 ("kbuild: treat char as always unsigned")
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>,
Cc: Jason Donenfeld <Jason@zx2c4.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull cifs fixes from Steve French:
- important fix for packet signature calculation error
- three fixes to correct DFS deadlock, and DFS refresh problem
- remove an unused DFS function, and duplicate tcon refresh code
- DFS cache lookup fix
- uninitialized rc fix
* tag '6.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: remove unused function
cifs: do not include page data when checking signature
cifs: fix return of uninitialized rc in dfs_cache_update_tgthint()
cifs: handle cache lookup errors different than -ENOENT
cifs: remove duplicate code in __refresh_tcon()
cifs: don't take exclusive lock for updating target hints
cifs: avoid re-lookups in dfs_cache_find()
cifs: fix potential deadlock in cache_refresh_path()
|
|
If ksmbd.mountd is configured to assign unknown users to the guest account
("map to guest = bad user" in the config), ksmbd signs the response.
This is wrong according to MS-SMB2 3.3.5.5.3:
12. If the SMB2_SESSION_FLAG_IS_GUEST bit is not set in the SessionFlags
field, and Session.IsAnonymous is FALSE, the server MUST sign the
final session setup response before sending it to the client, as
follows:
[...]
This fixes libsmb2 based applications failing to establish a session
("Wrong signature in received").
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add max connections parameter to limit number of maximum simultaneous
connections.
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers")
Cc: stable@vger.kernel.org
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix potential out-of-bounds access to leaf data when seeking in an
inline file
- fix potential crash in quota when rescan races with disable
- reimplement super block signature scratching by marking page/folio
dirty and syncing block device, allow removing write_one_page
* tag 'for-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix race between quota rescan and disable leading to NULL pointer deref
btrfs: fix invalid leaf access due to inline extent during lseek
btrfs: stop using write_one_page in btrfs_scratch_superblock
btrfs: factor out scratching of one regular super block
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs fix from Damien Le Moal:
- A single patch to fix sync write operations to detect and handle
errors due to external zone corruptions resulting in writes at
invalid location, from me.
* tag 'zonefs-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Detect append writes at invalid locations
|
|
Remove dfs_cache_update_tgthint() as it is not used anywhere.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
On async reads, page data is allocated before sending. When the
response is received but it has no data to fill (e.g.
STATUS_END_OF_FILE), __calc_signature() will still include the pages in
its computation, leading to an invalid signature check.
This patch fixes this by not setting the async read smb_rqst page data
(zeroed by default) if its got_bytes is 0.
This can be reproduced/verified with xfstests generic/465.
Cc: <stable@vger.kernel.org>
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull affs fix from David Sterba:
"One minor fix for a KCSAN report"
* tag 'affs-for-6.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
affs: initialize fsdata in affs_truncate()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Two patches fixes issues reported by syzbot, one fixes a missing
`domain_id` mount option in documentation and a minor cleanup:
- Fix wrong iomap->length calculation post EOF, which could cause a
WARN_ON in iomap_iter_done() (Siddh)
- Fix improper kvcalloc() use with __GFP_NOFAIL (me)
- Add missing `domain_id` mount option in documentation (Jingbo)
- Clean up fscache option parsing (Jingbo)"
* tag 'erofs-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: clean up parsing of fscache related options
erofs: add documentation for 'domain_id' mount option
erofs: fix kvcalloc() misuse with __GFP_NOFAIL
erofs/zmap.c: Fix incorrect offset calculation
|
|
Fix this by initializing rc to 0 as cache_refresh_path() would not set
it in case of success.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/all/202301190004.bEHvbKG6-lkp@intel.com/
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
lookup_cache_entry() might return an error different than -ENOENT
(e.g. from ->char2uni), so handle those as well in
cache_refresh_path().
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The logic for creating or updating a cache entry in __refresh_tcon()
could be simply done with cache_refresh_path(), so use it instead.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Avoid contention while updating dfs target hints. This should be
perfectly fine to update them under shared locks.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Simply downgrade the write lock on cache updates from
cache_refresh_path() and avoid unnecessary re-lookup in
dfs_cache_find().
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Avoid getting DFS referral from an exclusive lock in
cache_refresh_path() because the tcon IPC used for getting the
referral could be disconnected and thus causing a deadlock as shown
below:
task A task B
====== ======
cifs_demultiplex_thread() dfs_cache_find()
cifs_handle_standard() cache_refresh_path()
reconnect_dfs_server() down_write()
dfs_cache_noreq_find() get_dfs_referral()
down_read() <- deadlock smb2_get_dfs_refer()
SMB2_ioctl()
cifs_send_recv()
compound_send_recv()
wait_for_response()
where task A cannot wake up task B because it is blocked on
down_read() due to the exclusive lock held in cache_refresh_path() and
therefore not being able to make progress.
Fixes: c9f711039905 ("cifs: keep referral server sessions alive")
Reviewed-by: Aurélien Aptel <aurelien.aptel@gmail.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix recently introduced use-after-free bugs
* tag 'nfsd-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: replace delayed_work with work_struct for nfsd_client_shrinker
NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time
NFSD: fix use-after-free in nfsd4_ssc_setup_dul()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"21 hotfixes. Thirteen of these address pre-6.1 issues and hence have
the cc:stable tag"
* tag 'mm-hotfixes-stable-2023-01-16-15-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
init/Kconfig: fix typo (usafe -> unsafe)
nommu: fix split_vma() map_count error
nommu: fix do_munmap() error path
nommu: fix memory leak in do_mmap() error path
MAINTAINERS: update Robert Foss' email address
proc: fix PIE proc-empty-vm, proc-pid-vm tests
mm: update mmap_sem comments to refer to mmap_lock
include/linux/mm: fix release_pages_arg kernel doc comment
lib/win_minmax: use /* notation for regular comments
kasan: mark kasan_kunit_executing as static
nilfs2: fix general protection fault in nilfs_btree_insert()
Docs/admin-guide/mm/zswap: remove zsmalloc's lack of writeback warning
mm/hugetlb: pre-allocate pgtable pages for uffd wr-protects
hugetlb: unshare some PMDs when splitting VMAs
mm: fix vma->anon_name memory leak for anonymous shmem VMAs
mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()
...
|
|
If we have one task trying to start the quota rescan worker while another
one is trying to disable quotas, we can end up hitting a race that results
in the quota rescan worker doing a NULL pointer dereference. The steps for
this are the following:
1) Quotas are enabled;
2) Task A calls the quota rescan ioctl and enters btrfs_qgroup_rescan().
It calls qgroup_rescan_init() which returns 0 (success) and then joins a
transaction and commits it;
3) Task B calls the quota disable ioctl and enters btrfs_quota_disable().
It clears the bit BTRFS_FS_QUOTA_ENABLED from fs_info->flags and calls
btrfs_qgroup_wait_for_completion(), which returns immediately since the
rescan worker is not yet running.
Then it starts a transaction and locks fs_info->qgroup_ioctl_lock;
4) Task A queues the rescan worker, by calling btrfs_queue_work();
5) The rescan worker starts, and calls rescan_should_stop() at the start
of its while loop, which results in 0 iterations of the loop, since
the flag BTRFS_FS_QUOTA_ENABLED was cleared from fs_info->flags by
task B at step 3);
6) Task B sets fs_info->quota_root to NULL;
7) The rescan worker tries to start a transaction and uses
fs_info->quota_root as the root argument for btrfs_start_transaction().
This results in a NULL pointer dereference down the call chain of
btrfs_start_transaction(). The stack trace is something like the one
reported in Link tag below:
general protection fault, probably for non-canonical address 0xdffffc0000000041: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000208-0x000000000000020f]
CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.1.0-syzkaller-13872-gb6bb9676f216 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: btrfs-qgroup-rescan btrfs_work_helper
RIP: 0010:start_transaction+0x48/0x10f0 fs/btrfs/transaction.c:564
Code: 48 89 fb 48 (...)
RSP: 0018:ffffc90000ab7ab0 EFLAGS: 00010206
RAX: 0000000000000041 RBX: 0000000000000208 RCX: ffff88801779ba80
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: dffffc0000000000 R08: 0000000000000001 R09: fffff52000156f5d
R10: fffff52000156f5d R11: 1ffff92000156f5c R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2bea75b718 CR3: 000000001d0cc000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
btrfs_qgroup_rescan_worker+0x3bb/0x6a0 fs/btrfs/qgroup.c:3402
btrfs_work_helper+0x312/0x850 fs/btrfs/async-thread.c:280
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
Modules linked in:
So fix this by having the rescan worker function not attempt to start a
transaction if it didn't do any rescan work.
Reported-by: syzbot+96977faa68092ad382c4@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/000000000000e5454b05f065a803@google.com/
Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
During lseek, for SEEK_DATA and SEEK_HOLE modes, we access the disk_bytenr
of an extent without checking its type. However inline extents have their
data starting the offset of the disk_bytenr field, so accessing that field
when we have an inline extent can result in either of the following:
1) Interpret the inline extent's data as a disk_bytenr value;
2) In case the inline data is less than 8 bytes, we access part of some
other item in the leaf, or unused space in the leaf;
3) In case the inline data is less than 8 bytes and the extent item is
the first item in the leaf, we can access beyond the leaf's limit.
So fix this by not accessing the disk_bytenr field if we have an inline
extent.
Fixes: b6e833567ea1 ("btrfs: make hole and data seeking a lot more efficient")
Reported-by: Matthias Schoepfer <matthias.schoepfer@googlemail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216908
Link: https://lore.kernel.org/linux-btrfs/7f25442f-b121-2a3a-5a3d-22bcaae83cd4@leemhuis.info/
CC: stable@vger.kernel.org # 6.1
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
write_one_page is an awkward interface that expects the page locked and
->writepage to be implemented. Replace that by zeroing the signature
bytes and synchronize the block device page using the proper bdev
helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_scratch_superblocks open codes scratching super block of a
non-zoned super block. Split the code to read, zero and write the
superblock for regular devices into a separate helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Another batch of fixes, dealing with fallouts from 6.1 reported by
users:
- tree-log fixes:
- fix directory logging due to race with concurrent index key
deletion
- fix missing error handling when logging directory items
- handle case of conflicting inodes being added to the log
- remove transaction aborts for not so serious errors
- fix qgroup accounting warning when rescan can be started at time
with temporarily disable accounting
- print more specific errors to system log when device scan ioctl
fails
- disable space overcommit for ZNS devices, causing heavy performance
drop"
* tag 'for-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: do not abort transaction on failure to update log root
btrfs: do not abort transaction on failure to write log tree when syncing log
btrfs: add missing setup of log for full commit at add_conflicting_inode()
btrfs: fix directory logging due to race with concurrent index key deletion
btrfs: fix missing error handling when logging directory items
btrfs: zoned: enable metadata over-commit for non-ZNS setup
btrfs: qgroup: do not warn on record without old_roots populated
btrfs: add extra error messages to cover non-ENOMEM errors from device_add_list()
|
|
... to avoid the mess of conditional preprocessing as we are continually
adding fscache related mount options.
Reviewd-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230112065431.124926-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Using REQ_OP_ZONE_APPEND operations for synchronous writes to sequential
files succeeds regardless of the zone write pointer position, as long as
the target zone is not full. This means that if an external (buggy)
application writes to the zone of a sequential file underneath the file
system, subsequent file write() operation will succeed but the file size
will not be correct and the file will contain invalid data written by
another application.
Modify zonefs_file_dio_append() to check the written sector of an append
write (returned in bio->bi_iter.bi_sector) and return -EIO if there is a
mismatch with the file zone wp offset field. This change triggers a call
to zonefs_io_error() and a zone check. Modify zonefs_io_error_cb() to
not expose the unexpected data after the current inode size when the
errors=remount-ro mode is used. Other error modes are correctly handled
already.
Fixes: 02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
|
|
Pull cifs fixes from Steve French:
- memory leak and double free fix
- two symlink fixes
- minor cleanup fix
- two smb1 fixes
* tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix uninitialized memory read for smb311 posix symlink create
cifs: fix potential memory leaks in session setup
cifs: do not query ifaces on smb1 mounts
cifs: fix double free on failed kerberos auth
cifs: remove redundant assignment to the variable match
cifs: fix file info setting in cifs_open_file()
cifs: fix file info setting in cifs_query_path_info()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here's a sizeable batch of Friday the 13th arm64 fixes for -rc4. What
could possibly go wrong?
The obvious reason we have so much here is because of the holiday
season right after the merge window, but we've also brought back an
erratum workaround that was previously dropped at the last minute and
there's an MTE coredumping fix that strays outside of the arch/arm64
directory.
Summary:
- Fix PAGE_TABLE_CHECK failures on hugepage splitting path
- Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header
- Fix NULL deref when accessing debugfs node if PSCI is not present
- Fix MTE core dumping when VMA list is being updated concurrently
- Fix SME signal frame handling when SVE is not implemented by the
CPU
- Fix asm constraints for cmpxchg_double() to hazard both words
- Fix build failure with stack tracer and older versions of Clang
- Bring back workaround for Cortex-A715 erratum 2645198"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y
arm64/mm: Define dummy pud_user_exec() when using 2-level page-table
arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption
firmware/psci: Don't register with debugfs if PSCI isn't available
firmware/psci: Fix MEM_PROTECT_RANGE function numbers
arm64/signal: Always allocate SVE signal frames on SME only systems
arm64/signal: Always accept SVE signal frames on SME only systems
arm64/sme: Fix context switch for SME only systems
arm64: cmpxchg_double*: hazard against entire exchange variable
arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning
arm64: mte: Avoid the racy walk of the vma list during core dump
elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size}
arm64: mte: Fix double-freeing of the temporary tag storage during coredump
arm64: ptrace: Use ARM64_SME to guard the SME register enumerations
arm64/mm: add pud_user_exec() check in pud_user_accessible_page()
arm64/mm: fix incorrect file_map_count for invalid pmd
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from rxrpc.
The rxrpc changes are noticeable large: to address a recent regression
has been necessary completing the threaded refactor.
Current release - regressions:
- rxrpc:
- only disconnect calls in the I/O thread
- move client call connection to the I/O thread
- fix incoming call setup race
- eth: mlx5:
- restore pkt rate policing support
- fix memory leak on updating vport counters
Previous releases - regressions:
- gro: take care of DODGY packets
- ipv6: deduct extension header length in rawv6_push_pending_frames
- tipc: fix unexpected link reset due to discovery messages
Previous releases - always broken:
- sched: disallow noqueue for qdisc classes
- eth: ice: fix potential memory leak in ice_gnss_tty_write()
- eth: ixgbe: fix pci device refcount leak
- eth: mlx5:
- fix command stats access after free
- fix macsec possible null dereference when updating MAC security
entity (SecY)"
* tag 'net-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
r8152: add vendor/device ID pair for Microsoft Devkit
net: stmmac: add aux timestamps fifo clearance wait
bnxt: make sure we return pages to the pool
net: hns3: fix wrong use of rss size during VF rss config
ipv6: raw: Deduct extension header length in rawv6_push_pending_frames
net: lan966x: check for ptp to be enabled in lan966x_ptp_deinit()
net: sched: disallow noqueue for qdisc classes
iavf/iavf_main: actually log ->src mask when talking about it
igc: Fix PPS delta between two synchronized end-points
ixgbe: fix pci device refcount leak
octeontx2-pf: Fix resource leakage in VF driver unbind
selftests/net: l2_tos_ttl_inherit.sh: Ensure environment cleanup on failure.
selftests/net: l2_tos_ttl_inherit.sh: Run tests in their own netns.
selftests/net: l2_tos_ttl_inherit.sh: Set IPv6 addresses with "nodad".
net/mlx5e: Fix macsec possible null dereference when updating MAC security entity (SecY)
net/mlx5e: Fix macsec ssci attribute handling in offload path
net/mlx5: E-switch, Coverity: overlapping copy
net/mlx5e: Don't support encap rules with gbp option
net/mlx5: Fix ptp max frequency adjustment range
net/mlx5e: Fix memory leak on updating vport counters
...
|
|
If smb311 posix is enabled, we send the intended mode for file
creation in the posix create context. Instead of using what's there on
the stack, create the mfsymlink file with 0644.
Fixes: ce558b0e17f8a ("smb3: Add posix create context for smb3.11 posix mounts")
Cc: stable@vger.kernel.org
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When syncing a log, if we fail to update a log root in the log root tree,
we are aborting the transaction if the failure was not -ENOSPC. This is
excessive because there is a chance that a transaction commit can succeed,
and therefore avoid to turn the filesystem into RO mode. All we need to be
careful about is to mark the log for a full commit, which we already do,
to make sure no one commits a super block pointing to an outdated log root
tree.
So don't abort the transaction if we fail to update a log root in the log
root tree, and log an error if the failure is not -ENOSPC, so that it does
not go completely unnoticed.
CC: stable@vger.kernel.org # 6.0+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When syncing the log, if we fail to write log tree extent buffers, we mark
the log for a full commit and abort the transaction. However we don't need
to abort the transaction, all we really need to do is to make sure no one
can commit a superblock pointing to new log tree roots. Just because we
got a failure writing extent buffers for a log tree, it does not mean we
will also fail to do a transaction commit.
One particular case is if due to a bug somewhere, when writing log tree
extent buffers, the tree checker detects some corruption and the writeout
fails because of that. Aborting the transaction can be very disruptive for
a user, specially if the issue happened on a root filesystem. One example
is the scenario in the Link tag below, where an isolated corruption on log
tree leaves was causing transaction aborts when syncing the log.
Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When logging conflicting inodes, if we reach the maximum limit of inodes,
we return BTRFS_LOG_FORCE_COMMIT to force a transaction commit. However
we don't mark the log for full commit (with btrfs_set_log_full_commit()),
which means that once we leave the log transaction and before we commit
the transaction, some other task may sync the log, which is incomplete
as we have not logged all conflicting inodes, leading to some inconsistent
in case that log ends up being replayed.
So also call btrfs_set_log_full_commit() at add_conflicting_inode().
Fixes: e09d94c9e448 ("btrfs: log conflicting inodes without holding log mutex of the initial inode")
CC: stable@vger.kernel.org # 6.1
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Sometimes we log a directory without holding its VFS lock, so while we
logging it, dir index entries may be added or removed. This typically
happens when logging a dentry from a parent directory that points to a
new directory, through log_new_dir_dentries(), or when while logging
some other inode we also need to log its parent directories (through
btrfs_log_all_parents()).
This means that while we are at log_dir_items(), we may not find a dir
index key we found before, because it was deleted in the meanwhile, so
a call to btrfs_search_slot() may return 1 (key not found). In that case
we return from log_dir_items() with a success value (the variable 'err'
has a value of 0). This can lead to a few problems, specially in the case
where the variable 'last_offset' has a value of (u64)-1 (and it's
initialized to that when it was declared):
1) By returning from log_dir_items() with success (0) and a value of
(u64)-1 for '*last_offset_ret', we end up not logging any other dir
index keys that follow the missing, just deleted, index key. The
(u64)-1 value makes log_directory_changes() not call log_dir_items()
again;
2) Before returning with success (0), log_dir_items(), will log a dir
index range item covering a range from the last old dentry index
(stored in the variable 'last_old_dentry_offset') to the value of
'last_offset'. If 'last_offset' has a value of (u64)-1, then it means
if the log is persisted and replayed after a power failure, it will
cause deletion of all the directory entries that have an index number
between last_old_dentry_offset + 1 and (u64)-1;
3) We can end up returning from log_dir_items() with
ctx->last_dir_item_offset having a lower value than
inode->last_dir_index_offset, because the former is set to the current
key we are processing at process_dir_items_leaf(), and at the end of
log_directory_changes() we set inode->last_dir_index_offset to the
current value of ctx->last_dir_item_offset. So if for example a
deletion of a lower dir index key happened, we set
ctx->last_dir_item_offset to that index value, then if we return from
log_dir_items() because btrfs_search_slot() returned 1, we end up
returning from log_dir_items() with success (0) and then
log_directory_changes() sets inode->last_dir_index_offset to a lower
value than it had before.
This can result in unpredictable and unexpected behaviour when we
need to log again the directory in the same transaction, and can result
in ending up with a log tree leaf that has duplicated keys, as we do
batch insertions of dir index keys into a log tree.
So fix this by making log_dir_items() move on to the next dir index key
if it does not find the one it was looking for.
Reported-by: David Arendt <admin@prnet.org>
Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|