|
In certain scenarios, for example, during fuzz testing, the source
name may be NULL, which could lead to a kernel panic. Therefore, an
extra check for the source name should be added.
Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Cc: <stable@vger.kernel.org> # all LTS kernels
Signed-off-by: Xiangsheng Hou <xiangsheng.hou@mediatek.com>
Link: https://lore.kernel.org/20250407115111.25535-1-xiangsheng.hou@mediatek.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
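The check described above can be sketched in userspace C. This is a minimal sketch under stated assumptions: `struct fs_context_sketch` and `sketch_get_tree()` are hypothetical stand-ins for the kernel's `struct fs_context` and the virtio-fs get_tree step. The point is only that a NULL source is rejected with -EINVAL before anything dereferences it.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the kernel's struct fs_context. */
struct fs_context_sketch {
	const char *source; /* tag name passed to mount(2); may be NULL */
};

/* Hypothetical get_tree step: reject a missing source before using it. */
static int sketch_get_tree(const struct fs_context_sketch *fc)
{
	if (!fc->source) {
		fprintf(stderr, "virtiofs: no source specified\n");
		return -EINVAL;
	}
	/* ...look up the virtio-fs instance whose tag is fc->source... */
	return 0;
}
```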
|
|
Fix devpts to parse uid and gid params using the correct type so that they
get interpreted in the context of the user namespace.
Fixes: cc0876f817d6 ("vfs: Convert devpts to use the new mount API")
Reported-by: Debarshi Ray <dray@redhat.com>
Closes: https://github.com/containers/podman/issues/25751
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/759134.1743596274@warthog.procyon.org.uk
cc: Eric Sandeen <sandeen@redhat.com>
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
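"Interpreted in the context of the user namespace" means a uid in the mount options is translated through the mounter's uid_map, roughly what make_kuid() does in the kernel, instead of being taken as a raw number. The following userspace model is only a sketch. `struct uid_extent`, `map_uid_from_ns()` and `INVALID_UID` are hypothetical names, and the example map (0 to 100000, 65536 ids) is an assumed rootless-container setup.

```c
#include <stddef.h>
#include <stdint.h>

/* One uid_map line: <first id inside ns> <first id outside> <count>. */
struct uid_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

#define INVALID_UID ((uint32_t)-1)

/* Translate a uid as seen inside the mounting user namespace into the
 * underlying uid; returns INVALID_UID if the uid is not mapped. */
static uint32_t map_uid_from_ns(const struct uid_extent *map, size_t n,
				uint32_t uid)
{
	for (size_t i = 0; i < n; i++) {
		if (uid >= map[i].first && uid - map[i].first < map[i].count)
			return map[i].lower_first + (uid - map[i].first);
	}
	return INVALID_UID;
}
```

With a map of `{0, 100000, 65536}`, a `uid=5` mount option would refer to uid 100005 outside the namespace. Taking the value as a plain u32 would instead refer to uid 5 on the host.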
|
|
The ovl_get_verity_xattr() function was never added, only its declaration.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Fixes: 184996e92e86 ("ovl: Validate verity xattr when resolving lowerdata")
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In theory overlayfs could support the upper layer referring directly to a data
layer, but there's no current use case for this.
Originally, when data-only layers were introduced, this wasn't allowed; it
only became possible with the "datadir+" feature, which did not actually
handle this case, resulting in an Oops.
Fix by disallowing datadir without lowerdir.
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Fixes: 24e16e385f22 ("ovl: add support for appending lowerdirs one by one")
Cc: <stable@vger.kernel.org> # v6.7
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
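The rule "no datadir without lowerdir" amounts to a simple option-validation check. The struct and function below are hypothetical simplifications and do not reflect overlayfs's real option parsing. They are a sketch of the rejected configuration only.

```c
#include <errno.h>

/* Hypothetical summary of parsed overlayfs layer options. */
struct ovl_layers_sketch {
	unsigned int nr_lower;   /* regular lowerdir layers */
	unsigned int nr_datadir; /* data-only layers added via "datadir+" */
};

/* Without any lowerdir, only the upper layer could refer to a data
 * layer, which is not supported, so reject that configuration. */
static int ovl_check_layers_sketch(const struct ovl_layers_sketch *l)
{
	if (l->nr_datadir && !l->nr_lower)
		return -EINVAL;
	return 0;
}
```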
|
|
The filesystem's freeze/thaw functions can be called from contexts where
the holder isn't userspace but the kernel, e.g., during systemd
suspend/hibernate. So pass through the freeze/thaw flags from the VFS
instead of hard-coding them.
Signed-off-by: Christian Brauner <brauner@kernel.org>
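Passing the flags through means the filesystem callback forwards whatever holder the VFS supplied, instead of substituting a fixed "userspace" value. This sketch uses flag names loosely modelled on the kernel's `enum freeze_holder` (`FREEZE_HOLDER_KERNEL`, `FREEZE_HOLDER_USERSPACE`). The functions and the `frozen_by` bookkeeping are hypothetical.

```c
/* Hypothetical holder flags, loosely modelled on enum freeze_holder. */
enum holder_sketch {
	HOLDER_KERNEL    = 1u << 0,
	HOLDER_USERSPACE = 1u << 1,
};

/* Records who currently holds the freeze (hypothetical bookkeeping). */
static unsigned int frozen_by;

static int do_freeze_sketch(unsigned int who)
{
	frozen_by |= who;
	return 0;
}

/* Filesystem freeze callback: forward the caller's holder rather than
 * hard-coding HOLDER_USERSPACE, so a kernel-initiated freeze (e.g. on
 * suspend) is recorded as such. */
static int fs_freeze_super_sketch(unsigned int who)
{
	return do_freeze_sketch(who);
}
```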
|
|
Use a common iterator for all callbacks. We could go for something even
more elaborate (advance step-by-step similar to iov_iter) but I really
don't think this is warranted.
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-5-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
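A "common iterator for all callbacks" keeps one loop and lets each caller supply only a callback plus an opaque argument, much like the kernel's `iterate_supers(f, arg)`. The sketch below models this over an array. `struct sb_sketch` and the two callbacks are hypothetical, and none of the kernel's locking or list walking is shown.

```c
#include <stddef.h>

/* Hypothetical, simplified superblock. */
struct sb_sketch {
	int id;
	int visited;
};

typedef void (*sb_callback)(struct sb_sketch *sb, void *arg);

/* One iterator shared by every caller; callers only supply f and arg. */
static void iterate_supers_sketch(struct sb_sketch *sbs, size_t n,
				  sb_callback f, void *arg)
{
	for (size_t i = 0; i < n; i++)
		f(&sbs[i], arg);
}

static void mark_visited(struct sb_sketch *sb, void *arg)
{
	(void)arg;
	sb->visited = 1;
}

/* Accumulates ids into the int pointed to by arg. */
static void sum_ids(struct sb_sketch *sb, void *arg)
{
	*(int *)arg += sb->id;
}
```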
|
|
Use a common iterator for all callbacks.
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-4-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make all iterators uniform by performing an early check whether the
superblock is dying.
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-3-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Make it easier to read and remove one level of indentation.
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-2-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The locking guarantees that the superblock is alive and sb->s_root is
still set. Remove the pointless check.
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-1-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
All callers and implementations are now removed, so remove the operation
and update the documentation to match.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250402150005.2309458-10-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If we add a migrate_folio operation, we can convert the writepage
operation to writepages. Further, this lets us optimise by using
the same write handle for multiple folios. The large folio support here
is illusory; we would need to kmap each page in turn for proper support.
But we do remove a few hidden calls to compound_head().
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250402150005.2309458-3-willy@infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
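The optimisation of "using the same write handle for multiple folios" can be sketched as a ->writepages-style loop that looks the handle up once per batch, where ->writepage would look it up per folio. All names and counters here are hypothetical, and the real I/O and error handling are omitted.

```c
#include <stddef.h>

/* Hypothetical counters so the effect of batching is observable. */
struct wb_stats_sketch {
	int handle_lookups;
	int folios_written;
};

/* Stand-in for finding an open, writable host file handle. */
static int find_write_handle(struct wb_stats_sketch *st)
{
	st->handle_lookups++;
	return 1; /* hypothetical handle id */
}

static void write_one_folio(struct wb_stats_sketch *st, int handle, int folio)
{
	(void)handle;
	(void)folio;
	st->folios_written++;
}

/* ->writepages shape: one handle lookup, reused for every dirty folio. */
static void writepages_sketch(struct wb_stats_sketch *st,
			      const int *dirty_folios, size_t n)
{
	int handle = -1;

	for (size_t i = 0; i < n; i++) {
		if (handle < 0)
			handle = find_write_handle(st);
		write_one_folio(st, handle, dirty_folios[i]);
	}
}
```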
|
|
The migration code used to be able to migrate dirty 9p folios by writing
them back using writepage. When the writepage method was removed,
we neglected to add a migrate_folio method, which means that dirty 9p
folios have been unmovable ever since. This reduced our success at
defragmenting memory on machines which use 9p heavily.
Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter")
Cc: stable@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Cc: v9fs@lists.linux.dev
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Link: https://lore.kernel.org/r/20250402150005.2309458-2-willy@infradead.org
Acked-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Convert the bfs filesystem to use the new mount API.
Tested using mount and simple writes & reads on ro/rw bfs devices.
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Link: https://lore.kernel.org/r/20250320204224.181403-1-preichl@redhat.com
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
cachefiles uses some VFS interfaces (such as vfs_mkdir) which take an
explicit mnt_idmap, and it passes &nop_mnt_idmap as cachefiles doesn't
yet support idmapped mounts.
It also uses the lookup_one_len() family of functions which implicitly
use &nop_mnt_idmap. This mixture of implicit and explicit could be
confusing. When we eventually update cachefiles to support idmap mounts it
would be best if all places which need an idmap determined from the
mount point were similar and easily found.
So this patch changes cachefiles to use lookup_one(), lookup_one_unlocked(),
and lookup_one_positive_unlocked(), passing &nop_mnt_idmap.
This has the benefit of removing the remaining user of the
lookup_one_len functions where permission checking is actually needed.
Other callers don't care about permission checking, and using these
functions only where permission checking is needed is a valuable
simplification.
This requires passing the name in a qstr. This is easily done with
QSTR() as the name is always nul terminated, and often strlen is used
anyway. ->d_name_len is removed as no longer useful.
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250319031545.2999807-4-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
nfsd uses some VFS interfaces (such as vfs_mkdir) which take an explicit
mnt_idmap, and it passes &nop_mnt_idmap as nfsd doesn't yet support
idmapped mounts.
It also uses the lookup_one_len() family of functions which implicitly
use &nop_mnt_idmap. This mixture of implicit and explicit could be
confusing. When we eventually update nfsd to support idmap mounts it
would be best if all places which need an idmap determined from the
mount point were similar and easily found.
So this patch changes nfsd to use lookup_one(), lookup_one_unlocked(),
and lookup_one_positive_unlocked(), passing &nop_mnt_idmap.
This has the benefit of removing some uses of the lookup_one_len
functions where permission checking is actually needed. Many callers
don't care about permission checking, and using these functions only where
permission checking is needed is a valuable simplification.
This change requires passing the name in a qstr. Currently this is a
little clumsy, but if nfsd is changed to use qstr more broadly it will
result in a net improvement.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-3-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The family of functions:
lookup_one()
lookup_one_unlocked()
lookup_one_positive_unlocked()
appear to be designed for use by external clients of a filesystem, rather
than by filesystems acting on themselves, which is how the lookup_one_len
family is used.
They are used by:
btrfs/ioctl - which is a user-space interface rather than an internal
activity
exportfs - i.e. from nfsd or the open_by_handle_at interface
overlayfs - to access the underlying filesystems
smb/server - for file service
They should be used by nfsd (more than just the exportfs path) and
cachefiles but aren't.
It would help if the documentation didn't claim they should "not be
called by generic code".
Also the path component name is passed as "name" and "len", which are
(confusingly?) separated by the "base". In some cases the len is simply
"strlen", so passing a qstr using QSTR() would make the calling code
clearer.
Other callers do pass separate name and len values that are stored in a
struct. Sometimes these are already stored in a qstr; other times they
easily could be.
So this patch changes these three functions to receive a 'struct qstr *',
and improves the documentation.
QSTR_LEN() is added to make it easy to pass a QSTR containing a known
len.
[brauner@kernel.org: take a struct qstr pointer]
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-2-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
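The difference between QSTR() (length from strlen() on a NUL-terminated name) and QSTR_LEN() (caller-supplied length) can be modelled in userspace. In the kernel a call would look roughly like `lookup_one(&nop_mnt_idmap, &QSTR(name), base)`. The `_SKETCH` macros, `struct qstr_sketch` and `lookup_one_sketch()` below are hypothetical simplifications that only compare names.

```c
#include <string.h>

/* Simplified userspace model of the kernel's struct qstr. */
struct qstr_sketch {
	const char *name;
	unsigned int len;
};

/* NUL-terminated name: length comes from strlen(), like QSTR(). */
#define QSTR_SKETCH(n) \
	((struct qstr_sketch){ .name = (n), .len = (unsigned int)strlen(n) })
/* Length already known; name need not end there, like QSTR_LEN(). */
#define QSTR_LEN_SKETCH(n, l) \
	((struct qstr_sketch){ .name = (n), .len = (l) })

/* Hypothetical lookup taking a qstr pointer: matches one component. */
static int lookup_one_sketch(const struct qstr_sketch *q, const char *want)
{
	return strlen(want) == q->len && memcmp(q->name, want, q->len) == 0;
}
```

Passing one pointer keeps name and length together, so a caller that already holds a qstr (or a buffer plus length) never has to split them into separate arguments.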
|
|
When mounting the same share twice, once with the "linux" mount parameter
(or equivalently "posix") and then once without (or e.g. with "nolinux"),
we were incorrectly reusing the same tree connection for both mounts.
This meant that the first mount of the share on the client would
cause subsequent mounts of that same share on the same client to
ignore that mount parameter ("linux" vs. "nolinux") and incorrectly reuse
the same tcon.
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
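The fix comes down to the tcon-matching logic (in the cifs client this lives in a helper such as `match_tcon()`), which must also compare the POSIX/"linux" setting before reusing an existing tree connection. This is a sketch only. `struct tcon_opts_sketch` and `tcon_matches_sketch()` are hypothetical and show just the two fields that matter here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of per-tree-connection mount options. */
struct tcon_opts_sketch {
	const char *share;
	bool posix_extensions; /* "linux"/"posix" vs. "nolinux" */
};

/* A cached tcon may be reused only if every option that changes the
 * connection's behaviour matches, including the POSIX extensions flag. */
static bool tcon_matches_sketch(const struct tcon_opts_sketch *existing,
				const struct tcon_opts_sketch *wanted)
{
	if (strcmp(existing->share, wanted->share) != 0)
		return false;
	if (existing->posix_extensions != wanted->posix_extensions)
		return false;
	return true;
}
```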
|
|
the server
Fix regression in mounts to e.g. onedrive shares.
Generally, reparse points are processed by the SMB server during the
SMB OPEN request, but there are a few reparse points which do not have
OPEN-like meaning for the SMB server and have to be processed by the SMB
client. Those are symlinks and special files (fifo, socket, block, char).
The Linux SMB client also has to process name surrogate reparse points,
as they represent another entity on the SMB server system; the Linux
client marks them as separate mount points. Examples of name surrogate
reparse points are NTFS junction points (e.g. created by the "mklink" tool
on Windows servers).
So after processing the name surrogate reparse points, clear the
-EOPNOTSUPP error code returned from parse_reparse_point() to let the SMB
server process reparse points.
Also remove the misleading error message "unhandled reparse tag:", as
reparse points are handled by the SMB server, so the client not handling
them is normal operation.
Fixes: cad3fc0a4c8c ("cifs: Throw -EOPNOTSUPP error on unsupported reparse point type from parse_reparse_point()")
Fixes: b587fd128660 ("cifs: Treat unhandled directory name surrogate reparse points as mount directory nodes")
Cc: stable@vger.kernel.org
Reported-by: Junwen Sun <sunjw8888@gmail.com>
Tested-by: Junwen Sun <sunjw8888@gmail.com>
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
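The control flow described above can be sketched as follows. A name surrogate is identified by bit 29 (0x20000000) of the reparse tag, per MS-FSCC. The parser stub, which handles only IO_REPARSE_TAG_SYMLINK (0xA000000C), and the function names are hypothetical simplifications of the cifs code. After any name-surrogate marking, -EOPNOTSUPP is cleared rather than reported.

```c
#include <errno.h>
#include <stdint.h>

#define TAG_SYMLINK_SKETCH     0xA000000Cu /* IO_REPARSE_TAG_SYMLINK */
#define TAG_NAME_SURROGATE_BIT 0x20000000u /* "N" bit of a reparse tag */

/* Hypothetical client-side parser: handles symlinks, reports every
 * other tag as -EOPNOTSUPP. */
static int parse_reparse_point_sketch(uint32_t tag)
{
	return tag == TAG_SYMLINK_SKETCH ? 0 : -EOPNOTSUPP;
}

/* Returns 0 unless a real error occurred; sets *mount_point for name
 * surrogates that the client exposes as separate mount points. */
static int process_reparse_sketch(uint32_t tag, int *mount_point)
{
	int rc = parse_reparse_point_sketch(tag);

	*mount_point = 0;
	if (rc == -EOPNOTSUPP) {
		if (tag & TAG_NAME_SURROGATE_BIT)
			*mount_point = 1; /* e.g. an NTFS junction */
		rc = 0; /* normally processed by the server during OPEN */
	}
	return rc;
}
```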
|
|
Fixes "task out to lunch" warnings during recovery on large machines
with lots of dirty data in the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
As with the other algorithms, bcachefs does not access xxhash through
the crypto API. So there is no need to use a module softdep to ensure
that it is loaded.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Just use the ChaCha20 and Poly1305 libraries instead of the clunky
crypto API. This is much simpler. It is also slightly faster, since
the libraries provide more direct access to the same
architecture-optimized ChaCha20 and Poly1305 code.
I've tested that existing encrypted bcachefs filesystems can continue
to be accessed with this patch applied.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Commit 3ba0240a8789 fixed a bug in the read retry path in __bch2_read(),
and changed bchfs_read() to match - to avoid a landmine if
bch2_read_extent() ever starts returning transaction restarts.
But that was incorrect, because bchfs_read() doesn't use a separate
stack allocated bvec_iter, it uses the one in the rbio being submitted.
Add a comment explaining the issue, and revert the buggy change.
Fixes: 3ba0240a8789 ("bcachefs: Fix silent short reads in data read retry path")
Reported-by: syzbot+2deb10b8dc9aae6fab67@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prevent incorrect byte ordering for big-endian systems.
Signed-off-by: Gabriel Shahrouzi <gshahrouzi@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
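"Incorrect byte ordering for big-endian systems" usually means a CPU-order u64 was used where an on-disk little-endian value was expected. The helpers below are hypothetical portable equivalents of what cpu_to_le64()/le64_to_cpu() guarantee: the byte layout is the same regardless of host endianness. This sketch does not reproduce the bcachefs change itself.

```c
#include <stdint.h>

/* Store a CPU-order u64 as little-endian bytes on any host. */
static void put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Read little-endian bytes back into a CPU-order u64 on any host. */
static uint64_t get_le64(const uint8_t in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (8 * i);
	return v;
}
```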
|
|
Replace u64 with __le64 to match the expected parameter type. Ensure consistency both in function calls and within the function itself.
Signed-off-by: Gabriel Shahrouzi <gshahrouzi@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Remove backslash before format specifier. Ensure correct output.
Signed-off-by: Gabriel Shahrouzi <gshahrouzi@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix a nonsensical Kconfig combination
- Remove an unnecessary rseq-notification
* tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Eliminate useless task_work on execve
sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- reconnect fixes: three for updating rsize/wsize and an SMB1 reconnect
fix
- RFC1001 fixes: fixing connections to nonstandard ports, and negprot
retries
- fix mfsymlinks to old servers
- make mapping of open flags for SMB1 more accurate
- permission fixes: adding retry on open for write, and one for stat to
workaround unexpected access denied
- add two new xattrs, one for retrieving SACL and one for retrieving
owner (without having to retrieve the whole ACL)
- fix mount parm validation for echo_interval
- minor cleanup (including removing now unneeded cifs_truncate_page)
* tag '6.15-rc-part2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal version number
cifs: Implement is_network_name_deleted for SMB1
cifs: Remove cifs_truncate_page() as it should be superfluous
cifs: Do not add FILE_READ_ATTRIBUTES when using GENERIC_READ/EXECUTE/ALL
cifs: Improve SMB2+ stat() to work also without FILE_READ_ATTRIBUTES
cifs: Add fallback for SMB2 CREATE without FILE_READ_ATTRIBUTES
cifs: Fix querying and creating MF symlinks over SMB1
cifs: Fix access_flags_to_smbopen_mode
cifs: Fix negotiate retry functionality
cifs: Improve handling of NetBIOS packets
cifs: Allow to disable or force initialization of NetBIOS session
cifs: Add a new xattr system.smb3_ntsd_owner for getting or setting owner
cifs: Add a new xattr system.smb3_ntsd_sacl for getting or setting SACLs
smb: client: Update IO sizes after reconnection
smb: client: Store original IO parameters and prevent zero IO sizes
smb:client: smb: client: Add reverse mapping from tcon to superblocks
cifs: remove unreachable code in cifs_get_tcp_session()
cifs: fix integer overflow in match_server()
|
|
Now that LIBCRC32C does nothing besides select CRC32, make every option
that selects LIBCRC32C instead select CRC32 directly. Then remove
LIBCRC32C.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250401221600.24878-8-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
syzbot reported a slab-out-of-bounds Read in isofs_fh_to_parent. [1]
The handle_bytes value passed in by the reproducing program is equal to 12.
In handle_to_path(), only 12 bytes of memory are allocated for the structure
file_handle->f_handle member, which causes an out-of-bounds access when
accessing the member parent_block of the structure isofs_fid in isofs,
because accessing parent_block requires at least 16 bytes of f_handle.
Here, fh_len (the handle length in 32-bit words) is used to confirm that
the handle holds more than 3 words, i.e. at least 16 bytes, before
accessing parent_block.
[1]
BUG: KASAN: slab-out-of-bounds in isofs_fh_to_parent+0x1b8/0x210 fs/isofs/export.c:183
Read of size 4 at addr ffff0000cc030d94 by task syz-executor215/6466
CPU: 1 UID: 0 PID: 6466 Comm: syz-executor215 Not tainted 6.14.0-rc7-syzkaller-ga2392f333575 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call trace:
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0xe4/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0x198/0x550 mm/kasan/report.c:521
kasan_report+0xd8/0x138 mm/kasan/report.c:634
__asan_report_load4_noabort+0x20/0x2c mm/kasan/report_generic.c:380
isofs_fh_to_parent+0x1b8/0x210 fs/isofs/export.c:183
exportfs_decode_fh_raw+0x2dc/0x608 fs/exportfs/expfs.c:523
do_handle_to_path+0xa0/0x198 fs/fhandle.c:257
handle_to_path fs/fhandle.c:385 [inline]
do_handle_open+0x8cc/0xb8c fs/fhandle.c:403
__do_sys_open_by_handle_at fs/fhandle.c:443 [inline]
__se_sys_open_by_handle_at fs/fhandle.c:434 [inline]
__arm64_sys_open_by_handle_at+0x80/0x94 fs/fhandle.c:434
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x108 arch/arm64/kernel/entry-common.c:762
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Allocated by task 6466:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_alloc_info+0x40/0x50 mm/kasan/generic.c:562
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0xac/0xc4 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__do_kmalloc_node mm/slub.c:4294 [inline]
__kmalloc_noprof+0x32c/0x54c mm/slub.c:4306
kmalloc_noprof include/linux/slab.h:905 [inline]
handle_to_path fs/fhandle.c:357 [inline]
do_handle_open+0x5a4/0xb8c fs/fhandle.c:403
__do_sys_open_by_handle_at fs/fhandle.c:443 [inline]
__se_sys_open_by_handle_at fs/fhandle.c:434 [inline]
__arm64_sys_open_by_handle_at+0x80/0x94 fs/fhandle.c:434
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:744
el0t_64_sync_handler+0x84/0x108 arch/arm64/kernel/entry-common.c:762
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Reported-by: syzbot+4d7cd7dd0ce1aa8d5c65@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4d7cd7dd0ce1aa8d5c65
Tested-by: syzbot+4d7cd7dd0ce1aa8d5c65@syzkaller.appspotmail.com
CC: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/tencent_9C8CB8A7E7C6C512C7065DC98B6EDF6EC606@qq.com
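The arithmetic behind the fix: assuming `struct isofs_fid` is laid out as u32 block, u16 offset, u16 parent_offset, u32 generation, u32 parent_block, u32 parent_generation, parent_block starts at byte 12. A 12-byte handle (fh_len == 3) therefore ends exactly where parent_block begins. The mirror struct and helper below are hypothetical and only illustrate the bound.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror struct isofs_fid (no padding with these types). */
struct isofs_fid_sketch {
	uint32_t block;
	uint16_t offset;
	uint16_t parent_offset;
	uint32_t generation;
	uint32_t parent_block;      /* bytes 12..15: needs fh_len >= 4 */
	uint32_t parent_generation;
};

/* fh_len counts 32-bit words, so fh_len > 3 means at least 16 bytes
 * are present and parent_block is safe to read. */
static uint32_t parent_block_or_zero(const struct isofs_fid_sketch *fid,
				     int fh_len)
{
	return fh_len > 3 ? fid->parent_block : 0;
}
```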
|
|
Pull smb server fixes from Steve French:
"Four ksmbd SMB3 server fixes, all also for stable"
* tag 'v6.15rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix null pointer dereference in alloc_preauth_hash()
ksmbd: validate zero num_subauth before sub_auth is accessed
ksmbd: fix overflow in dacloffset bounds check
ksmbd: fix session use-after-free in multichannel connection
|
|
Don't use a scoped guard that only protects the next statement.
Use a regular guard to make sure that the namespace semaphore is held
across the whole function.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Reported-by: Leon Romanovsky <leon@kernel.org>
Link: https://lore.kernel.org/all/20250401170715.GA112019@unreal/
Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
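The kernel's guard() and scoped_guard() are built on the compiler's cleanup attribute. `scoped_guard(lock, ptr) stmt;` releases the lock right after `stmt`, while `guard(lock)(ptr);` holds it until the enclosing function returns. The sketch below imitates the function-scope form with the GCC/Clang `cleanup` attribute, so it is not portable ISO C. The lock is a hypothetical flag and all names are made up for illustration.

```c
/* Hypothetical lock that only tracks whether it is held. */
static int sem_held;

static void sem_down(void) { sem_held = 1; }

/* Cleanup handler: receives a pointer to the guarded variable. */
static void sem_up(int *token) { (void)token; sem_held = 0; }

/* Function-scope guard: the lock is dropped when the variable goes out
 * of scope, i.e. when the enclosing function returns. */
#define GUARD_SKETCH() \
	__attribute__((cleanup(sem_up), unused)) int guard_token_ = \
		(sem_down(), 0)

static int observed_while_working;

static void do_work(void) { observed_while_working = sem_held; }

static void clone_private_mount_sketch(void)
{
	GUARD_SKETCH(); /* held across the whole function... */
	do_work();      /* ...including this call */
}
```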
|
|
Pull more bcachefs updates from Kent Overstreet:
"More notable fixes:
- Fix for striping behaviour on tiering filesystems where replicas
exceeds durability on destination target
- Fix a race in device removal where deleting alloc info races with
the discard worker
- Some small stack usage improvements: this is just enough for KMSAN
builds to not blow the stack, more is queued up for 6.16"
* tag 'bcachefs-2025-04-03' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix "journal stuck" during recovery
bcachefs: backpointer_get_key: check for null from peek_slot()
bcachefs: Fix null ptr deref in invalidate_one_bucket()
bcachefs: Fix check_snapshot_exists() restart handling
bcachefs: use nonblocking variant of print_string_as_lines in error path
bcachefs: Fix scheduling while atomic from logging changes
bcachefs: Add error handling for zlib_deflateInit2()
bcachefs: add missing selection of XARRAY_MULTI
bcachefs: bch_dev_usage_full
bcachefs: Kill btree_iter.trans
bcachefs: do_trace_key_cache_fill()
bcachefs: Split up bch_dev.io_ref
bcachefs: fix ref leak in btree_node_read_all_replicas
bcachefs: Fix null ptr deref in bch2_write_endio()
bcachefs: Fix field spanning write warning
bcachefs: Fix striping behaviour
|
|
Pull 9p updates from Dominique Martinet:
- fix handling of bogus (negative/too long) replies
- fix crash on mkdir with ACLs (... looks like nobody is using ACLs
with semi-recent kernels...)
- ipv6 support for trans=tcp
- minor concurrency fix to make syzbot happy
- minor cleanup
* tag '9p-for-6.15-rc1' of https://github.com/martinetd/linux:
docs: fs/9p: Add missing "not" in cache documentation
9p: Use hashtable.h for hash_errmap
Documentation/fs/9p: fix broken link
9p/trans_fd: mark concurrent read and writes to p9_conn->err
9p/net: return error on bogus (longer than requested) replies
9p/net: fix improper handling of bogus negative read/write replies
fs/9p: fix NULL pointer dereference on mkdir
net/9p/fd: support ipv6 for trans=tcp
|
|
In ceph, in fill_fscrypt_truncate(), the end flush position is calculated
by:
loff_t lend = orig_pos + CEPH_FSCRYPT_BLOCK_SHIFT - 1;
but that's using the block shift not the block size.
Fix this to use the block size instead.
Fixes: 5c64737d2536 ("ceph: add truncate size handling support for fscrypt")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
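A worked example of the off-by-units bug, assuming 4 KiB fscrypt blocks (shift 12) and a block-aligned orig_pos. The macro and function names are hypothetical. With orig_pos = 8192, the buggy end is 8192 + 12 - 1 = 8203, which covers only 12 bytes. The fixed end is 8192 + 4096 - 1 = 12287, the last byte of the block.

```c
#include <stdint.h>

#define SKETCH_BLOCK_SHIFT 12u                        /* assumed 4 KiB */
#define SKETCH_BLOCK_SIZE  (1u << SKETCH_BLOCK_SHIFT) /* 4096 */

/* Buggy: adds the shift (12) instead of the size (4096). */
static int64_t flush_end_buggy(int64_t orig_pos)
{
	return orig_pos + SKETCH_BLOCK_SHIFT - 1;
}

/* Fixed: last byte of the block starting at orig_pos. */
static int64_t flush_end_fixed(int64_t orig_pos)
{
	return orig_pos + SKETCH_BLOCK_SIZE - 1;
}
```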
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM hotfixes from Andrew Morton:
"Five hotfixes. Three are cc:stable and the remainder address post-6.14
issues or aren't considered necessary for -stable kernels.
All patches are for MM"
* tag 'mm-hotfixes-stable-2025-04-02-21-57' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead()
mm/hugetlb: move hugetlb_sysctl_init() to the __init section
mm: page_isolation: avoid calling folio_hstate() without hugetlb_lock
mm/hugetlb_vmemmap: fix memory loads ordering
mm/userfaultfd: fix release hang over concurrent GUP
|
|
If we crash when the journal pin fifo is completely full - i.e. we're
at the maximum number of dirty journal entries - that may put us in a
sticky situation in recovery, as journal replay will need to be able to
open new journal entries in order to get going.
bch2_fs_journal_start() already had provisions for resizing the journal
pin fifo if needed, but it needs a fudge factor to ensure there's room
for journal replay.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
peek_slot() doesn't normally return bkey_s_c_null - except when we ask
for a key at a btree level that doesn't exist, which can happen here.
We might want to revisit this, but we'll have to look over all the
places where we use peek_slot() on interior nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_backpointer_get_key() returns bkey_s_c_null when the target isn't
found.
backpointer_get_key() flags the error, so there's nothing else to do
here - just skip it and move on.
Link: https://github.com/koverstreet/bcachefs/issues/847
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Codepaths that create entries in the snapshots btree currently call
bch2_mark_snapshot(), which updates the in-memory snapshot table, before
transaction commit.
This is because bch2_mark_snapshot() is an atomic trigger, run with
btree write locks held, and isn't allowed to fail - but it might need to
reallocate the table, hence we call it early when we're still allowed to
fail.
This is generally harmless - if we fail, we'll have left an entry in the
snapshots table around, but nothing will reference it and it'll get
overwritten if reused by another transaction.
But check_snapshot_exists(), which reconstructs snapshots when the
snapshots btree has been corrupted or lost, was erroneously rechecking
whether the snapshot exists inside the transaction commit loop, so on
transaction restart (in this case mem_realloced), the second iteration
would return without repairing.
This code needs some cleanup: splitting out a "maybe realloc snapshots
table" helper would have avoided this; that will be in the next patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
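The pitfall generalises: a side effect that runs before commit (here, the in-memory table update) must not be re-tested inside the retry loop. Otherwise the retry sees its own earlier effect and skips the repair. This sketch makes the existence check once, before the loop. All state and names are hypothetical, and -EAGAIN stands in for a bcachefs transaction restart.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical in-memory snapshot table and on-disk btree contents. */
static bool mem_table[8];
static bool on_disk[8];
static int forced_restarts; /* how many times the commit will restart */

/* Like marking the snapshot before commit: the in-memory table is
 * updated even if the transaction then restarts. */
static int try_create_snapshot(unsigned int id)
{
	mem_table[id] = true;
	if (forced_restarts > 0) {
		forced_restarts--;
		return -EAGAIN; /* stand-in for a transaction restart */
	}
	on_disk[id] = true;
	return 0;
}

/* Decide once, outside the retry loop, whether repair is needed;
 * re-checking mem_table inside the loop would bail out on retry. */
static int check_snapshot_exists_sketch(unsigned int id)
{
	if (mem_table[id])
		return 0;

	int ret;
	do {
		ret = try_create_snapshot(id);
	} while (ret == -EAGAIN);
	return ret;
}
```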
|
|
The inconsistency error path calls print_string_as_lines, which calls
console_lock, which is a potentially-sleeping function and so can't be
called in an atomic context.
Replace calls to it with the nonblocking variant which is safe to call.
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Two fixes from the recent logging changes:
bch2_inconsistent() and bch2_fs_inconsistent() can be called from interrupt
context, or with rcu_read_lock() held.
The one syzbot found is in
bch2_bkey_pick_read_device
bch2_dev_rcu
bch2_fs_inconsistent
We're starting to lift printbufs up to higher levels so we
can emit better log messages and print them all in one go (avoiding
garbling); that conversion will help with spotting these in the
future, since when we declare a printbuf it must be flagged if we're in an
atomic context.
Secondly, in btree_node_write_endio:
00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6
00085 preempt_count: 10001, expected: 0
00085 RCU nest depth: 0, expected: 0
00085 4 locks held by bch-reclaim/fa6/618:
00085 #0: ffffff80d7ccad68 (&j->reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198
00085 #1: ffffff80d7c84218 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440
00085 #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440
00085 #3: ffffff80c3823c20 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130
00085 irq event stamp: 328
00085 hardirqs last enabled at (327): [<ffffffc080073a14>] finish_task_switch.isra.0+0xbc/0x2a0
00085 hardirqs last disabled at (328): [<ffffffc080971a10>] el1_interrupt+0x20/0x60
00085 softirqs last enabled at (0): [<ffffffc08002f920>] copy_process+0x7c8/0x2118
00085 softirqs last disabled at (0): [<0000000000000000>] 0x0
00085 Preemption disabled at:
00085 [<ffffffc08003ada0>] irq_enter_rcu+0x18/0x90
00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798
00085 Hardware name: linux,dummy-virt (DT)
00085 Call trace:
00085 show_stack+0x1c/0x30 (C)
00085 dump_stack_lvl+0x84/0xc0
00085 dump_stack+0x14/0x20
00085 __might_resched+0x180/0x288
00085 __might_sleep+0x4c/0x88
00085 __kmalloc_node_track_caller_noprof+0x34c/0x3e0
00085 krealloc_noprof+0x1a0/0x2d8
00085 bch2_printbuf_make_room+0x9c/0x120
00085 bch2_prt_printf+0x60/0x1b8
00085 btree_node_write_endio+0x1b0/0x2d8
00085 bio_endio+0x138/0x1f0
00085 btree_node_write_endio+0xe8/0x2d8
00085 bio_endio+0x138/0x1f0
00085 blk_update_request+0x220/0x4c0
00085 blk_mq_end_request+0x28/0x148
00085 virtblk_request_done+0x64/0xe8
00085 blk_mq_complete_request+0x34/0x40
00085 virtblk_done+0x78/0x130
00085 vring_interrupt+0x6c/0xb0
00085 __handle_irq_event_percpu+0x8c/0x2e0
00085 handle_irq_event+0x50/0xb0
00085 handle_fasteoi_irq+0xc4/0x250
00085 handle_irq_desc+0x44/0x60
00085 generic_handle_domain_irq+0x20/0x30
00085 gic_handle_irq+0x54/0xc8
00085 call_on_irq_stack+0x24/0x40
Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In attempt_compress(), the return value of zlib_deflateInit2() needs to be
checked. A proper implementation can be found in pstore_compress().
Add an error check and return 0 immediately if the initialization fails.
Fixes: 986e9842fb68 ("bcachefs: Compression levels")
Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
stbl is s8, but it must contain offsets into slot, which can go from 0 to
127.
Add a bounds check for that error and return -EIO if the check fails.
Also make jfs_readdir return with an error if add_missing_indices returns
an error.
Reported-by: syzbot+b974bd41515f770c608b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com./bug?extid=b974bd41515f770c608b
Signed-off-by: Aditya Dutt <duttaditya18@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
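Because stbl entries are signed 8-bit values and valid slot offsets are 0 to 127, any corrupt on-disk byte outside that range shows up as a negative number. A minimal sketch of the bound check follows, assuming that negative values are the only out-of-range case. The function name is hypothetical. The caller would propagate -EIO, as jfs_readdir now does.

```c
#include <errno.h>
#include <stdint.h>

/* Valid slot offsets are 0..127; an int8_t cannot exceed 127, so any
 * corrupt value appears as a negative number. */
static int check_stbl_sketch(const int8_t *stbl, int nentries)
{
	for (int i = 0; i < nentries; i++) {
		if (stbl[i] < 0)
			return -EIO;
	}
	return 0;
}
```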
|
|
[ Syzkaller Report ]
Oops: general protection fault, probably for non-canonical address
0xdffffc0000000087: 0000 [#1
KASAN: null-ptr-deref in range [0x0000000000000438-0x000000000000043f]
CPU: 2 UID: 0 PID: 10614 Comm: syz-executor.0 Not tainted
6.13.0-rc6-gfbfd64d25c7a-dirty #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Sched_ext: serialise (enabled+all), task: runnable_at=-30ms
RIP: 0010:jfs_ioc_trim+0x34b/0x8f0
Code: e7 e8 59 a4 87 fe 4d 8b 24 24 4d 8d bc 24 38 04 00 00 48 8d 93
90 82 fe ff 4c 89 ff 31 f6
RSP: 0018:ffffc900055f7cd0 EFLAGS: 00010206
RAX: 0000000000000087 RBX: 00005866a9e67ff8 RCX: 000000000000000a
RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000001
RBP: dffffc0000000000 R08: ffff88807c180003 R09: 1ffff1100f830000
R10: dffffc0000000000 R11: ffffed100f830001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000438
FS: 00007fe520225640(0000) GS:ffff8880b7e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005593c91b2c88 CR3: 000000014927c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __die_body+0x61/0xb0
? die_addr+0xb1/0xe0
? exc_general_protection+0x333/0x510
? asm_exc_general_protection+0x26/0x30
? jfs_ioc_trim+0x34b/0x8f0
jfs_ioctl+0x3c8/0x4f0
? __pfx_jfs_ioctl+0x10/0x10
? __pfx_jfs_ioctl+0x10/0x10
__se_sys_ioctl+0x269/0x350
? __pfx___se_sys_ioctl+0x10/0x10
? do_syscall_64+0xfb/0x210
do_syscall_64+0xee/0x210
? syscall_exit_to_user_mode+0x1e0/0x330
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe51f4903ad
Code: c3 e8 a7 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d
RSP: 002b:00007fe5202250c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fe51f5cbf80 RCX: 00007fe51f4903ad
RDX: 0000000020000680 RSI: 00000000c0185879 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe520225640
R13: 000000000000000e R14: 00007fe51f44fca0 R15: 00007fe52021d000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
[ Analysis ]
We believe that we have found a concurrency bug in the `fs/jfs` module
that results in a null pointer dereference. There is a closely related
issue which has been fixed:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6c1b3599b2feb5c7291f5ac3a36e5fa7cedb234
... but, unfortunately, the accepted patch still appears to be
susceptible to a null pointer dereference under some interleavings.
We think the bug is triggered when `JFS_SBI(ipbmap->i_sb)->bmap` is set
to NULL in `dbFreeBits` and then dereferenced in `jfs_ioc_trim`. This
bug manifests quite rarely under normal circumstances, but is
triggerable from a syz-program.
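The sketch below illustrates one common way to close a check-then-use window like this. It is a general illustration, not the patch that was applied: the pointer is cleared and read only under the same lock, and the reader bails out if it finds NULL. The structures and the rwlock are hypothetical, and JFS's actual locking may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the JFS structures involved. */
struct bmap_sketch {
	long db_agsize;
};

struct sbi_sketch {
	pthread_rwlock_t lock;		/* assumed lock guarding ->bmap */
	struct bmap_sketch *bmap;	/* may be cleared by a concurrent path */
};

/* Trim path: read ->bmap once, under the lock, and bail out if it is gone. */
static int trim_sketch(struct sbi_sketch *sbi, long *agsize_out)
{
	int rc = 0;

	pthread_rwlock_rdlock(&sbi->lock);
	struct bmap_sketch *bmp = sbi->bmap;
	if (bmp == NULL)
		rc = -EINVAL;
	else
		*agsize_out = bmp->db_agsize;	/* cannot be cleared while locked */
	pthread_rwlock_unlock(&sbi->lock);
	return rc;
}

/* Teardown path: clear ->bmap only while holding the lock exclusively. */
static void clear_bmap_sketch(struct sbi_sketch *sbi)
{
	pthread_rwlock_wrlock(&sbi->lock);
	sbi->bmap = NULL;
	pthread_rwlock_unlock(&sbi->lock);
}
```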
Reported-and-tested-by: Dylan J. Wolff <wolffd@comp.nus.edu.sg>
Reported-and-tested-by: Jiacheng Xu <stitch@zju.edu.cn>
Signed-off-by: Dylan J. Wolff <wolffd@comp.nus.edu.sg>
Signed-off-by: Jiacheng Xu <stitch@zju.edu.cn>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Validate db_agheight, db_agwidth, and db_agstart in dbMount to catch
corrupted metadata early and avoid undefined behavior in dbAllocAG.
Limits are derived from L2LPERCTL, LPERCTL/MAXAG, and CTLTREESIZE:
- agheight: 0 to L2LPERCTL/2 (0 to 5) ensures the shift
  (L2LPERCTL - 2*agheight) >= 0.
- agwidth: 1 to min(LPERCTL/MAXAG, 2^(L2LPERCTL - 2*agheight))
  ensures agperlev >= 1.
  - Ranges: 1-8 (agheight 0-3), 1-4 (agheight 4), 1 (agheight 5).
  - LPERCTL/MAXAG = 1024/128 = 8 limits the leaves per AG;
    2^(10 - 2*agheight) keeps the division from yielding 0.
- agstart: 0 to CTLTREESIZE-1 - agwidth*(MAXAG-1) keeps ti within
  stree (size 1365).
  - Ranges: 0-1237 (agwidth 1), 0-348 (agwidth 8).
UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:1400:9
shift exponent -335544310 is negative
CPU: 0 UID: 0 PID: 5822 Comm: syz-executor130 Not tainted 6.14.0-rc5-syzkaller #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_shift_out_of_bounds+0x3c8/0x420 lib/ubsan.c:468
dbAllocAG+0x1087/0x10b0 fs/jfs/jfs_dmap.c:1400
dbDiscardAG+0x352/0xa20 fs/jfs/jfs_dmap.c:1613
jfs_ioc_trim+0x45a/0x6b0 fs/jfs/jfs_discard.c:105
jfs_ioctl+0x2cd/0x3e0 fs/jfs/ioctl.c:131
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
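The limits listed above can be sketched as a single validation function. This is an illustration under stated assumptions, not the dbMount() code. validate_ag_sketch() is a hypothetical name, the constants use the values quoted in this message, and returning -EINVAL is an assumed error code. Each check only uses values that earlier checks have already bounded, so the shift and the multiplication cannot misbehave.

```c
#include <errno.h>

/* Values as quoted in the commit message above. */
#define L2LPERCTL   10
#define LPERCTL     1024
#define MAXAG       128
#define CTLTREESIZE 1365

/* Hypothetical check: 0 if the AG geometry is usable, -EINVAL otherwise. */
static int validate_ag_sketch(int agheight, int agwidth, int agstart)
{
	/* Shift exponent L2LPERCTL - 2*agheight must be >= 0. */
	if (agheight < 0 || agheight > L2LPERCTL / 2)
		return -EINVAL;

	/* At least one leaf, at most LPERCTL/MAXAG leaves per AG. */
	if (agwidth < 1 || agwidth > LPERCTL / MAXAG)
		return -EINVAL;

	/* agperlev = 2^(L2LPERCTL - 2*agheight) / agwidth must be >= 1. */
	if (agwidth > (1 << (L2LPERCTL - 2 * agheight)))
		return -EINVAL;

	/* agstart + agwidth*(MAXAG-1) must stay inside the stree. */
	if (agstart < 0 || agstart > CTLTREESIZE - 1 - agwidth * (MAXAG - 1))
		return -EINVAL;

	return 0;
}
```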
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+fe8264911355151c487f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=fe8264911355151c487f
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Eliminate a useless task_work on execve by moving the call to
rseq_set_notify_resume() from sched_mm_cid_after_execve() to the error
path of bprm_execve().
The call to rseq_set_notify_resume() from sched_mm_cid_after_execve() is
pointless in the success case, because rseq_execve() will clear the rseq
pointer before returning to userspace.
sched_mm_cid_after_execve() is called from both the success and error
paths of bprm_execve(). The call to rseq_set_notify_resume() is needed
on error because the mm_cid may have changed.
Also move the rseq_execve() to right after sched_mm_cid_after_execve()
in bprm_execve().
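The following is a simplified model of the resulting control flow. It assumes the success and error paths of bprm_execve() are separated by a goto, which is common in kernel code. All *_sketch helpers and struct task_sketch are hypothetical. Counters stand in for the real side effects so that the two paths can be compared. The placement of these calls in fs/exec.c may differ in detail.

```c
/* Hypothetical per-task state; counters model the real side effects. */
struct task_sketch {
	int rseq_registered;	/* models current->rseq != NULL */
	int notify_resume;	/* models the queued notify-resume work */
	int mm_cid_updates;	/* models sched_mm_cid_after_execve() calls */
};

static void sched_mm_cid_after_execve_sketch(struct task_sketch *t)
{
	t->mm_cid_updates++;	/* the mm_cid may change here */
}

static void rseq_execve_sketch(struct task_sketch *t)
{
	t->rseq_registered = 0;	/* rseq pointer cleared for the new image */
}

static void rseq_set_notify_resume_sketch(struct task_sketch *t)
{
	if (t->rseq_registered)
		t->notify_resume++;
}

/* exec_result stands in for the return value of exec_binprm(). */
static int bprm_execve_sketch(struct task_sketch *t, int exec_result)
{
	int retval = exec_result;

	if (retval < 0)
		goto out;

	/* Success: rseq_execve() clears rseq, so no notify-resume is needed. */
	sched_mm_cid_after_execve_sketch(t);
	rseq_execve_sketch(t);
	return retval;

out:
	/* Error: the old image keeps running and its mm_cid may have changed. */
	sched_mm_cid_after_execve_sketch(t);
	rseq_set_notify_resume_sketch(t);
	return retval;
}
```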
[ mingo: Merged to a recent upstream kernel, extended the changelog. ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250327132945.1558783-1-mathieu.desnoyers@efficios.com
|
|
Update the internal version number to 2.52.
Signed-off-by: Steve French <stfrench@microsoft.com>
|