Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"22 hotfixes. 13 are cc:stable and the remainder address post-6.14
issues or aren't considered necessary for -stable kernels.
About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates"
* tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
mm: fix folio_pte_batch() on XEN PV
nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs()
mm/hugetlb: copy the CMA flag when demoting
mm, swap: fix false warning for large allocation with !THP_SWAP
selftests/mm: fix a build failure on powerpc
selftests/mm: fix build break when compiling pkey_util.c
mm: vmalloc: support more granular vrealloc() sizing
tools/testing/selftests: fix guard region test tmpfs assumption
ocfs2: stop quota recovery before disabling quotas
ocfs2: implement handshaking with ocfs2 recovery thread
ocfs2: switch osb->disable_recovery to enum
mailmap: map Uwe's BayLibre addresses to a single one
MAINTAINERS: add mm THP section
mm/userfaultfd: fix uninitialized output field for -EAGAIN race
selftests/mm: compaction_test: support platform with huge mount of memory
MAINTAINERS: add core mm section
ocfs2: fix panic in failed foilio allocation
mm/huge_memory: fix dereferencing invalid pmd migration entry
MAINTAINERS: add reverse mapping section
x86: disable image size check for test builds
...
|
|
Pull mount fixes from Al Viro:
"A couple of races around legalize_mnt vs umount (both fairly old and
hard to hit) plus two bugs in move_mount(2) - both around 'move
detached subtree in place' logics"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix IS_MNT_PROPAGATING uses
do_move_mount(): don't leak MNTNS_PROPAGATING on failures
do_umount(): add missing barrier before refcount checks in sync case
__legitimize_mnt(): check for MNT_SYNC_UMOUNT should be under mount_lock
|
|
Pull smb client fixes from Steve French:
- Fix dentry leak which can cause umount crash
- Add warning for parse contexts error on compounded operation
* tag '6.15-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Avoid race in open_cached_dir with lease breaks
smb3 client: warn when parse contexts returns error on compounded operation
|
|
propagate_mnt() does not attach anything to mounts created during
propagate_mnt() itself. What's more, anything on ->mnt_slave_list
of such new mount must also be new, so we don't need to even look
there.
When move_mount() had been introduced, we've got an additional
class of mounts to skip - if we are moving from anon namespace,
we do not want to propagate to mounts we are moving (i.e. all
mounts in that anon namespace).
Unfortunately, the part about "everything on their ->mnt_slave_list
will also be ignorable" is not true - if we have propagation graph
A -> B -> C
and do OPEN_TREE_CLONE open_tree() of B, we get
A -> [B <-> B'] -> C
as propagation graph, where B' is a clone of B in our detached tree.
Making B private will result in
A -> B' -> C
C still gets propagation from A, as it would after making B private
if we hadn't done that open_tree(), but now the propagation goes
through B'. Trying to move_mount() our detached tree on subdirectory
in A should have
* moved B' on that subdirectory in A
* skipped the corresponding subdirectory in B' itself
* copied B' on the corresponding subdirectory in C.
As it is, the logics in propagation_next() and friends ends up
skipping propagation into C, since it doesn't consider anything
downstream of B'.
IOW, walking the propagation graph should only skip the ->mnt_slave_list
of new mounts; the only places where the check for "in that one
anon namespace" are applicable are propagate_one() (where we should
treat that as the same kind of thing as "mountpoint we are looking
at is not visible in the mount we are looking at") and
propagation_would_overmount(). The latter is better dealt with
in the caller (can_move_mount_beneath()); on the first call of
propagation_would_overmount() the test is always false, on the
second it is always true in "move from anon namespace" case and
always false in "move within our namespace" one, so it's easier
to just use check_mnt() before bothering with the second call and
be done with that.
Fixes: 064fe6e233e8 ("mount: handle mount propagation for detached mount trees")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
as it is, a failed move_mount(2) from anon namespace breaks
all further propagation into that namespace, including normal
mounts in non-anon namespaces that would otherwise propagate
there.
Fixes: 064fe6e233e8 ("mount: handle mount propagation for detached mount trees")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
do_umount() analogue of the race fixed in 119e1ef80ecf "fix
__legitimize_mnt()/mntput() race". Here we want to make sure that
if __legitimize_mnt() doesn't notice our lock_mount_hash(), we will
notice their refcount increment. Harder to hit than mntput_no_expire()
one, fortunately, and consequences are milder (sync umount acting
like umount -l on a rare race with RCU pathwalk hitting at just the
wrong time instead of use-after-free galore mntput_no_expire()
counterpart used to be hit). Still a bug...
Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... or we risk stealing final mntput from sync umount - raising mnt_count
after umount(2) has verified that victim is not busy, but before it
has set MNT_SYNC_UMOUNT; in that case __legitimize_mnt() doesn't see
that it's safe to quietly undo mnt_count increment and leaves dropping
the reference to caller, where it'll be a full-blown mntput().
Check under mount_lock is needed; leaving the current one done before
taking that makes no sense - it's nowhere near common enough to bother
with.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull bcachefs fixes from Kent Overstreet:
- Some fixes to help with filesystem analysis: ensure superblock
error count gets written if we go ERO, don't discard the journal
aggressively (so it's available for list_journal -a)
- Fix lost wakeup on arm causing us to get stuck when reading btree
nodes
- Fix fsck failing to exit on ctrl-c
- An additional fix for filesystems with misaligned bucket sizes: we
now ensure that allocations are properly aligned
- Setting background target but not promote target will now leave that
data cached on the foreground target, as it used to
- Revert a change to when we allocate the VFS superblock, this was done
for implementing blk_holder_ops but ended up not being needed, and
allocating a superblock and not setting SB_BORN while we do recovery
caused sync() calls and other things to hang
- Assorted fixes for harmless error messages that caused concern to
users
* tag 'bcachefs-2025-05-08' of git://evilpiepirate.org/bcachefs:
bcachefs: Don't aggressively discard the journal
bcachefs: Ensure superblock gets written when we go ERO
bcachefs: Filter out harmless EROFS error messages
bcachefs: journal_shutdown is EROFS, not EIO
bcachefs: Call bch2_fs_start before getting vfs superblock
bcachefs: fix hung task timeout in journal read
bcachefs: Add missing barriers before wake_up_bit()
bcachefs: Ensure proper write alignment
bcachefs: Improve want_cached_ptr()
bcachefs: thread_with_stdio: fix spinning instead of exiting
|
|
Pull smb server fixes from Steve French:
- Fix UAF closing file table (e.g. in tree disconnect)
- Fix potential out of bounds write
- Fix potential memory leak parsing lease state in open
- Fix oops in rename with empty target
* tag 'v6.15-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Fix UAF in __close_file_table_ids
ksmbd: prevent out-of-bounds stream writes by validating *pos
ksmbd: fix memory leak in parse_lease_state()
ksmbd: prevent rename with empty string
|
|
After commit c0e473a0d226 ("block: fix race between set_blocksize and read
paths") was merged, set_blocksize() called by sb_set_blocksize() now locks
the inode of the backing device file. As a result of this change, syzbot
started reporting deadlock warnings due to a circular dependency involving
the semaphore "ns_sem" of the nilfs object, the inode lock of the backing
device file, and the locks that this inode lock is transitively dependent
on.
This is caused by a new lock dependency added by the above change, since
init_nilfs() calls sb_set_blocksize() in the lock section of "ns_sem".
However, these warnings are false positives because init_nilfs() is called
in the early stage of the mount operation and the filesystem has not yet
started.
The reason why "ns_sem" is locked in init_nilfs() was to avoid a race
condition in nilfs_fill_super() caused by sharing a nilfs object among
multiple filesystem instances (super block structures) in the early
implementation. However, nilfs objects and super block structures have
long ago become one-to-one, and there is no longer any need to use the
semaphore there.
So, fix this issue by removing the use of the semaphore "ns_sem" in
init_nilfs().
Link: https://lkml.kernel.org/r/20250503053327.12294-1-konishi.ryusuke@gmail.com
Fixes: c0e473a0d226 ("block: fix race between set_blocksize and read paths")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+00f7f5b884b117ee6773@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=00f7f5b884b117ee6773
Tested-by: syzbot+00f7f5b884b117ee6773@syzkaller.appspotmail.com
Reported-by: syzbot+f30591e72bfc24d4715b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f30591e72bfc24d4715b
Tested-by: syzbot+f30591e72bfc24d4715b@syzkaller.appspotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently quota recovery is synchronized with unmount using sb->s_umount
semaphore. That is however prone to deadlocks because
flush_workqueue(osb->ocfs2_wq) called from umount code can wait for quota
recovery to complete while ocfs2_finish_quota_recovery() waits for
sb->s_umount semaphore.
Grabbing of sb->s_umount semaphore in ocfs2_finish_quota_recovery() is
only needed to protect that function from disabling of quotas from
ocfs2_dismount_volume(). Handle this problem by disabling quota recovery
early during unmount in ocfs2_dismount_volume() instead so that we can
drop acquisition of sb->s_umount from ocfs2_finish_quota_recovery().
Link: https://lkml.kernel.org/r/20250424134515.18933-6-jack@suse.cz
Fixes: 5f530de63cfc ("ocfs2: Use s_umount for quota recovery protection")
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Shichangkuo <shi.changkuo@h3c.com>
Reported-by: Murad Masimov <m.masimov@mt-integration.ru>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Tested-by: Heming Zhao <heming.zhao@suse.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We will need ocfs2 recovery thread to acknowledge transitions of
recovery_state when disabling particular types of recovery. This is
similar to what currently happens when disabling recovery completely, just
more general. Implement the handshake and use it for exit from recovery.
Link: https://lkml.kernel.org/r/20250424134515.18933-5-jack@suse.cz
Fixes: 5f530de63cfc ("ocfs2: Use s_umount for quota recovery protection")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Tested-by: Heming Zhao <heming.zhao@suse.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Murad Masimov <m.masimov@mt-integration.ru>
Cc: Shichangkuo <shi.changkuo@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "ocfs2: Fix deadlocks in quota recovery", v3.
This implements another approach to fixing quota recovery deadlocks. We
avoid grabbing sb->s_umount semaphore from ocfs2_finish_quota_recovery()
and instead stop quota recovery early in ocfs2_dismount_volume().
This patch (of 3):
We will need more recovery states than just pure enable / disable to fix
deadlocks with quota recovery. Switch osb->disable_recovery to enum.
Link: https://lkml.kernel.org/r/20250424134301.1392-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20250424134515.18933-4-jack@suse.cz
Fixes: 5f530de63cfc ("ocfs2: Use s_umount for quota recovery protection")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Tested-by: Heming Zhao <heming.zhao@suse.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Murad Masimov <m.masimov@mt-integration.ru>
Cc: Shichangkuo <shi.changkuo@h3c.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
While discussing some userfaultfd relevant issues recently, Andrea noticed
a potential ABI breakage with -EAGAIN on almost all userfaultfd ioctl()s.
Quote from Andrea, explaining how -EAGAIN was processed, and how this
should fix it (taking example of UFFDIO_COPY ioctl):
The "mmap_changing" and "stale pmd" conditions are already reported as
-EAGAIN written in the copy field, this does not change it. This change
removes the subnormal case that left copy.copy uninitialized and required
apps to explicitly set the copy field to get deterministic
behavior (which is a requirement contrary to the documentation in both
the manpage and source code). In turn there's no alteration to backwards
compatibility as result of this change because userland will find the
copy field consistently set to -EAGAIN, and not anymore sometime -EAGAIN
and sometime uninitialized.
Even then the change only can make a difference to non cooperative users
of userfaultfd, so when UFFD_FEATURE_EVENT_* is enabled, which is not
true for the vast majority of apps using userfaultfd or this unintended
uninitialized field may have been noticed sooner.
Meanwhile, since this bug existed for years, it also almost affects all
ioctl()s that was introduced later. Besides UFFDIO_ZEROPAGE, these also
get affected in the same way:
- UFFDIO_CONTINUE
- UFFDIO_POISON
- UFFDIO_MOVE
This patch should have fixed all of them.
Link: https://lkml.kernel.org/r/20250424215729.194656-2-peterx@redhat.com
Fixes: df2cc96e7701 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Fixes: fc71884a5f59 ("mm: userfaultfd: add new UFFDIO_POISON ioctl")
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
commit 7e119cff9d0a ("ocfs2: convert w_pages to w_folios") and commit
9a5e08652dc4b ("ocfs2: use an array of folios instead of an array of
pages") save -ENOMEM in the folio array upon allocation failure and call
the folio array free code.
The folio array free code expects either valid folio pointers or NULL.
Finding the -ENOMEM will result in a panic. Fix by NULLing the error
folio entry.
Link: https://lkml.kernel.org/r/c879a52b-835c-4fa0-902b-8b2e9196dcbd@oracle.com
Fixes: 7e119cff9d0a ("ocfs2: convert w_pages to w_folios")
Fixes: 9a5e08652dc4b ("ocfs2: use an array of folios instead of an array of pages")
Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
commit 4eb7b93e0310 ("ocfs2: improve write IO performance when
fragmentation is high") introduced another regression.
The following ocfs2-test case can trigger this issue:
> discontig_runner.sh => activate_discontig_bg.sh => resv_unwritten:
> ${RESV_UNWRITTEN_BIN} -f ${WORK_PLACE}/large_testfile -s 0 -l \
> $((${FILE_MAJOR_SIZE_M}*1024*1024))
In my env, test disk size (by "fdisk -l <dev>"):
> 53687091200 bytes, 104857600 sectors.
Above command is:
> /usr/local/ocfs2-test/bin/resv_unwritten -f \
> /mnt/ocfs2/ocfs2-activate-discontig-bg-dir/large_testfile -s 0 -l \
> 53187969024
Error log:
> [*] Reserve 50724M space for a LARGE file, reserve 200M space for future test.
> ioctl error 28: "No space left on device"
> resv allocation failed Unknown error -1
> reserve unwritten region from 0 to 53187969024.
Call flow:
__ocfs2_change_file_space //by ioctl OCFS2_IOC_RESVSP64
ocfs2_allocate_unwritten_extents //start:0 len:53187969024
while()
+ ocfs2_get_clusters //cpos:0, alloc_size:1623168 (cluster number)
+ ocfs2_extend_allocation
+ ocfs2_lock_allocators
| + choose OCFS2_AC_USE_MAIN & ocfs2_cluster_group_search
|
+ ocfs2_add_inode_data
ocfs2_add_clusters_in_btree
__ocfs2_claim_clusters
ocfs2_claim_suballoc_bits
+ During the allocation of the final part of the large file
(after ~47GB), no chain had the required contiguous
bits_wanted. Consequently, the allocation failed.
How to fix:
When OCFS2 is encountering fragmented allocation, the file system should
stop attempting bits_wanted contiguous allocation and instead provide the
largest available contiguous free bits from the cluster groups.
Link: https://lkml.kernel.org/r/20250414060125.19938-2-heming.zhao@suse.com
Fixes: 4eb7b93e0310 ("ocfs2: improve write IO performance when fragmentation is high")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reported-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We frequently use 'bcachefs list_journal -a' for debugging, as it
provides a record of all btree transactions, and a history of what
happened.
But it's not so useful if we immediately discard journal buckets right
after they're no longer dirty.
This tweaks journal reclaim to only discard when we're low on space,
keeping the journal mostly un-discarded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we go emergency read-only, make sure we do a final write_super() to
persist counters and error counts - this can be critical for piecing
together what fsck was doing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
These just indicate that we're shutting down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We often filter out EROFS errors to avoid log spew after an emergency
shutdown - journal_shutdown is just another emergency shutdown error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
A pre-existing valid cfid returned from find_or_create_cached_dir might
race with a lease break, meaning open_cached_dir doesn't consider it
valid, and thinks it's newly-constructed. This leaks a dentry reference
if the allocation occurs before the queued lease break work runs.
Avoid the race by extending holding the cfid_list_lock across
find_or_create_cached_dir and when the result is checked.
Cc: stable@vger.kernel.org
Reviewed-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Add a new reviewer, Hongbo Li, for better community development
- Fix an I/O hang out of file-backed mounts
- Address a rare data corruption caused by concurrent I/Os on the same
deduplicated compressed data
- Minor cleanup
* tag 'erofs-for-6.15-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: ensure the extra temporary copy is valid for shortened bvecs
erofs: remove unused enum type
fs/erofs/fileio: call erofs_onlinefolio_split() after bio_add_folio()
MAINTAINERS: erofs: add myself as reviewer
|
|
When compressed data deduplication is enabled, multiple logical extents
may reference the same compressed physical cluster.
The previous commit 94c43de73521 ("erofs: fix wrong primary bvec
selection on deduplicated extents") already avoids using shortened
bvecs. However, in such cases, the extra temporary buffers also
need to be preserved for later use in z_erofs_fill_other_copies() to
to prevent data corruption.
IOWs, extra temporary buffers have to be retained not only due to
varying start relative offsets (`pageofs_out`, as indicated by
`pcl->multibases`) but also because of shortened bvecs.
android.hardware.graphics.composer@2.1.so : 270696 bytes
0: 0.. 204185 | 204185 : 628019200.. 628084736 | 65536
-> 1: 204185.. 225536 | 21351 : 544063488.. 544129024 | 65536
2: 225536.. 270696 | 45160 : 0.. 0 | 0
com.android.vndk.v28.apex : 93814897 bytes
...
364: 53869896..54095257 | 225361 : 543997952.. 544063488 | 65536
-> 365: 54095257..54309344 | 214087 : 544063488.. 544129024 | 65536
366: 54309344..54514557 | 205213 : 544129024.. 544194560 | 65536
...
Both 204185 and 54095257 have the same start relative offset of 3481,
but the logical page 55 of `android.hardware.graphics.composer@2.1.so`
ranges from 225280 to 229632, forming a shortened bvec [225280, 225536)
that cannot be used for decompressing the range from 54095257 to
54309344 of `com.android.vndk.v28.apex`.
Since `pcl->multibases` is already meaningless, just mark `be->keepxcpy`
on demand for simplicity.
Again, this issue can only lead to data corruption if `-Ededupe` is on.
Fixes: 94c43de73521 ("erofs: fix wrong primary bvec selection on deduplicated extents")
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250506101850.191506-1-hsiangkao@linux.alibaba.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- revert device path canonicalization, this does not work as intended
with namespaces and is not reliable in all setups
- fix crash in scrub when checksum tree is not valid, e.g. when mounted
with rescue=ignoredatacsums
- fix crash when tracepoint btrfs_prelim_ref_insert is enabled
- other minor fixups:
- open code folio_index(), meant to be used in MM code
- use matching type for sizeof in compression allocation
* tag 'for-6.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: open code folio_index() in btree_clear_folio_dirty_tag()
Revert "btrfs: canonicalize the device path before adding it"
btrfs: avoid NULL pointer dereference if no valid csum tree
btrfs: handle empty eb->folios in num_extent_folios()
btrfs: correct the order of prelim_ref arguments in btrfs__prelim_ref
btrfs: compression: adjust cb->compressed_folios allocation type
|
|
Coverity noticed that the rc on smb2_parse_contexts() was not being checked
in the case of compounded operations. Since we don't want to stop parsing
the following compounded responses which are likely valid, we can't easily
error out here, but at least print a warning message if server has a bug
causing us to skip parsing the open response contexts.
Addresses-Coverity: 1639191
Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
A use-after-free is possible if one thread destroys the file
via __ksmbd_close_fd while another thread holds a reference to
it. The existing checks on fp->refcount are not sufficient to
prevent this.
The fix takes ft->lock around the section which removes the
file from the file table. This prevents two threads acquiring the
same file pointer via __close_file_table_ids, as well as the other
functions which retrieve a file from the IDR and which already use
this same lock.
Cc: stable@vger.kernel.org
Signed-off-by: Sean Heelan <seanheelan@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
ksmbd_vfs_stream_write() did not validate whether the write offset
(*pos) was within the bounds of the existing stream data length (v_len).
If *pos was greater than or equal to v_len, this could lead to an
out-of-bounds memory write.
This patch adds a check to ensure *pos is less than v_len before
proceeding. If the condition fails, -EINVAL is returned.
Cc: stable@vger.kernel.org
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This reverts
1fdbe0b184c8 bcachefs: Make sure c->vfs_sb is set before starting fs
switched up bch2_fs_get_tree() so that we got a superblock before
calling bch2_fs_start, so that c->vfs_sb would always be initialized
while the filesystem was active.
This turned out not to be necessary, because blk_holder_ops were
implemented using our own locking, not vfs locking.
And this had the side effect of creating a super_block and doing our
full recovery (including potentially fsck) before setting SB_BORN, which
causes things like sync calls to hang until our recovery is finished.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
wake_up() doesn't require a barrier - but wake_up_bit() does.
This only affected non x86, and primarily lead to lost wakeups after
btree node reads.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There was a buggy version of bcachefs-tools which picked misaligned
bucket sizes when formatting, and we're also about to do dynamic block
sizes - which will allow picking logical block size or physical block
size of the device per-write, allowing for better compression ratios at
the cost of slightly worse write performance (i.e. forcing the device to
do RMW or extra buffering).
To account for this, tweak bch2_alloc_sectors_start() to properly align
open_buckets to the blocksize of the write we're about to do.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If promote target isn't set, rebalance should still leave a cached copy
on the faster device.
Fall back to foreground_target if it's set, or allow a cached copy on
any device if neither are set.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_stdio_redirect_vprintf() was missing a check for stdio->done, i.e.
exiting.
This caused the thread attempting to print to spin, and since it was
being called from the kthread ran by thread_with_stdio, the userspace
side hung as well.
Change it to return -EPIPE - i.e. writing to a pipe that's been closed.
Reported-by: Jan Solanti <jhs@psonet.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pull smb client fixes from Steve French:
- fix posix mkdir error to ksmbd (also avoids crash in
cifs_destroy_request_bufs)
- two smb1 fixes: fixing querypath info and setpathinfo to old servers
- fix rsize/wsize when not multiple of page size to address DIO
reads/writes
* tag '6.15-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: ensure aligned IO sizes
cifs: Fix changing times and read-only attr over SMB1 smb_set_file_info() function
cifs: Fix and improve cifs_query_path_info() and cifs_query_file_info()
smb: client: fix zero length for mkdir POSIX create context
|
|
Pull bcachefs fixes from Kent Overstreet:
"Lots of assorted small fixes...
- Some repair path fixes, a fix for -ENOMEM when reconstructing lots
of alloc info on large filesystems, upgrade for ancient 0.14
filesystems, etc.
- Various assert tweaks; assert -> ERO, ERO -> log the error in the
superblock and continue
- casefolding now uses d_ops like on other casefolding filesystems
- fix device label create on device add, fix bucket array resize on
filesystem resize
- fix xattrs with FORTIFY_SOURCE builds with gcc-15/clang"
* tag 'bcachefs-2025-05-01' of git://evilpiepirate.org/bcachefs: (22 commits)
bcachefs: Remove incorrect __counted_by annotation
bcachefs: add missing sched_annotate_sleep()
bcachefs: Fix __bch2_dev_group_set()
bcachefs: Kill ERO for i_blocks check in truncate
bcachefs: check for inode.bi_sectors underflow
bcachefs: Kill ERO in __bch2_i_sectors_acct()
bcachefs: readdir fixes
bcachefs: improve missing journal write device error message
bcachefs: Topology error after insert is now an ERO
bcachefs: Use bch2_kvmalloc() for journal keys array
bcachefs: More informative error message when shutting down due to error
bcachefs: btree_root_unreadable_and_scan_found_nothing autofix for non data btrees
bcachefs: btree_node_data_missing is now autofix
bcachefs: Don't generate alloc updates to invalid buckets
bcachefs: Improve bch2_dev_bucket_missing()
bcachefs: fix bch2_dev_buckets_resize()
bcachefs: Add upgrade table entry from 0.14
bcachefs: Run BCH_RECOVERY_PASS_reconstruct_snapshots on missing subvol -> snapshot
bcachefs: Add missing utf8_unload()
bcachefs: Emit unicode version message on startup
...
|
|
The folio_index() helper is only needed for mixed usage of page cache
and swap cache, for pure page cache usage, the caller can just use
folio->index instead.
It can't be a swap cache folio here. Swap mapping may only call into fs
through 'swap_rw' but btrfs does not use that method for swap.
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This reverts commit 7e06de7c83a746e58d4701e013182af133395188.
Commit 7e06de7c83a7 ("btrfs: canonicalize the device path before adding
it") tries to make btrfs to use "/dev/mapper/*" name first, then any
filename inside "/dev/" as the device path.
This is mostly fine when there is only the root namespace involved, but
when multiple namespace are involved, things can easily go wrong for the
d_path() usage.
As d_path() returns a file path that is namespace dependent, the
resulted string may not make any sense in another namespace.
Furthermore, the "/dev/" prefix checks itself is not reliable, one can
still make a valid initramfs without devtmpfs, and fill all needed
device nodes manually.
Overall the userspace has all its might to pass whatever device path for
mount, and we are not going to win the war trying to cover every corner
case.
So just revert that commit, and do no extra d_path() based file path
sanity check.
CC: stable@vger.kernel.org # 6.12+
Link: https://lore.kernel.org/linux-fsdevel/20250115185608.GA2223535@zen.localdomain/
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
When trying read-only scrub on a btrfs with rescue=idatacsums mount
option, it will crash with the following call trace:
BUG: kernel NULL pointer dereference, address: 0000000000000208
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
CPU: 1 UID: 0 PID: 835 Comm: btrfs Tainted: G O 6.15.0-rc3-custom+ #236 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
RIP: 0010:btrfs_lookup_csums_bitmap+0x49/0x480 [btrfs]
Call Trace:
<TASK>
scrub_find_fill_first_stripe+0x35b/0x3d0 [btrfs]
scrub_simple_mirror+0x175/0x290 [btrfs]
scrub_stripe+0x5f7/0x6f0 [btrfs]
scrub_chunk+0x9a/0x150 [btrfs]
scrub_enumerate_chunks+0x333/0x660 [btrfs]
btrfs_scrub_dev+0x23e/0x600 [btrfs]
btrfs_ioctl+0x1dcf/0x2f80 [btrfs]
__x64_sys_ioctl+0x97/0xc0
do_syscall_64+0x4f/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[CAUSE]
Mount option "rescue=idatacsums" will completely skip loading the csum
tree, so that any data read will not find any data csum thus we will
ignore data checksum verification.
Normally call sites utilizing csum tree will check the fs state flag
NO_DATA_CSUMS bit, but unfortunately scrub does not check that bit at all.
This results in scrub to call btrfs_search_slot() on a NULL pointer
and triggered above crash.
[FIX]
Check both extent and csum tree root before doing any tree search.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
num_extent_folios() unconditionally calls folio_order() on
eb->folios[0]. If that is NULL this will be a segfault. It is reasonable
for it to return 0 as the number of folios in the eb when the first
entry is NULL, so do that instead.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In preparation for making the kmalloc() family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct folio **" but the returned type will be
"struct page **". These are the same allocation size (pointer size), but
the types don't match. Adjust the allocation type to match the assignment.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The previous patch that added bounds check for create lease context
introduced a memory leak. When the bounds check fails, the function
returns NULL without freeing the previously allocated lease_ctx_info
structure.
This patch fixes the issue by adding kfree(lreq) before returning NULL
in both boundary check cases.
Fixes: bab703ed8472 ("ksmbd: add bounds check for create lease context")
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Client can send empty newname string to ksmbd server.
It will cause a kernel oops from d_alloc.
This patch return the error when attempting to rename
a file or directory with an empty new name string.
Cc: stable@vger.kernel.org
Reported-by: Norbert Szetei <norbert@doyensec.com>
Tested-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This actually reverts 86e92eeeb237 ("bcachefs: Annotate struct bch_xattr
with __counted_by()").
After the x_name, there is a value. According to the disscussion[1],
__counted_by assumes that the flexible array member contains exactly
the amount of elements that are specified. Now there are users came across
a false positive detection of an out of bounds write caused by
the __counted_by here[2], so revert that.
[1] https://lore.kernel.org/lkml/Zv8VDKWN1GzLRT-_@archlinux/T/#m0ce9541c5070146320efd4f928cc1ff8de69e9b2
[2] https://privatebin.net/?a0d4e97d590d71e1#9bLmp2Kb5NU6X6cZEucchDcu88HzUQwHUah8okKPReEt
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
00594 ------------[ cut here ]------------
00594 do not call blocking ops when !TASK_RUNNING; state=2 set at [<000000003e51ef4a>] prepare_to_wait_event+0x5c/0x1c0
00594 WARNING: CPU: 12 PID: 1117 at kernel/sched/core.c:8741 __might_sleep+0x74/0x88
00594 Modules linked in:
00594 CPU: 12 UID: 0 PID: 1117 Comm: umount Not tainted 6.15.0-rc4-ktest-g3a72e369412d #21845 PREEMPT
00594 Hardware name: linux,dummy-virt (DT)
00594 pstate: 60001005 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00594 pc : __might_sleep+0x74/0x88
00594 lr : __might_sleep+0x74/0x88
00594 sp : ffffff80c8d67a90
00594 x29: ffffff80c8d67a90 x28: ffffff80f5903500 x27: 0000000000000000
00594 x26: 0000000000000000 x25: ffffff80cf5002a0 x24: ffffffc087dad000
00594 x23: ffffff80c8d67b40 x22: 0000000000000000 x21: 0000000000000000
00594 x20: 0000000000000242 x19: ffffffc080b92020 x18: 00000000ffffffff
00594 x17: 30303c5b20746120 x16: 74657320323d6574 x15: 617473203b474e49
00594 x14: 0000000000000001 x13: 00000000000c0000 x12: ffffff80facc0000
00594 x11: 0000000000000001 x10: 0000000000000001 x9 : ffffffc0800b0774
00594 x8 : c0000000fffbffff x7 : ffffffc087dac670 x6 : 00000000015fffa8
00594 x5 : ffffff80facbffa8 x4 : ffffff80fbd30b90 x3 : 0000000000000000
00594 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80f5903500
00594 Call trace:
00594 __might_sleep+0x74/0x88 (P)
00594 __mutex_lock+0x64/0x8d8
00594 mutex_lock_nested+0x28/0x38
00594 bch2_fs_ec_flush+0xf8/0x128
00594 __bch2_fs_read_only+0x54/0x1d8
00594 bch2_fs_read_only+0x3e0/0x438
00594 __bch2_fs_stop+0x5c/0x250
00594 bch2_put_super+0x18/0x28
00594 generic_shutdown_super+0x6c/0x140
00594 bch2_kill_sb+0x1c/0x38
00594 deactivate_locked_super+0x54/0xd0
00594 deactivate_super+0x70/0x90
00594 cleanup_mnt+0xec/0x188
00594 __cleanup_mnt+0x18/0x28
00594 task_work_run+0x90/0xd8
00594 do_notify_resume+0x138/0x148
00594 el0_svc+0x9c/0xa0
00594 el0t_64_sync_handler+0x104/0x130
00594 el0t_64_sync+0x154/0x158
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_sb_disk_groups_to_cpu() goes off of the superblock member info, so
we need to set that first.
Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Make all IO sizes multiple of PAGE_SIZE, either negotiated by the
server or passed through rsize, wsize and bsize mount options, to
prevent from breaking DIO reads and writes against servers that
enforce alignment as specified in MS-FSA 2.1.5.3 and 2.1.5.4.
Cc: linux-cifs@vger.kernel.org
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Replace with logging the error in the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We won't be root causing this in the immediate future, and it's fairly
innocuous - so just log it in the superblock.
https://github.com/koverstreet/bcachefs/issues/869
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix potential inode leak in iget() after memory allocation failure
- in subpage mode, fix extent buffer bitmap iteration when writing out
dirty sectors
- fix range calculation when falling back to COW for a NOCOW file
* tag 'for-6.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: adjust subpage bit start based on sectorsize
btrfs: fix the inode leak in btrfs_iget()
btrfs: fix COW handling in run_delalloc_nocow()
|