Age | Commit message (Collapse) | Author | Files | Lines |
|
drivers/net/ethernet/mellanox/mlx5/core/main.c
b33886971dbc ("net/mlx5: Initialize flow steering during driver probe")
40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support")
f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table")
https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/
16d42d313350 ("net/mlx5: Drain fw_reset when removing device")
8324a02c342a ("net/mlx5: Add exit route when waiting for FW")
https://lore.kernel.org/all/20220519114119.060ce014@canb.auug.org.au/
tools/testing/selftests/net/mptcp/mptcp_join.sh
e274f7154008 ("selftests: mptcp: add subflow limits test-cases")
b6e074e171bc ("selftests: mptcp: add infinite map testcase")
5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type")
https://lore.kernel.org/all/20220516111918.366d747f@canb.auug.org.au/
net/mptcp/options.c
ba2c89e0ea74 ("mptcp: fix checksum byte order")
1e39e5a32ad7 ("mptcp: infinite mapping sending")
ea66758c1795 ("tcp: allow MPTCP to update the announced window")
https://lore.kernel.org/all/20220519115146.751c3a37@canb.auug.org.au/
net/mptcp/pm.c
95d686517884 ("mptcp: fix subflow accounting on close")
4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs")
https://lore.kernel.org/all/20220516111435.72f35dca@canb.auug.org.au/
net/mptcp/subflow.c
ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure")
0348c690ed37 ("mptcp: add the fallback check")
f8d4bcacff3b ("mptcp: infinite mapping receiving")
https://lore.kernel.org/all/20220519115837.380bb8d4@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull io_uring fixes from Jens Axboe:
"Two small changes fixing issues from the 5.18 merge window:
- Fix wrong ordering of a tracepoint (Dylan)
- Fix MSG_RING on IOPOLL rings (me)"
* tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block:
io_uring: don't attempt to IOPOLL for MSG_RING requests
io_uring: fix ordering of args in io_uring_queue_async_work
|
|
We gate whether to IOPOLL for a request on whether the opcode is allowed
on a ring setup for IOPOLL and if it's got a file assigned. MSG_RING
is the only one that allows a file yet isn't pollable, it's merely
supported to allow communication on an IOPOLL ring, not because we can
poll for completion of it.
Put the assigned file early and clear it, so we don't attempt to poll
for it.
Reported-by: syzbot+1a0a53300ce782f8b3ad@syzkaller.appspotmail.com
Fixes: 3f1d52abf098 ("io_uring: defer msg-ring file validity check until command issue")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
"We've finally identified commit dc732906c245 ("gfs2: Introduce flag
for glock holder auto-demotion") to be the other cause of the
filesystem corruption we've been seeing. This feature isn't strictly
necessary anymore, so we've decided to stop using it for now.
With this and the gfs_iomap_end rounding fix you've already seen
("gfs2: Fix filesystem block deallocation for short writes" in this
pull request), we're corruption free again now.
- Fix filesystem block deallocation for short writes.
- Stop using glock holder auto-demotion for now.
- Get rid of buffered writes inefficiencies due to page faults being
disabled.
- Minor other cleanups"
* tag 'gfs2-v5.18-rc4-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Stop using glock holder auto-demotion for now
gfs2: buffered write prefaulting
gfs2: Align read and write chunks to the page cache
gfs2: Pull return value test out of should_fault_in_pages
gfs2: Clean up use of fault_in_iov_iter_{read,write}able
gfs2: Variable rename
gfs2: Fix filesystem block deallocation for short writes
|
|
We're having unresolved issues with the glock holder auto-demotion mechanism
introduced in commit dc732906c245. This mechanism was assumed to be essential
for avoiding frequent short reads and writes until commit 296abc0d91d8
("gfs2: No short reads or writes upon glock contention"). Since then,
when the inode glock is lost, it is simply re-acquired and the operation
is resumed. This means that apart from the performance penalty, we
might as well drop the inode glock before faulting in pages, and
re-acquire it afterwards.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In gfs2_file_buffered_write, to increase the likelihood that all the
user memory we're trying to write will be resident in memory, carry out
the write in chunks and fault in each chunk of user memory before trying
to write it. Otherwise, some workloads will trigger frequent short
"internal" writes, causing filesystem blocks to be allocated and then
partially deallocated again when writing into holes, which is wasteful
and breaks reservations.
Neither the chunked writes nor any of the short "internal" writes are
user visible.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Align the chunks that reads and writes are carried out in to the page
cache rather than the user buffers. This will be more efficient in
general, especially for allocating writes. Optimizing the case that the
user buffer is gfs2 backed isn't very useful; we only need to make sure
we won't deadlock.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Pull the return value test of the previous read or write operation out
of should_fault_in_pages(). In a following patch, we'll fault in pages
before the I/O and there will be no return value to check.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
No need to store the return value of the fault_in functions in separate
variables.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Instead of counting the number of bytes read from the filesystem,
functions gfs2_file_direct_read and gfs2_file_read_iter count the number
of bytes written into the user buffer. Conversely, functions
gfs2_file_direct_write and gfs2_file_buffered_write count the number of
bytes read from the user buffer. This is nothing but confusing, so
change the read functions to count how many bytes they have read, and
the write functions to count how many bytes they have written.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When a write cannot be carried out in full, gfs2_iomap_end() releases
blocks that have been allocated for this write but haven't been used.
To compute the end of the allocation, gfs2_iomap_end() incorrectly
rounded the end of the attempted write down to the next block boundary
to arrive at the end of the allocation. It would have to round up, but
the end of the allocation is also available as iomap->offset +
iomap->length, so just use that instead.
In addition, use round_up() for computing the start of the unused range.
Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Pull ceph fix from Ilya Dryomov:
"Two fixes to properly maintain xattrs on async creates and thus
preserve SELinux context on newly created files and to avoid improper
usage of folio->private field which triggered BUG_ONs.
Both marked for stable"
* tag 'ceph-for-5.18-rc7' of https://github.com/ceph/ceph-client:
ceph: check folio PG_private bit instead of folio->private
ceph: fix setting of xattrs on async created inodes
|
|
Pull NFS client bugfixes from Trond Myklebust:
"One more pull request. There was a bug in the fix to ensure that gss-
proxy continues to work correctly after we fixed the AF_LOCAL socket
leak in the RPC code. This therefore reverts that broken patch, and
replaces it with one that works correctly.
Stable fixes:
- SUNRPC: Ensure that the gssproxy client can start in a connected
state
Bugfixes:
- Revert "SUNRPC: Ensure gss-proxy connects on setup"
- nfs: fix broken handling of the softreval mount option"
* tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix broken handling of the softreval mount option
SUNRPC: Ensure that the gssproxy client can start in a connected state
Revert "SUNRPC: Ensure gss-proxy connects on setup"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Seven MM fixes, three of which address issues added in the most recent
merge window, four of which are cc:stable.
Three non-MM fixes, none very serious"
[ And yes, that's a real pull request from Andrew, not me creating a
branch from emailed patches. Woo-hoo! ]
* tag 'mm-hotfixes-stable-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: add a mailing list for DAMON development
selftests: vm: Makefile: rename TARGETS to VMTARGETS
mm/kfence: reset PG_slab and memcg_data before freeing __kfence_pool
mailmap: add entry for martyna.szapar-mudlaw@intel.com
arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map
procfs: prevent unprivileged processes accessing fdinfo dir
mm: mremap: fix sign for EFAULT error return value
mm/hwpoison: use pr_err() instead of dump_page() in get_any_page()
mm/huge_memory: do not overkill when splitting huge_zero_page
Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
|
|
No conflicts.
Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fs fixes from Jan Kara:
"Three fixes that I'd still like to get to 5.18:
- add a missing sanity check in the fanotify FAN_RENAME feature
(added in 5.17, let's fix it before it gets wider usage in
userspace)
- udf fix for recently introduced filesystem corruption issue
- writeback fix for a race in inode list handling that can lead to
delayed writeback and possible dirty throttling stalls"
* tag 'fixes_for_v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Avoid using stale lengthOfImpUse
writeback: Avoid skipping inode writeback
fanotify: do not allow setting dirent events in mask of non-dir
|
|
udf_write_fi() uses lengthOfImpUse of the entry it is writing to.
However this field has not yet been initialized so it either contains
completely bogus value or value from last directory entry at that place.
In either case this is wrong and can lead to filesystem corruption or
kernel crashes.
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 979a6e28dd96 ("udf: Get rid of 0-length arrays in struct fileIdentDesc")
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
We have run into an issue that a task gets stuck in
balance_dirty_pages_ratelimited() when perform I/O stress testing.
The reason we observed is that an I_DIRTY_PAGES inode with lots
of dirty pages is in b_dirty_time list and standard background
writeback cannot writeback the inode.
After studing the relevant code, the following scenario may lead
to the issue:
task1 task2
----- -----
fuse_flush
write_inode_now //in b_dirty_time
writeback_single_inode
__writeback_single_inode
fuse_write_end
filemap_dirty_folio
__xa_set_mark:PAGECACHE_TAG_DIRTY
lock inode->i_lock
if mapping tagged PAGECACHE_TAG_DIRTY
inode->i_state |= I_DIRTY_PAGES
unlock inode->i_lock
__mark_inode_dirty:I_DIRTY_PAGES
lock inode->i_lock
-was dirty,inode stays in
-b_dirty_time
unlock inode->i_lock
if(!(inode->i_state & I_DIRTY_All))
-not true,so nothing done
This patch moves the dirty inode to b_dirty list when the inode
currently is not queued in b_io or b_more_io list at the end of
writeback_single_inode.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: stable@vger.kernel.org
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Signed-off-by: Jing Xia <jing.xia@unisoc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220510023514.27399-1-jing.xia@unisoc.com
|
|
The pages in the file mapping maybe reclaimed and reused by other
subsystems and the page->private maybe used as flags field or
something else, if later that pages are used by page caches again
the page->private maybe not cleared as expected.
Here will check the PG_private bit instead of the folio->private.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Currently when we create a file, we spin up an xattr buffer to send
along with the create request. If we end up doing an async create
however, then we currently pass down a zero-length xattr buffer.
Fix the code to send down the xattr buffer in req->r_pagelist. If the
xattrs span more than a page, however give up and don't try to do an
async create.
Cc: stable@vger.kernel.org
URL: https://bugzilla.redhat.com/show_bug.cgi?id=2063929
Fixes: 9a8d03ca2e2c ("ceph: attempt to do async create when possible")
Reported-by: John Fortin <fortinj66@gmail.com>
Reported-by: Sri Ramanujam <sri@ramanujam.io>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The file permissions on the fdinfo dir from were changed from
S_IRUSR|S_IXUSR to S_IRUGO|S_IXUGO, and a PTRACE_MODE_READ check was added
for opening the fdinfo files [1]. However, the ptrace permission check
was not added to the directory, allowing anyone to get the open FD numbers
by reading the fdinfo directory.
Add the missing ptrace permission check for opening the fdinfo directory.
[1] https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com
Link: https://lkml.kernel.org/r/20210713162008.1056986-1-kaleshsingh@google.com
Fixes: 7bc3fa0172a4 ("procfs: allow reading fdinfo with PTRACE_MODE_READ")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Turns out that ever since this mount option was added, passing
`softreval` in NFS mount options cancelled all other flags while not
affecting the underlying flag `NFS_MOUNT_SOFTREVAL`.
Fixes: c74dfe97c104 ("NFS: Add mount option 'softreval'")
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Dirent events (create/delete/move) are only reported on watched
directory inodes, but in fanotify as well as in legacy inotify, it was
always allowed to set them on non-dir inode, which does not result in
any meaningful outcome.
Until kernel v5.17, dirent events in fanotify also differed from events
"on child" (e.g. FAN_OPEN) in the information provided in the event.
For example, FAN_OPEN could be set in the mask of a non-dir or the mask
of its parent and event would report the fid of the child regardless of
the marked object.
By contrast, FAN_DELETE is not reported if the child is marked and the
child fid was not reported in the events.
Since kernel v5.17, with fanotify group flag FAN_REPORT_TARGET_FID, the
fid of the child is reported with dirent events, like events "on child",
which may create confusion for users expecting the same behavior as
events "on child" when setting events in the mask on a child.
The desired semantics of setting dirent events in the mask of a child
are not clear, so for now, deny this action for a group initialized
with flag FAN_REPORT_TARGET_FID and for the new event FAN_RENAME.
We may relax this restriction in the future if we decide on the
semantics and implement them.
Fixes: d61fd650e9d2 ("fanotify: introduce group flag FAN_REPORT_TARGET_FID")
Fixes: 8cc3b1ccd930 ("fanotify: wire up FAN_RENAME event")
Link: https://lore.kernel.org/linux-fsdevel/20220505133057.zm5t6vumc4xdcnsg@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220507080028.219826-1-amir73il@gmail.com
|
|
Pull io_uring fix from Jens Axboe:
"Just a single file assignment fix this week"
* tag 'io_uring-5.18-2022-05-06' of git://git.kernel.dk/linux-block:
io_uring: assign non-fixed early for async work
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Regression fixes in zone activation:
- move a loop invariant out of the loop to avoid checking space
status
- properly handle unlimited activation
Other fixes:
- for subpage, force the free space v2 mount to avoid a warning and
make it easy to switch a filesystem on different page size systems
- export sysfs status of exclusive operation 'balance paused', so the
user space tools can recognize it and allow adding a device with
paused balance
- fix assertion failure when logging directory key range item"
* tag 'for-5.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: sysfs: export the balance paused state of exclusive operation
btrfs: fix assertion failure when logging directory key range item
btrfs: zoned: activate block group properly on unlimited active zone device
btrfs: zoned: move non-changing condition check out of the loop
btrfs: force v2 space cache usage for subpage mount
|
|
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix a socket leak when setting up an AF_LOCAL RPC client
- Ensure that knfsd connects to the gss-proxy daemon on setup
Bugfixes:
- Fix a refcount leak when migrating a task off an offlined transport
- Don't gratuitously invalidate inode attributes on delegation return
- Don't leak sockets in xs_local_connect()
- Ensure timely close of disconnected AF_LOCAL sockets"
* tag 'nfs-for-5.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
Revert "SUNRPC: attempt AF_LOCAL connect on setup"
SUNRPC: Ensure gss-proxy connects on setup
SUNRPC: Ensure timely close of disconnected AF_LOCAL sockets
SUNRPC: Don't leak sockets in xs_local_connect()
NFSv4: Don't invalidate inode attributes on delegation return
SUNRPC release the transport of a relocated task with an assigned transport
|
|
tools/testing/selftests/net/forwarding/Makefile
f62c5acc800e ("selftests/net/forwarding: add missing tests to Makefile")
50fe062c806e ("selftests: forwarding: new test, verify host mdb entries")
https://lore.kernel.org/all/20220502111539.0b7e4621@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The new state allowing device addition with paused balance is not
exported to user space so it can't recognize it and actually start the
operation.
Fixes: efc0e69c2fea ("btrfs: introduce exclusive operation BALANCE_PAUSED state")
CC: stable@vger.kernel.org # 5.17
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When inserting a key range item (BTRFS_DIR_LOG_INDEX_KEY) while logging
a directory, we don't expect the insertion to fail with -EEXIST, because
we are holding the directory's log_mutex and we have dropped all existing
BTRFS_DIR_LOG_INDEX_KEY keys from the log tree before we started to log
the directory. However it's possible that during the logging we attempt
to insert the same BTRFS_DIR_LOG_INDEX_KEY key twice, but for this to
happen we need to race with insertions of items from other inodes in the
subvolume's tree while we are logging a directory. Here's how this can
happen:
1) We are logging a directory with inode number 1000 that has its items
spread across 3 leaves in the subvolume's tree:
leaf A - has index keys from the range 2 to 20 for example. The last
item in the leaf corresponds to a dir item for index number 20. All
these dir items were created in a past transaction.
leaf B - has index keys from the range 22 to 100 for example. It has
no keys from other inodes, all its keys are dir index keys for our
directory inode number 1000. Its first key is for the dir item with
a sequence number of 22. All these dir items were also created in a
past transaction.
leaf C - has index keys for our directory for the range 101 to 120 for
example. This leaf also has items from other inodes, and its first
item corresponds to the dir item for index number 101 for our directory
with inode number 1000;
2) When we finish processing the items from leaf A at log_dir_items(),
we log a BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and a last
offset of 21, meaning the log is authoritative for the index range
from 21 to 21 (a single sequence number). At this point leaf B was
not yet modified in the current transaction;
3) When we return from log_dir_items() we have released our read lock on
leaf B, and have set *last_offset_ret to 21 (index number of the first
item on leaf B minus 1);
4) Some other task inserts an item for other inode (inode number 1001 for
example) into leaf C. That resulted in pushing some items from leaf C
into leaf B, in order to make room for the new item, so now leaf B
has dir index keys for the sequence number range from 22 to 102 and
leaf C has the dir items for the sequence number range 103 to 120;
5) At log_directory_changes() we call log_dir_items() again, passing it
a 'min_offset' / 'min_key' value of 22 (*last_offset_ret from step 3
plus 1, so 21 + 1). Then btrfs_search_forward() leaves us at slot 0
of leaf B, since leaf B was modified in the current transaction.
We have also initialized 'last_old_dentry_offset' to 20 after calling
btrfs_previous_item() at log_dir_items(), as it left us at the last
item of leaf A, which refers to the dir item with sequence number 20;
6) We then call process_dir_items_leaf() to process the dir items of
leaf B, and when we process the first item, corresponding to slot 0,
sequence number 22, we notice the dir item was created in a past
transaction and its sequence number is greater than the value of
*last_old_dentry_offset + 1 (20 + 1), so we decide to log again a
BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and an end range
of 21 (key.offset - 1 == 22 - 1 == 21), which results in an -EEXIST
error from insert_dir_log_key(), as we have already inserted that
key at step 2, triggering the assertion at process_dir_items_leaf().
The trace produced in dmesg is like the following:
assertion failed: ret != -EEXIST, in fs/btrfs/tree-log.c:3857
[198255.980839][ T7460] ------------[ cut here ]------------
[198255.981666][ T7460] kernel BUG at fs/btrfs/ctree.h:3617!
[198255.983141][ T7460] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[198255.984080][ T7460] CPU: 0 PID: 7460 Comm: repro-ghost-dir Not tainted 5.18.0-5314c78ac373-misc-next+
[198255.986027][ T7460] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[198255.988600][ T7460] RIP: 0010:assertfail.constprop.0+0x1c/0x1e
[198255.989465][ T7460] Code: 8b 4c 89 (...)
[198255.992599][ T7460] RSP: 0018:ffffc90007387188 EFLAGS: 00010282
[198255.993414][ T7460] RAX: 000000000000003d RBX: 0000000000000065 RCX: 0000000000000000
[198255.996056][ T7460] RDX: 0000000000000001 RSI: ffffffff8b62b180 RDI: fffff52000e70e24
[198255.997668][ T7460] RBP: ffffc90007387188 R08: 000000000000003d R09: ffff8881f0e16507
[198255.999199][ T7460] R10: ffffed103e1c2ca0 R11: 0000000000000001 R12: 00000000ffffffef
[198256.000683][ T7460] R13: ffff88813befc630 R14: ffff888116c16e70 R15: ffffc90007387358
[198256.007082][ T7460] FS: 00007fc7f7c24640(0000) GS:ffff8881f0c00000(0000) knlGS:0000000000000000
[198256.009939][ T7460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[198256.014133][ T7460] CR2: 0000560bb16d0b78 CR3: 0000000140b34005 CR4: 0000000000170ef0
[198256.015239][ T7460] Call Trace:
[198256.015674][ T7460] <TASK>
[198256.016313][ T7460] log_dir_items.cold+0x16/0x2c
[198256.018858][ T7460] ? replay_one_extent+0xbf0/0xbf0
[198256.025932][ T7460] ? release_extent_buffer+0x1d2/0x270
[198256.029658][ T7460] ? rcu_read_lock_sched_held+0x16/0x80
[198256.031114][ T7460] ? lock_acquired+0xbe/0x660
[198256.032633][ T7460] ? rcu_read_lock_sched_held+0x16/0x80
[198256.034386][ T7460] ? lock_release+0xcf/0x8a0
[198256.036152][ T7460] log_directory_changes+0xf9/0x170
[198256.036993][ T7460] ? log_dir_items+0xba0/0xba0
[198256.037661][ T7460] ? do_raw_write_unlock+0x7d/0xe0
[198256.038680][ T7460] btrfs_log_inode+0x233b/0x26d0
[198256.041294][ T7460] ? log_directory_changes+0x170/0x170
[198256.042864][ T7460] ? btrfs_attach_transaction_barrier+0x60/0x60
[198256.045130][ T7460] ? rcu_read_lock_sched_held+0x16/0x80
[198256.046568][ T7460] ? lock_release+0xcf/0x8a0
[198256.047504][ T7460] ? lock_downgrade+0x420/0x420
[198256.048712][ T7460] ? ilookup5_nowait+0x81/0xa0
[198256.049747][ T7460] ? lock_downgrade+0x420/0x420
[198256.050652][ T7460] ? do_raw_spin_unlock+0xa9/0x100
[198256.051618][ T7460] ? __might_resched+0x128/0x1c0
[198256.052511][ T7460] ? __might_sleep+0x66/0xc0
[198256.053442][ T7460] ? __kasan_check_read+0x11/0x20
[198256.054251][ T7460] ? iget5_locked+0xbd/0x150
[198256.054986][ T7460] ? run_delayed_iput_locked+0x110/0x110
[198256.055929][ T7460] ? btrfs_iget+0xc7/0x150
[198256.056630][ T7460] ? btrfs_orphan_cleanup+0x4a0/0x4a0
[198256.057502][ T7460] ? free_extent_buffer+0x13/0x20
[198256.058322][ T7460] btrfs_log_inode+0x2654/0x26d0
[198256.059137][ T7460] ? log_directory_changes+0x170/0x170
[198256.060020][ T7460] ? rcu_read_lock_sched_held+0x16/0x80
[198256.060930][ T7460] ? rcu_read_lock_sched_held+0x16/0x80
[198256.061905][ T7460] ? lock_contended+0x770/0x770
[198256.062682][ T7460] ? btrfs_log_inode_parent+0xd04/0x1750
[198256.063582][ T7460] ? lock_downgrade+0x420/0x420
[198256.064432][ T7460] ? preempt_count_sub+0x18/0xc0
[198256.065550][ T7460] ? __mutex_lock+0x580/0xdc0
[198256.066654][ T7460] ? stack_trace_save+0x94/0xc0
[198256.068008][ T7460] ? __kasan_check_write+0x14/0x20
[198256.072149][ T7460] ? __mutex_unlock_slowpath+0x12a/0x430
[198256.073145][ T7460] ? mutex_lock_io_nested+0xcd0/0xcd0
[198256.074341][ T7460] ? wait_for_completion_io_timeout+0x20/0x20
[198256.075345][ T7460] ? lock_downgrade+0x420/0x420
[198256.076142][ T7460] ? lock_contended+0x770/0x770
[198256.076939][ T7460] ? do_raw_spin_lock+0x1c0/0x1c0
[198256.078401][ T7460] ? btrfs_sync_file+0x5e6/0xa40
[198256.080598][ T7460] btrfs_log_inode_parent+0x523/0x1750
[198256.081991][ T7460] ? wait_current_trans+0xc8/0x240
[198256.083320][ T7460] ? lock_downgrade+0x420/0x420
[198256.085450][ T7460] ? btrfs_end_log_trans+0x70/0x70
[198256.086362][ T7460] ? rcu_read_lock_sched_held+0x16/0x80
[198256.087544][ T7460] ? lock_release+0xcf/0x8a0
[198256.088305][ T7460] ? lock_downgrade+0x420/0x420
[198256.090375][ T7460] ? dget_parent+0x8e/0x300
[198256.093538][ T7460] ? do_raw_spin_lock+0x1c0/0x1c0
[198256.094918][ T7460] ? lock_downgrade+0x420/0x420
[198256.097815][ T7460] ? do_raw_spin_unlock+0xa9/0x100
[198256.101822][ T7460] ? dget_parent+0xb7/0x300
[198256.103345][ T7460] btrfs_log_dentry_safe+0x48/0x60
[198256.105052][ T7460] btrfs_sync_file+0x629/0xa40
[198256.106829][ T7460] ? start_ordered_ops.constprop.0+0x120/0x120
[198256.109655][ T7460] ? __fget_files+0x161/0x230
[198256.110760][ T7460] vfs_fsync_range+0x6d/0x110
[198256.111923][ T7460] ? start_ordered_ops.constprop.0+0x120/0x120
[198256.113556][ T7460] __x64_sys_fsync+0x45/0x70
[198256.114323][ T7460] do_syscall_64+0x5c/0xc0
[198256.115084][ T7460] ? syscall_exit_to_user_mode+0x3b/0x50
[198256.116030][ T7460] ? do_syscall_64+0x69/0xc0
[198256.116768][ T7460] ? do_syscall_64+0x69/0xc0
[198256.117555][ T7460] ? do_syscall_64+0x69/0xc0
[198256.118324][ T7460] ? sysvec_call_function_single+0x57/0xc0
[198256.119308][ T7460] ? asm_sysvec_call_function_single+0xa/0x20
[198256.120363][ T7460] entry_SYSCALL_64_after_hwframe+0x44/0xae
[198256.121334][ T7460] RIP: 0033:0x7fc7fe97b6ab
[198256.122067][ T7460] Code: 0f 05 48 (...)
[198256.125198][ T7460] RSP: 002b:00007fc7f7c23950 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[198256.126568][ T7460] RAX: ffffffffffffffda RBX: 00007fc7f7c239f0 RCX: 00007fc7fe97b6ab
[198256.127942][ T7460] RDX: 0000000000000002 RSI: 000056167536bcf0 RDI: 0000000000000004
[198256.129302][ T7460] RBP: 0000000000000004 R08: 0000000000000000 R09: 000000007ffffeb8
[198256.130670][ T7460] R10: 00000000000001ff R11: 0000000000000293 R12: 0000000000000001
[198256.132046][ T7460] R13: 0000561674ca8140 R14: 00007fc7f7c239d0 R15: 000056167536dab8
[198256.133403][ T7460] </TASK>
Fix this by treating -EEXIST as expected at insert_dir_log_key() and have
it update the item with an end offset corresponding to the maximum between
the previously logged end offset and the new requested end offset. The end
offsets may be different due to dir index key deletions that happened as
part of unlink operations while we are logging a directory (triggered when
fsyncing some other inode parented by the directory) or during renames
which always attempt to log a single dir index deletion.
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/YmyefE9mc2xl5ZMz@hungrycats.org/
Fixes: 732d591a5d6c12 ("btrfs: stop copying old dir items when logging a directory")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_zone_activate() checks if it activated all the underlying zones in
the loop. However, that check never hit on an unlimited activate zone
device (max_active_zones == 0).
Fortunately, it still works without ENOSPC because btrfs_zone_activate()
returns true in the end, even if block_group->zone_is_active == 0. But, it
is confusing to have non zone_is_active block group still usable for
allocation. Also, we are wasting CPU time to iterate the loop every time
btrfs_zone_activate() is called for the blog groups.
Since error case in the loop is handled by out_unlock, we can just set
zone_is_active and do the list stuff after the loop.
Fixes: f9a912a3c45f ("btrfs: zoned: make zone activation multi stripe capable")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_zone_activate() checks if block_group->alloc_offset ==
block_group->zone_capacity every time it iterates the loop. But, it is
not depending on the index. Move out the check and do it only once.
Fixes: f9a912a3c45f ("btrfs: zoned: make zone activation multi stripe capable")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
For a 4K sector sized btrfs with v1 cache enabled and only mounted on
systems with 4K page size, if it's mounted on subpage (64K page size)
systems, it can cause the following warning on v1 space cache:
BTRFS error (device dm-1): csum mismatch on free space cache
BTRFS warning (device dm-1): failed to load free space cache for block group 84082688, rebuilding it now
Although not a big deal, as kernel can rebuild it without problem, such
warning will bother end users, especially if they want to switch the
same btrfs seamlessly between different page sized systems.
[CAUSE]
V1 free space cache is still using fixed PAGE_SIZE for various bitmap,
like BITS_PER_BITMAP.
Such hard-coded PAGE_SIZE usage will cause various mismatch, from v1
cache size to checksum.
Thus kernel will always reject v1 cache with a different PAGE_SIZE with
csum mismatch.
[FIX]
Although we should fix v1 cache, it's already going to be marked
deprecated soon.
And we have v2 cache based on metadata (which is already fully subpage
compatible), and it has almost everything superior than v1 cache.
So just force subpage mount to use v2 cache on mount.
Reported-by: Matt Corallo <blnxfsl@bluematt.me>
CC: stable@vger.kernel.org # 5.15+
Link: https://lore.kernel.org/linux-btrfs/61aa27d1-30fc-c1a9-f0f4-9df544395ec3@bluematt.me/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Creating a new netdevice allocates at least ~50Kb of memory for various
kernel objects, but only ~5Kb of them are accounted to memcg. As a result,
creating an unlimited number of netdevice inside a memcg-limited container
does not fall within memcg restrictions, consumes a significant part
of the host's memory, can cause global OOM and lead to random kills of
host processes.
The main consumers of non-accounted memory are:
~10Kb 80+ kernfs nodes
~6Kb ipv6_add_dev() allocations
6Kb __register_sysctl_table() allocations
4Kb neigh_sysctl_register() allocations
4Kb __devinet_sysctl_register() allocations
4Kb __addrconf_sysctl_register() allocations
Accounting of these objects allows to increase the share of memcg-related
memory up to 60-70% (~38Kb accounted vs ~54Kb total for dummy netdevice
on typical VM with default Fedora 35 kernel) and this should be enough
to somehow protect the host from misuse inside container.
Other related objects are quite small and may not be taken into account
to minimize the expected performance degradation.
It should be separately mentonied ~300 bytes of percpu allocation
of struct ipstats_mib in snmp6_alloc_dev(), on huge multi-cpu nodes
it can become the main consumer of memory.
This patch does not enables kernfs accounting as it affects
other parts of the kernel and should be discussed separately.
However, even without kernfs, this patch significantly improves the
current situation and allows to take into account more than half
of all netdevice allocations.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/354a0a5f-9ec3-a25c-3215-304eab2157bc@openvz.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch introdues the SYSCTL_THREE.
KUnit:
[00:10:14] ================ sysctl_test (10 subtests) =================
[00:10:14] [PASSED] sysctl_test_api_dointvec_null_tbl_data
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_maxlen_unset
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_len_is_zero
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_read_but_position_set
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_negative
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_negative
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_less_int_min
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_greater_int_max
[00:10:14] =================== [PASSED] sysctl_test ===================
./run_kselftest.sh -c sysctl
...
ok 1 selftests: sysctl: sysctl.sh
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes mostly around how some file attributes could be set.
- fix handling of compression property:
- don't allow setting it on anything else than regular file or
directory
- do not allow setting it on nodatacow files via properties
- improved error handling when setting xattr
- make sure symlinks are always properly logged"
* tag 'for-5.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: skip compression property for anything other than files and dirs
btrfs: do not BUG_ON() on failure to update inode when setting xattr
btrfs: always log symlinks in full mode
btrfs: do not allow compression on nodatacow files
btrfs: export a helper for compression hard check
|
|
We defer file assignment to ensure that fixed files work with links
between a direct accept/open and the links that follow it. But this has
the side effect that normal file assignment is then not complete by the
time that request submission has been done.
For deferred execution, if the file is a regular file, assign it when
we do the async prep anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some small driver core and kernfs fixes for some reported
problems. They include:
- kernfs regression that is causing oopses in 5.17 and newer releases
- topology sysfs fixes for a few small reported problems.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: fix NULL dereferencing in kernfs_remove
topology: Fix up build warning in topology_is_visible()
arch_topology: Do not set llc_sibling if llc_id is invalid
topology: make core_mask include at least cluster_siblings
topology/sysfs: Hide PPIN on systems that do not support it.
|
|
Pull io_uring fixes from Jens Axboe:
"Pretty boring:
- three patches just adding reserved field checks (me, Eugene)
- Fixing a potential regression with IOPOLL caused by a block change
(Joseph)"
Boring is good.
* tag 'io_uring-5.18-2022-04-29' of git://git.kernel.dk/linux-block:
io_uring: check that data field is 0 in ringfd unregister
io_uring: fix uninitialized field in rw io_kiocb
io_uring: check reserved fields for recv/recvmsg
io_uring: check reserved fields for send/sendmsg
|
|
Pull ceph client fixes from Ilya Dryomov:
"A fix for a NULL dereference that turns out to be easily triggerable
by fsync (marked for stable) and a false positive WARN and snap_rwsem
locking fixups"
* tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client:
ceph: fix possible NULL pointer dereference for req->r_session
ceph: remove incorrect session state check
ceph: get snap_rwsem read lock in handle_cap_export for ceph_add_cap
libceph: disambiguate cluster/pool full log message
|
|
Only allow data field to be 0 in struct io_uring_rsrc_update user
arguments to allow for future possible usage.
Fixes: e7a6c00dc77a ("io_uring: add support for registering ring file descriptors")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/20220429142218.GA28696@asgard.redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_rw_init_file does not initialize kiocb->private, so when iocb_bio_iopoll
reads kiocb->private it can contain uninitialized data.
Fixes: 3e08773c3841 ("block: switch polling to be bio based")
Signed-off-by: Joseph Ravichandran <jravi@mit.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
- No short reads or writes upon glock contention
* tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: No short reads or writes upon glock contention
|
|
Pull xfs fixes from Dave Chinner:
- define buffer bit flags as unsigned to fix gcc-5 + c11 warnings
- remove redundant XFS fields from MAINTAINERS
- fix inode buffer locking order regression
* tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reorder iunlink remove operation in xfs_ifree
MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM
xfs: convert buffer flags to unsigned.
|
|
Commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered
I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to
allow dropping the inode glock while faulting in user buffers. When the
lock was dropped, a short result was returned to indicate that the
operation was interrupted.
As pointed out by Linus (see the link below), this behavior is broken
and the operations should always re-acquire the inode glock and resume
the operation instead.
Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/
Fixes: 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The compression property only has effect on regular files and directories
(so that it's propagated to files and subdirectories created inside a
directory). For any other inode type (symlink, fifo, device, socket),
it's pointless to set the compression property because it does nothing
and ends up unnecessarily wasting leaf space due to the pointless xattr
(75 or 76 bytes, depending on the compression value). Symlinks in
particular are very common (for example, I have almost 10k symlinks under
/etc, /usr and /var alone) and therefore it's worth to avoid wasting
leaf space with the compression xattr.
For example, the compression property can end up on a symlink or character
device implicitly, through inheritance from a parent directory
$ mkdir /mnt/testdir
$ btrfs property set /mnt/testdir compression lzo
$ ln -s yadayada /mnt/testdir/lnk
$ mknod /mnt/testdir/dev c 0 0
Or explicitly like this:
$ ln -s yadayda /mnt/lnk
$ setfattr -h -n btrfs.compression -v lzo /mnt/lnk
So skip the compression property on inodes that are neither a regular
file nor a directory.
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We are doing a BUG_ON() if we fail to update an inode after setting (or
clearing) a xattr, but there's really no reason to not instead simply
abort the transaction and return the error to the caller. This should be
a rare error because we have previously reserved enough metadata space to
update the inode and the delayed inode should have already been setup, so
an -ENOSPC or -ENOMEM, which are the possible errors, are very unlikely to
happen.
So replace the BUG_ON()s with a transaction abort.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
On Linux, empty symlinks are invalid, and attempting to create one with
the system call symlink(2) results in an -ENOENT error and this is
explicitly documented in the man page.
If we rename a symlink that was created in the current transaction and its
parent directory was logged before, we actually end up logging the symlink
without logging its content, which is stored in an inline extent. That
means that after a power failure we can end up with an empty symlink,
having no content and an i_size of 0 bytes.
It can be easily reproduced like this:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/testdir
$ sync
# Create a file inside the directory and fsync the directory.
$ touch /mnt/testdir/foo
$ xfs_io -c "fsync" /mnt/testdir
# Create a symlink inside the directory and then rename the symlink.
$ ln -s /mnt/testdir/foo /mnt/testdir/bar
$ mv /mnt/testdir/bar /mnt/testdir/baz
# Now fsync again the directory, this persist the log tree.
$ xfs_io -c "fsync" /mnt/testdir
<power failure>
$ mount /dev/sdc /mnt
$ stat -c %s /mnt/testdir/baz
0
$ readlink /mnt/testdir/baz
$
Fix this by always logging symlinks in full mode (LOG_INODE_ALL), so that
their content is also logged.
A test case for fstests will follow.
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Compression and nodatacow are mutually exclusive. A similar issue was
fixed by commit f37c563bab429 ("btrfs: add missing check for nocow and
compression inode flags"). Besides ioctl, there is another way to
enable/disable/reset compression directly via xattr. The following
steps will result in a invalid combination.
$ touch bar
$ chattr +C bar
$ lsattr bar
---------------C-- bar
$ setfattr -n btrfs.compression -v zstd bar
$ lsattr bar
--------c------C-- bar
To align with the logic in check_fsflags, nocompress will also be
unacceptable after this patch, to prevent mix any compression-related
options with nodatacow.
$ touch bar
$ chattr +C bar
$ lsattr bar
---------------C-- bar
$ setfattr -n btrfs.compression -v zstd bar
setfattr: bar: Invalid argument
$ setfattr -n btrfs.compression -v no bar
setfattr: bar: Invalid argument
When both compression and nodatacow are enabled, then
btrfs_run_delalloc_range prefers nodatacow and no compression happens.
Reported-by: Jayce Lin <jaycelin@synology.com>
CC: stable@vger.kernel.org # 5.10.x: e6f9d6964802: btrfs: export a helper for compression hard check
CC: stable@vger.kernel.org # 5.10.x
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
inode_can_compress will be used outside of inode.c to check the
availability of setting compression flag by xattr. This patch moves
this function as an internal helper and renames it to
btrfs_inode_can_compress.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
kernfs_remove supported NULL kernfs_node param to bail out but revent
per-fs lock change introduced regression that dereferencing the
param without NULL check so kernel goes crash.
This patch checks the NULL kernfs_node in kernfs_remove and if so,
just return.
Quote from bug report by Jirka
```
The bug is triggered by running NAS Parallel benchmark suite on
SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error
log:
[ 247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 247.036009] #PF: supervisor read access in kernel mode
[ 247.036009] #PF: error_code(0x0000) - not-present page
[ 247.036009] PGD 0 P4D 0
[ 247.036009] Oops: 0000 [#1] PREEMPT SMP PTI
[ 247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted
5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16
[ 247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS
2.0b 03/07/2018
[ 247.058060] RIP: 0010:kernfs_remove+0x8/0x50
[ 247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4
ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83
c4 60
[ 247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246
[ 247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018
[ 247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000
[ 247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018
[ 247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000
[ 247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100
[ 247.122048] FS: 00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000)
knlGS:0000000000000000
[ 247.122048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0
[ 247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 247.122048] PKRU: 55555554
[ 247.122048] Call Trace:
[ 247.122048] <TASK>
[ 247.122048] rdt_kill_sb+0x29d/0x350
[ 247.122048] deactivate_locked_super+0x36/0xa0
[ 247.122048] cleanup_mnt+0x131/0x190
[ 247.122048] task_work_run+0x5c/0x90
[ 247.122048] exit_to_user_mode_prepare+0x229/0x230
[ 247.122048] syscall_exit_to_user_mode+0x18/0x40
[ 247.122048] do_syscall_64+0x48/0x90
[ 247.122048] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 247.122048] RIP: 0033:0x7f01be2d735b
```
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696
Link: https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/
Fixes: 393c3714081a (kernfs: switch global kernfs_rwsem lock to per-fs lock)
Cc: stable@vger.kernel.org
Reported-by: Jirka Hladky <jhladky@redhat.com>
Tested-by: Jirka Hladky <jhladky@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Link: https://lore.kernel.org/r/20220427172152.3505364-1-minchan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|