Age | Commit message (Collapse) | Author | Files | Lines |
|
A minor issue with nilfs_rename, originating from an old ext2
implementation, is that the mtime is updated even if the rename target is
a directory and it is renamed within the same directory, rather than moved
to a different directory.
In this case, the child directory being renamed does not change in any
way, so changing its mtime is unnecessary according to the specification,
and can unnecessarily confuse backup tools.
In ext2, this issue was fixed by commit 39fe7557b4d6 ("ext2: Do not update
mtime of a moved directory") and a few subsequent fixes, but it remained
in nilfs2.
Fix this issue by not calling nilfs_set_link(), which rewrites the inode
number of the directory entry that refers to the parent directory, when
the move target is a directory and the source and destination are the same
directory.
Here, the directory to be moved only needs to be read if the inode number
of the parent directory is rewritten with nilfs_set_link, so also adjust
the execution conditions of the preparation work to avoid unnecessary
directory reads.
Link: https://lkml.kernel.org/r/20250111143518.7901-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: fix issues with rename operations".
This series fixes BUG_ON check failures reported by syzbot around rename
operations, and a minor behavioral issue where the mtime of a child
directory changes when it is renamed instead of moved.
This patch (of 2):
The directory manipulation routines nilfs_set_link() and
nilfs_delete_entry() rewrite the directory entry in the folio/page
previously read by nilfs_find_entry(), so error handling is omitted on the
assumption that nilfs_prepare_chunk(), which prepares the buffer for
rewriting, will always succeed for these. And if an error is returned, it
triggers the legacy BUG_ON() checks in each routine.
This assumption is wrong, as proven by syzbot: the buffer layer called by
nilfs_prepare_chunk() may call nilfs_get_block() if necessary, which may
fail due to metadata corruption or other reasons. This has been there all
along, but improved sanity checks and error handling may have made it more
reproducible in fuzzing tests.
Fix this issue by adding missing error paths in nilfs_set_link(),
nilfs_delete_entry(), and their caller nilfs_rename().
Link: https://lkml.kernel.org/r/20250111143518.7901-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250111143518.7901-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+32c3706ebf5d95046ea1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=32c3706ebf5d95046ea1
Reported-by: syzbot+1097e95f134f37d9395c@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1097e95f134f37d9395c
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Also for comments that do not cause kernel-doc warnings (those that list
multiple error codes), revise the return value description style to match
Brian G.'s suggestion of "..., or one of the following negative error
codes on failure:".
Link: https://lkml.kernel.org/r/CAAq45aNh1qV8P6XgDhKeNstT=PvcPUaCXsAF-f9rvmzznsZL5A@mail.gmail.com
Link: https://lkml.kernel.org/r/20250110010530.21872-8-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are a number of kernel-doc comments for functions that are missing
return values, which also causes a number of warnings when the kernel-doc
script is run with the "-Wall" option.
Fix this issue by adding proper return value descriptions, and improve
code maintainability.
Link: https://lkml.kernel.org/r/20250110010530.21872-7-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions of functions for the
rest.
Link: https://lkml.kernel.org/r/20250110010530.21872-6-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions for metadata file
functions other than sufile.
Link: https://lkml.kernel.org/r/20250110010530.21872-5-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous changes to fix return value descriptions, this
fixes the format of the return value descriptions of functions for
sufile-related functions, eliminating a dozen warnings emitted by the
kernel-doc script.
Link: https://lkml.kernel.org/r/20250110010530.21872-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Similar to the previous patch to fix the ioctl return value descriptions,
this fixes the format of the return value descriptions for bmap (and
btree)-related functions, which was causing the kernel-doc script to emit
a number of warnings.
Link: https://lkml.kernel.org/r/20250110010530.21872-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: fix kernel-doc comments for function return values",
v2.
This series fixes the inadequacies in the return value descriptions in
nilfs2's kernel-doc comments (mainly incorrect formatting), as well as the
lack of return value descriptions themselves, and fixes most of the
remaining warnings that are output when the kernel-doc script is run with
the "-Wall" option.
This patch (of 7):
In the kernel-doc comments for functions, there are many cases where the
format of the return value description is inaccurate, such as "Return
Value: ...", which causes many warnings to be output when the kernel-doc
script is executed with the "-Wall" option.
This fixes such incorrectly formatted return value descriptions for ioctl
functions.
Link: https://lkml.kernel.org/r/20250110010530.21872-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250110010530.21872-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: "Brian G ." <gissf1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
nilfs_lookup_dirty_data_buffers(), which iterates through the buffers
attached to dirty data folios/pages, accesses the attached buffers without
locking the folios/pages.
For data cache, nilfs_clear_folio_dirty() may be called asynchronously
when the file system degenerates to read only, so
nilfs_lookup_dirty_data_buffers() still has the potential to cause use
after free issues when buffers lose the protection of their dirty state
midway due to this asynchronous clearing and are unintentionally freed by
try_to_free_buffers().
Eliminate this race issue by adjusting the lock section in this function.
Link: https://lkml.kernel.org/r/20250107200202.6432-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: protect busy buffer heads from being force-cleared".
This series fixes the buffer head state inconsistency issues reported by
syzbot that occurs when the filesystem is corrupted and falls back to
read-only, and the associated buffer head use-after-free issue.
This patch (of 2):
Syzbot has reported that after nilfs2 detects filesystem corruption and
falls back to read-only, inconsistencies in the buffer state may occur.
One of the inconsistencies is that when nilfs2 calls mark_buffer_dirty()
to set a data or metadata buffer as dirty, but it detects that the buffer
is not in the uptodate state:
WARNING: CPU: 0 PID: 6049 at fs/buffer.c:1177 mark_buffer_dirty+0x2e5/0x520
fs/buffer.c:1177
...
Call Trace:
<TASK>
nilfs_palloc_commit_alloc_entry+0x4b/0x160 fs/nilfs2/alloc.c:598
nilfs_ifile_create_inode+0x1dd/0x3a0 fs/nilfs2/ifile.c:73
nilfs_new_inode+0x254/0x830 fs/nilfs2/inode.c:344
nilfs_mkdir+0x10d/0x340 fs/nilfs2/namei.c:218
vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257
do_mkdirat+0x264/0x3a0 fs/namei.c:4280
__do_sys_mkdirat fs/namei.c:4295 [inline]
__se_sys_mkdirat fs/namei.c:4293 [inline]
__x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The other is when nilfs_btree_propagate(), which propagates the dirty
state to the ancestor nodes of a b-tree that point to a dirty buffer,
detects that the origin buffer is not dirty, even though it should be:
WARNING: CPU: 0 PID: 5245 at fs/nilfs2/btree.c:2089
nilfs_btree_propagate+0xc79/0xdf0 fs/nilfs2/btree.c:2089
...
Call Trace:
<TASK>
nilfs_bmap_propagate+0x75/0x120 fs/nilfs2/bmap.c:345
nilfs_collect_file_data+0x4d/0xd0 fs/nilfs2/segment.c:587
nilfs_segctor_apply_buffers+0x184/0x340 fs/nilfs2/segment.c:1006
nilfs_segctor_scan_file+0x28c/0xa50 fs/nilfs2/segment.c:1045
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1216 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1540 [inline]
nilfs_segctor_do_construct+0x1c28/0x6b90 fs/nilfs2/segment.c:2115
nilfs_segctor_construct+0x181/0x6b0 fs/nilfs2/segment.c:2479
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2587 [inline]
nilfs_segctor_thread+0x69e/0xe80 fs/nilfs2/segment.c:2701
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Both of these issues are caused by the callbacks that handle the
page/folio write requests, forcibly clear various states, including the
working state of the buffers they hold, at unexpected times when they
detect read-only fallback.
Fix these issues by checking if the buffer is referenced before clearing
the page/folio state, and skipping the clear if it is.
Link: https://lkml.kernel.org/r/20250107200202.6432-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20250107200202.6432-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b2b14916b77acf8626d7
Reported-by: syzbot+d98fd19acd08b36ff422@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=d98fd19acd08b36ff422
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Tested-by: syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The parameter is not used in __ocfs2_mknod_locked(). So remove it.
No functional change.
Link: https://lkml.kernel.org/r/20250106140634.92241-1-glass.su@suse.com
Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
While running fstests generic/329, the kernel workqueue
quota_release_workfn is dead looping in calling ocfs2_release_dquot().
The ocfs2 state is already readonly but ocfs2_release_dquot wants to
start a transaction but fails and returns.
=====================================================================
[ 2918.123602 ][ T275 ] On-disk corruption discovered. Please run
fsck.ocfs2 once the filesystem is unmounted.
[ 2918.124034 ][ T275 ] (kworker/u135:1,275,11):ocfs2_release_dquot:765
ERROR: status = -30
[ 2918.124452 ][ T275 ] (kworker/u135:1,275,11):ocfs2_release_dquot:795
ERROR: status = -30
[ 2918.124883 ][ T275 ] (kworker/u135:1,275,11):ocfs2_start_trans:357
ERROR: status = -30
[ 2918.125276 ][ T275 ] OCFS2: abort (device dm-0): ocfs2_start_trans:
Detected aborted journal
[ 2918.125710 ][ T275 ] On-disk corruption discovered. Please run
fsck.ocfs2 once the filesystem is unmounted.
=====================================================================
ocfs2_release_dquot() is much like dquot_release(), which is called by
ext4 to handle similar situation. So here fix it by marking the dquot as
inactive like what dquot_release() does.
Link: https://lkml.kernel.org/r/20250106140653.92292-1-glass.su@suse.com
Fixes: 9e33d69f553a ("ocfs2: Implementation of local and global quota file handling")
Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
syz reported:
(syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: bug
expression: tl_count > ocfs2_truncate_recs_per_inode(osb->sb) ||
tl_count == 0
(syz-executor404,5313,0):ocfs2_truncate_log_append:5874 ERROR: Truncate
record count on #77 invalid wanted 39, actual 2087
------------[ cut here ]------------
kernel BUG at fs/ocfs2/alloc.c:5874!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5313 Comm: syz-executor404 Not tainted
6.12.0-rc5-syzkaller-00299-g11066801dd4b #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:ocfs2_truncate_log_append+0x9a8/0x9c0 fs/ocfs2/alloc.c:5868
RSP: 0018:ffffc9000cf16f40 EFLAGS: 00010292
RAX: b4b54f1d10640800 RBX: 0000000000000027 RCX: b4b54f1d10640800
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: ffffc9000cf17070 R08: ffffffff8174a14c R09: 1ffff11003f8519a
R10: dffffc0000000000 R11: ffffed1003f8519b R12: 1ffff110085f5f58
R13: ffffff3800000000 R14: 000000000000004d R15: ffff8880438f0008
FS: 00005555722df380(0000) GS:ffff88801fc00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000002000f000 CR3: 000000004010e000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ocfs2_remove_btree_range+0x1303/0x1860 fs/ocfs2/alloc.c:5789
ocfs2_remove_inode_range+0xff3/0x29f0 fs/ocfs2/file.c:1907
ocfs2_reflink_remap_extent fs/ocfs2/refcounttree.c:4537 [inline]
ocfs2_reflink_remap_blocks+0xcd4/0x1f30 fs/ocfs2/refcounttree.c:4684
ocfs2_remap_file_range+0x5fa/0x8d0 fs/ocfs2/file.c:2736
vfs_copy_file_range+0xc07/0x1510 fs/read_write.c:1615
__do_sys_copy_file_range fs/read_write.c:1705 [inline]
__se_sys_copy_file_range+0x3f2/0x5d0 fs/read_write.c:1668
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fd327167af9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 61 17 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe6b8e22e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000146
RAX: ffffffffffffffda RBX: 00007fd3271b005e RCX: 00007fd327167af9
RDX: 0000000000000006 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007fd3271de610 R08: 000000000000d8c2 R09: 0000000000000000
R10: 0000000020000640 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffe6b8e24b8 R14: 0000000000000001 R15: 0000000000000001
</TASK>
The fuzz image has a truncate log inode whose tl_count is bigger than
ocfs2_truncate_recs_per_inode() so it triggers the BUG in
ocfs2_truncate_log_append().
As what the check in ocfs2_truncate_log_append() does, just do same check
into ocfs2_get_truncate_log_info when truncate log inode is reading in so
we can bail out earlier.
Link: https://lkml.kernel.org/r/20250108024119.60313-1-glass.su@suse.com
Signed-off-by: Su Yue <glass.su@suse.com>
Reported-by: Liebes Wang <wanghaichi0403@gmail.com>
Link: https://lore.kernel.org/ocfs2-devel/CADCV8souQhdP0RdQF1U7KTWtuHDfpn+3LnTt-EEuMmB-pMRrgQ@mail.gmail.com/T/#u
Reported-by: syzbot+a66542ca5ebb4233b563@syzkaller.appspotmail.com
Tested-by: syzbot+a66542ca5ebb4233b563@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Correct the value of l_next_free_rec to l_count during the online check,
as done in the check_el() function in ocfs2_tools.
Link: https://lkml.kernel.org/r/20250106023432.1320904-2-sunjunchao2870@gmail.com
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Recently syzbot reported a use-after-free issue[1].
The root cause of the problem is that the journal inode recorded in this
file system image is corrupted. The value of
"di->id2.i_list.l_next_free_rec" is 8193, which is greater than the value
of "di->id2.i_list.l_count" (19).
To solve this problem, an additional check should be added within
ocfs2_get_clusters_nocache(). If the check fails, an error will be
returned and the file system will be set to read-only.
[1]: https://lore.kernel.org/all/67577778.050a0220.a30f1.01bc.GAE@google.com/T/
Link: https://lkml.kernel.org/r/20250106023432.1320904-1-sunjunchao2870@gmail.com
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Reported-by: syzbot+2313dda4dc4885c93578@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2313dda4dc4885c93578
Tested-by: syzbot+2313dda4dc4885c93578@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
squashfs_fill_page is only used in this file, so make it static.
Use kmap_local instead of kmap_atomic, and return a bool so that
the caller can use folio_end_read() which saves an atomic operation
over calling folio_mark_uptodate() followed by folio_unlock().
[willy@infradead.org: fix polarity of "uptodate" Thanks to Ryan for testing]
Link: https://lkml.kernel.org/r/20250110163300.3346321-2-willy@infradead.org
Link: https://lkml.kernel.org/r/20241220224634.723899-5-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove accesses to page->index and page->mapping. Also use folio
APIs where available. This code still assumes order 0 folios.
[dan.carpenter@linaro.org: fix a NULL vs IS_ERR() bug]
Link: https://lkml.kernel.org/r/7b7f44d6-9153-4d7c-b65b-2d78febe6c7a@stanley.mountain
Link: https://lkml.kernel.org/r/20241220224634.723899-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove a few accesses to page->mapping.
Link: https://lkml.kernel.org/r/20241220224634.723899-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove an access to page->mapping.
Link: https://lkml.kernel.org/r/20241220224634.723899-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use modern folio APIs where they exist and convert back to struct
page for the internal functions.
Link: https://lkml.kernel.org/r/20241220224634.723899-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Update the compression algorithms supported, and the Squashfs website
location.
Link: https://lkml.kernel.org/r/20241229233752.54481-5-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If Squashfs has been configured to directly read datablocks into the page
cache (SQUASHFS_FILE_DIRECT), then the read_page cache is unnecessary.
This improvement is due to the following two commits, which added the
ability to read datablocks into the page cache when pages were missing,
enabling the fallback which used an intermediate buffer to be removed.
commit f268eedddf359 ("squashfs: extend "page actor" to handle missing pages")
commit 1bb1a07afad97 ("squashfs: don't use intermediate buffer if pages missing")
This reduces the amount of memory used when mounting a filesystem by
block_size * maximum number of threads.
Link: https://lkml.kernel.org/r/20241229233752.54481-3-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "squashfs: reduce memory usage and update docs".
This patchset reduces the amount of memory that Squashfs uses when
CONFIG_FILE_DIRECT is configured, and updates various out of date
information in the documentation and Kconfig.
This patch (of 4):
Make squashfs_cache_init() return an ERR_PTR(-ENOMEM) on failure rather
than NULL.
This tidies up some calling code, but, it also allows NULL to be returned
as a valid result when a cache hasn't be allocated.
Link: https://lkml.kernel.org/r/20241229233752.54481-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20241229233752.54481-2-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When testing the atomic write fix patches, the f2fs_bug_on was
triggered as below:
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:935!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 3 UID: 0 PID: 257 Comm: bash Not tainted 6.13.0-rc1-00033-gc283a70d3497 #5
RIP: 0010:f2fs_evict_inode+0x50f/0x520
Call Trace:
<TASK>
? __die_body+0x65/0xb0
? die+0x9f/0xc0
? do_trap+0xa1/0x170
? f2fs_evict_inode+0x50f/0x520
? f2fs_evict_inode+0x50f/0x520
? handle_invalid_op+0x65/0x80
? f2fs_evict_inode+0x50f/0x520
? exc_invalid_op+0x39/0x50
? asm_exc_invalid_op+0x1a/0x20
? __pfx_f2fs_get_dquots+0x10/0x10
? f2fs_evict_inode+0x50f/0x520
? f2fs_evict_inode+0x2e5/0x520
evict+0x186/0x2f0
prune_icache_sb+0x75/0xb0
super_cache_scan+0x1a8/0x200
do_shrink_slab+0x163/0x320
shrink_slab+0x2fc/0x470
drop_slab+0x82/0xf0
drop_caches_sysctl_handler+0x4e/0xb0
proc_sys_call_handler+0x183/0x280
vfs_write+0x36d/0x450
ksys_write+0x68/0xd0
do_syscall_64+0xc8/0x1a0
? arch_exit_to_user_mode_prepare+0x11/0x60
? irqentry_exit_to_user_mode+0x7e/0xa0
The root cause is: f2fs uses FI_ATOMIC_DIRTIED to indicate dirty
atomic files during commit. If the inode is dirtied during commit,
such as by f2fs_i_pino_write, the vfs inode keeps clean and the
f2fs inode is set to FI_DIRTY_INODE. The FI_DIRTY_INODE flag cann't
be cleared by write_inode later due to the clean vfs inode. Finally,
f2fs_bug_on is triggered due to this inconsistent state when evict.
To reproduce this situation:
- fd = open("/mnt/test.db", O_WRONLY)
- ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE)
- mv /mnt/test.db /mnt/test1.db
- ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE)
- echo 3 > /proc/sys/vm/drop_caches
To fix this problem, clear FI_DIRTY_INODE after commit, then
f2fs_mark_inode_dirty_sync will ensure a consistent dirty state.
Fixes: fccaa81de87e ("f2fs: prevent atomic file from being dirtied before commit")
Signed-off-by: Yunlei He <heyunlei@xiaomi.com>
Signed-off-by: Jianan Huang <huangjianan@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Increase the headroom in the EFI memory map allocation created by the
EFI stub. This is needed because event callbacks called during
ExitBootServices() may cause fragmentation, and reallocation is not
allowed after that.
- Drop obsolete UGA graphics code and switch to a more ergonomic API to
traverse handle buffers. Simplify some error paths using a __free()
helper while at it.
- Fix some W=1 warnings when CONFIG_EFI=n
- Rely on the dentry cache to keep track of the contents of the
efivarfs filesystem, rather than using a separate linked list.
- Improve and extend efivarfs test cases.
- Synchronize efivarfs with underlying variable store on resume from
hibernation - this is needed because the firmware itself or another
OS running on the same machine may have modified it.
- Fix x86 EFI stub build with GCC 15.
- Fix kexec/x86 false positive warning in EFI memory attributes table
sanity check.
* tag 'efi-next-for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (23 commits)
x86/efi: skip memattr table on kexec boot
efivarfs: add variable resync after hibernation
efivarfs: abstract initial variable creation routine
efi: libstub: Use '-std=gnu11' to fix build with GCC 15
selftests/efivarfs: add concurrent update tests
selftests/efivarfs: fix tests for failed write removal
efivarfs: fix error on write to new variable leaving remnants
efivarfs: remove unused efivarfs_list
efivarfs: move variable lifetime management into the inodes
selftests/efivarfs: add check for disallowing file truncation
efivarfs: prevent setting of zero size on the inodes in the cache
efi: sysfb_efi: fix W=1 warnings when EFI is not set
efi/libstub: Use __free() helper for pool deallocations
efi/libstub: Use cleanup helpers for freeing copies of the memory map
efi/libstub: Simplify PCI I/O handle buffer traversal
efi/libstub: Refactor and clean up GOP resolution picker code
efi/libstub: Simplify GOP handling code
efi/libstub: Use C99-style for loop to traverse handle buffer
x86/efistub: Drop long obsolete UGA support
efivarfs: make variable_is_present use dcache lookup
...
|
|
The XFS_IOC_EXCHANGE_RANGE ioctl with the XFS_EXCHANGE_RANGE_TO_EOF flag
operates on a range bounded by the end of the file. This means the
actual amount of blocks exchanged is derived from the inode size, which
is only stable with the IOLOCK (i_rwsem) held. Do that, it currently
calls remap_verify_area from inside the sb write protection which nests
outside the IOLOCK. But this makes fsnotify_file_area_perm which is
called from remap_verify_area unhappy when the kernel is built with
lockdep and the recently added CONFIG_FANOTIFY_ACCESS_PERMISSIONS
option.
Fix this by always calling remap_verify_area before taking the write
protection, and passing a 0 size to remap_verify_area similar to
the FICLONE/FICLONERANGE ioctls when they are asked to clone until
the file end.
(Note: the size argument gets passed to fsnotify_file_area_perm, but
then isn't actually used there).
Fixes: 9a64d9b3109d ("xfs: introduce new file range exchange ioctl")
Cc: <stable@vger.kernel.org> # v6.10
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
There hasn't been anything like an io_length for a long time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When we call create(), lseek() and write() sequentially, offset != 0
cannot be used as a judgment condition for whether the file already
has extents.
Furthermore, when xfs_bmap_adjacent() has not given a better blkno,
it is not necessary to use exact EOF block allocation.
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
fuse-over-io-uring uses existing functions to find requests based
on their unique id - make these functions non-static.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add special fuse-io-uring into the fuse argument
copy handler.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Move 'struct fuse_copy_state' and fuse_copy_* functions
to fuse_dev_i.h to make it available for fuse-io-uring.
'copy_out_args()' is renamed to 'fuse_copy_out_args'.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This adds basic support for ring SQEs (with opcode=IORING_OP_URING_CMD).
For now only FUSE_IO_URING_CMD_REGISTER is handled to register queue
entries.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # io_uring
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This change sets up FUSE operations to always have headers in
args.in_args[0], even for opcodes without an actual header.
This step prepares for a clean separation of payload from headers,
initially it is used by fuse-over-io-uring.
For opcodes without a header, we use a zero-sized struct as a
placeholder. This approach:
- Keeps things consistent across all FUSE operations
- Will help with payload alignment later
- Avoids future issues when header sizes change
Op codes that already have an op code specific header do not
need modification.
Op codes that have neither payload nor op code headers
are not modified either (FUSE_READLINK and FUSE_DESTROY).
FUSE_BATCH_FORGET already has the header in the right place,
but is not using fuse_copy_args - as -over-uring is currently
not handling forgets it does not matter for now, but header
separation will later need special attention for that op code.
Correct the struct fuse_args->in_args array max size.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
These are needed by fuse-over-io-uring.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Another preparation patch, as this function will be needed by
fuse/dev.c and fuse/dev_uring.c.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This function is needed by fuse_uring.c to clean ring queues,
so make it non static. Especially in non-static mode the function
name 'end_requests' should be prefixed with fuse_
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Luis Henriques <luis@igalia.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull smb client updates from Steve French:
- Fix oops in DebugData when link speed 0
- Two reparse point fixes
- Ten DFS (global namespace) fixes
- Symlink error handling fix
- Two SMB1 fixes
- Four cleanup fixes
- Improved debugging of status codes
- Fix incorrect output of tracepoints for compounding, and add missing
compounding tracepoint
* tag 'v6.14-rc-smb3-client-fixes-part' of git://git.samba.org/sfrench/cifs-2.6: (23 commits)
smb: client: handle lack of EA support in smb2_query_path_info()
smb: client: don't check for @leaf_fullpath in match_server()
smb: client: get rid of TCP_Server_Info::refpath_lock
cifs: Remove duplicate struct reparse_symlink_data and SYMLINK_FLAG_RELATIVE
cifs: Do not attempt to call CIFSGetSrvInodeNumber() without CAP_INFOLEVEL_PASSTHRU
cifs: Do not attempt to call CIFSSMBRenameOpenFile() without CAP_INFOLEVEL_PASSTHRU
cifs: Remove declaration of dead CIFSSMBQuerySymLink function
cifs: Fix printing Status code into dmesg
cifs: Add missing NT_STATUS_* codes from nterr.h to nterr.c
cifs: Fix endian types in struct rfc1002_session_packet
cifs: Use cifs_autodisable_serverino() for disabling CIFS_MOUNT_SERVER_INUM in readdir.c
smb3: add missing tracepoint for querying wsl EAs
smb: client: fix order of arguments of tracepoints
smb: client: fix oops due to unset link speed
smb: client: correctly handle ErrorContextData as a flexible array
smb: client: don't retry DFS targets on server shutdown
smb: client: fix return value of parse_dfs_referrals()
smb: client: optimize referral walk on failed link targets
smb: client: provide dns_resolve_{unc,name} helpers
smb: client: parse DNS domain name from domain= option
...
|
|
Pull smb server updates from Steve French:
"Three ksmbd server fixes:
- Fix potential memory corruption in IPC calls
- Support FSCTL_QUERY_INTERFACE_INFO for more configurations
- Remove some unused functions"
* tag 'v6.14-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix integer overflows on 32 bit systems
ksmbd: browse interfaces list on FSCTL_QUERY_INTERFACE_INFO IOCTL
ksmbd: Remove unused functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify pre-content notification support from Jan Kara:
"This introduces a new fsnotify event (FS_PRE_ACCESS) that gets
generated before a file contents is accessed.
The event is synchronous so if there is listener for this event, the
kernel waits for reply. On success the execution continues as usual,
on failure we propagate the error to userspace. This allows userspace
to fill in file content on demand from slow storage. The context in
which the events are generated has been picked so that we don't hold
any locks and thus there's no risk of a deadlock for the userspace
handler.
The new pre-content event is available only for users with global
CAP_SYS_ADMIN capability (similarly to other parts of fanotify
functionality) and it is an administrator responsibility to make sure
the userspace event handler doesn't do stupid stuff that can DoS the
system.
Based on your feedback from the last submission, fsnotify code has
been improved and now file->f_mode encodes whether pre-content event
needs to be generated for the file so the fast path when nobody wants
pre-content event for the file just grows the additional file->f_mode
check. As a bonus this also removes the checks whether the old
FS_ACCESS event needs to be generated from the fast path. Also the
place where the event is generated during page fault has been moved so
now filemap_fault() generates the event if and only if there is no
uptodate folio in the page cache.
Also we have dropped FS_PRE_MODIFY event as current real-world users
of the pre-content functionality don't really use it so let's start
with the minimal useful feature set"
* tag 'fsnotify_hsm_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (21 commits)
fanotify: Fix crash in fanotify_init(2)
fs: don't block write during exec on pre-content watched files
fs: enable pre-content events on supported file systems
ext4: add pre-content fsnotify hook for DAX faults
btrfs: disable defrag on pre-content watched files
xfs: add pre-content fsnotify hook for DAX faults
fsnotify: generate pre-content permission event on page fault
mm: don't allow huge faults for files with pre content watches
fanotify: disable readahead if we have pre-content watches
fanotify: allow to set errno in FAN_DENY permission response
fanotify: report file range info with pre-content events
fanotify: introduce FAN_PRE_ACCESS permission event
fsnotify: generate pre-content permission event on truncate
fsnotify: pass optional file access range in pre-content event
fsnotify: introduce pre-content permission events
fanotify: reserve event bit of deprecated FAN_DIR_MODIFY
fanotify: rename a misnamed constant
fanotify: don't skip extra event info if no info_mode is set
fsnotify: check if file is actually being watched for pre-content events on open
fsnotify: opt-in for permission events at file open time
...
|
|
At btrfs_qgroup_cleanup_dropped_subvolume() all we want to commit the
current transaction in order to have all the qgroup rfer/excl numbers up
to date. However we are using btrfs_start_transaction(), which joins the
current transaction if there is one that is not yet committing, but also
starts a new one if there is none or if the current one is already
committing (its state is >= TRANS_STATE_COMMIT_START). This later case
results in unnecessary IO, wasting time and a pointless rotation of the
backup roots in the super block.
So instead of using btrfs_start_transaction() followed by a
btrfs_commit_transaction(), use btrfs_commit_current_transaction() which
achieves our purpose and avoids starting and committing new transactions.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When we are trying to join the current transaction and if it's aborted,
we read its 'aborted' field after unlocking fs_info->trans_lock and
without holding any extra reference count on it. This means that a
concurrent task that is aborting the transaction may free the transaction
before we read its 'aborted' field, leading to a use-after-free.
Fix this by reading the 'aborted' field while holding fs_info->trans_lock
since any freeing task must first acquire that lock and set
fs_info->running_transaction to NULL before freeing the transaction.
This was reported by syzbot and Dmitry with the following stack traces
from KASAN:
==================================================================
BUG: KASAN: slab-use-after-free in join_transaction+0xd9b/0xda0 fs/btrfs/transaction.c:278
Read of size 4 at addr ffff888011839024 by task kworker/u4:9/1128
CPU: 0 UID: 0 PID: 1128 Comm: kworker/u4:9 Not tainted 6.13.0-rc7-syzkaller-00019-gc45323b7560e #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Workqueue: events_unbound btrfs_async_reclaim_data_space
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x169/0x550 mm/kasan/report.c:489
kasan_report+0x143/0x180 mm/kasan/report.c:602
join_transaction+0xd9b/0xda0 fs/btrfs/transaction.c:278
start_transaction+0xaf8/0x1670 fs/btrfs/transaction.c:697
flush_space+0x448/0xcf0 fs/btrfs/space-info.c:803
btrfs_async_reclaim_data_space+0x159/0x510 fs/btrfs/space-info.c:1321
process_one_work kernel/workqueue.c:3236 [inline]
process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
worker_thread+0x870/0xd30 kernel/workqueue.c:3398
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Allocated by task 5315:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4329
kmalloc_noprof include/linux/slab.h:901 [inline]
join_transaction+0x144/0xda0 fs/btrfs/transaction.c:308
start_transaction+0xaf8/0x1670 fs/btrfs/transaction.c:697
btrfs_create_common+0x1b2/0x2e0 fs/btrfs/inode.c:6572
lookup_open fs/namei.c:3649 [inline]
open_last_lookups fs/namei.c:3748 [inline]
path_openat+0x1c03/0x3590 fs/namei.c:3984
do_filp_open+0x27f/0x4e0 fs/namei.c:4014
do_sys_openat2+0x13e/0x1d0 fs/open.c:1402
do_sys_open fs/open.c:1417 [inline]
__do_sys_creat fs/open.c:1495 [inline]
__se_sys_creat fs/open.c:1489 [inline]
__x64_sys_creat+0x123/0x170 fs/open.c:1489
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 5336:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2353 [inline]
slab_free mm/slub.c:4613 [inline]
kfree+0x196/0x430 mm/slub.c:4761
cleanup_transaction fs/btrfs/transaction.c:2063 [inline]
btrfs_commit_transaction+0x2c97/0x3720 fs/btrfs/transaction.c:2598
insert_balance_item+0x1284/0x20b0 fs/btrfs/volumes.c:3757
btrfs_balance+0x992/0x10c0 fs/btrfs/volumes.c:4633
btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The buggy address belongs to the object at ffff888011839000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 36 bytes inside of
freed 2048-byte region [ffff888011839000, ffff888011839800)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11838
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000040 ffff88801ac42000 ffffea0000493400 dead000000000002
raw: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000
head: 00fff00000000040 ffff88801ac42000 ffffea0000493400 dead000000000002
head: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000
head: 00fff00000000003 ffffea0000460e01 ffffffffffffffff 0000000000000000
head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 57, tgid 57 (kworker/0:2), ts 67248182943, free_ts 67229742023
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1558
prep_new_page mm/page_alloc.c:1566 [inline]
get_page_from_freelist+0x365c/0x37a0 mm/page_alloc.c:3476
__alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4753
alloc_pages_mpol_noprof+0x3e1/0x780 mm/mempolicy.c:2269
alloc_slab_page+0x6a/0x110 mm/slub.c:2423
allocate_slab+0x5a/0x2b0 mm/slub.c:2589
new_slab mm/slub.c:2642 [inline]
___slab_alloc+0xc27/0x14a0 mm/slub.c:3830
__slab_alloc+0x58/0xa0 mm/slub.c:3920
__slab_alloc_node mm/slub.c:3995 [inline]
slab_alloc_node mm/slub.c:4156 [inline]
__do_kmalloc_node mm/slub.c:4297 [inline]
__kmalloc_node_track_caller_noprof+0x2e9/0x4c0 mm/slub.c:4317
kmalloc_reserve+0x111/0x2a0 net/core/skbuff.c:609
__alloc_skb+0x1f3/0x440 net/core/skbuff.c:678
alloc_skb include/linux/skbuff.h:1323 [inline]
alloc_skb_with_frags+0xc3/0x820 net/core/skbuff.c:6612
sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2884
sock_alloc_send_skb include/net/sock.h:1803 [inline]
mld_newpack+0x1c3/0xaf0 net/ipv6/mcast.c:1747
add_grhead net/ipv6/mcast.c:1850 [inline]
add_grec+0x1492/0x19a0 net/ipv6/mcast.c:1988
mld_send_cr net/ipv6/mcast.c:2114 [inline]
mld_ifc_work+0x691/0xd90 net/ipv6/mcast.c:2651
page last free pid 5300 tgid 5300 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1127 [inline]
free_unref_page+0xd3f/0x1010 mm/page_alloc.c:2659
__slab_free+0x2c2/0x380 mm/slub.c:4524
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
kasan_slab_alloc include/linux/kasan.h:250 [inline]
slab_post_alloc_hook mm/slub.c:4119 [inline]
slab_alloc_node mm/slub.c:4168 [inline]
__do_kmalloc_node mm/slub.c:4297 [inline]
__kmalloc_noprof+0x236/0x4c0 mm/slub.c:4310
kmalloc_noprof include/linux/slab.h:905 [inline]
kzalloc_noprof include/linux/slab.h:1037 [inline]
fib_create_info+0xc14/0x25b0 net/ipv4/fib_semantics.c:1435
fib_table_insert+0x1f6/0x1f20 net/ipv4/fib_trie.c:1231
fib_magic+0x3d8/0x620 net/ipv4/fib_frontend.c:1112
fib_add_ifaddr+0x40c/0x5e0 net/ipv4/fib_frontend.c:1156
fib_netdev_event+0x375/0x490 net/ipv4/fib_frontend.c:1494
notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85
__dev_notify_flags+0x207/0x400
dev_change_flags+0xf0/0x1a0 net/core/dev.c:9045
do_setlink+0xc90/0x4210 net/core/rtnetlink.c:3109
rtnl_changelink net/core/rtnetlink.c:3723 [inline]
__rtnl_newlink net/core/rtnetlink.c:3875 [inline]
rtnl_newlink+0x1bb6/0x2210 net/core/rtnetlink.c:4012
Memory state around the buggy address:
ffff888011838f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888011838f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888011839000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888011839080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888011839100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Reported-by: syzbot+45212e9d87a98c3f5b42@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/678e7da5.050a0220.303755.007c.GAE@google.com/
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/linux-btrfs/CACT4Y+ZFBdo7pT8L2AzM=vegZwjp-wNkVJZQf0Ta3vZqtExaSw@mail.gmail.com/
Fixes: 871383be592b ("btrfs: add missing unlocks to transaction abort paths")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
There is a bug report that btrfs outputs the following error message:
BTRFS info (device nvme0n1p2): qgroup scan completed (inconsistency flag cleared)
BTRFS warning (device nvme0n1p2): failed to cleanup qgroup 0/1179: -2
[CAUSE]
The error itself is pretty harmless, and the end user should ignore it.
When a subvolume is fully dropped, btrfs will call
btrfs_qgroup_cleanup_dropped_subvolume() to delete the qgroup.
However if a qgroup rescan happened before a subvolume fully dropped,
qgroup for that subvolume will not be re-created, as rescan will only
create new qgroup if there is a BTRFS_ROOT_REF_KEY found.
But before we drop a subvolume, the subvolume is unlinked thus there is no
BTRFS_ROOT_REF_KEY.
In that case, btrfs_remove_qgroup() will fail with -ENOENT and trigger
the above error message.
[FIX]
Just ignore -ENOENT error from btrfs_remove_qgroup() inside
btrfs_qgroup_cleanup_dropped_subvolume().
Reported-by: John Shand <jshand2013@gmail.com>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1236056
Fixes: 839d6ea4f86d ("btrfs: automatically remove the subvolume qgroup")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
abort
If while we are doing a direct IO write a transaction abort happens, we
mark all existing ordered extents with the BTRFS_ORDERED_IOERR flag (done
at btrfs_destroy_ordered_extents()), and then after that if we enter
btrfs_split_ordered_extent() and the ordered extent has bytes left
(meaning we have a bio that doesn't cover the whole ordered extent, see
details at btrfs_extract_ordered_extent()), we will fail on the following
assertion at btrfs_split_ordered_extent():
ASSERT(!(flags & ~BTRFS_ORDERED_TYPE_FLAGS));
because the BTRFS_ORDERED_IOERR flag is set and the definition of
BTRFS_ORDERED_TYPE_FLAGS is just the union of all flags that identify the
type of write (regular, nocow, prealloc, compressed, direct IO, encoded).
Fix this by returning an error from btrfs_extract_ordered_extent() if we
find the BTRFS_ORDERED_IOERR flag in the ordered extent. The error will
be the error that resulted in the transaction abort or -EIO if no
transaction abort happened.
This was recently reported by syzbot with the following trace:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 1
CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
fail_dump lib/fault-inject.c:53 [inline]
should_fail_ex+0x3b0/0x4e0 lib/fault-inject.c:154
should_failslab+0xac/0x100 mm/failslab.c:46
slab_pre_alloc_hook mm/slub.c:4072 [inline]
slab_alloc_node mm/slub.c:4148 [inline]
__do_kmalloc_node mm/slub.c:4297 [inline]
__kmalloc_noprof+0xdd/0x4c0 mm/slub.c:4310
kmalloc_noprof include/linux/slab.h:905 [inline]
kzalloc_noprof include/linux/slab.h:1037 [inline]
btrfs_chunk_alloc_add_chunk_item+0x244/0x1100 fs/btrfs/volumes.c:5742
reserve_chunk_space+0x1ca/0x2c0 fs/btrfs/block-group.c:4292
check_system_chunk fs/btrfs/block-group.c:4319 [inline]
do_chunk_alloc fs/btrfs/block-group.c:3891 [inline]
btrfs_chunk_alloc+0x77b/0xf80 fs/btrfs/block-group.c:4187
find_free_extent_update_loop fs/btrfs/extent-tree.c:4166 [inline]
find_free_extent+0x42d1/0x5810 fs/btrfs/extent-tree.c:4579
btrfs_reserve_extent+0x422/0x810 fs/btrfs/extent-tree.c:4672
btrfs_new_extent_direct fs/btrfs/direct-io.c:186 [inline]
btrfs_get_blocks_direct_write+0x706/0xfa0 fs/btrfs/direct-io.c:321
btrfs_dio_iomap_begin+0xbb7/0x1180 fs/btrfs/direct-io.c:525
iomap_iter+0x697/0xf60 fs/iomap/iter.c:90
__iomap_dio_rw+0xeb9/0x25b0 fs/iomap/direct-io.c:702
btrfs_dio_write fs/btrfs/direct-io.c:775 [inline]
btrfs_direct_write+0x610/0xa30 fs/btrfs/direct-io.c:880
btrfs_do_write_iter+0x2a0/0x760 fs/btrfs/file.c:1397
do_iter_readv_writev+0x600/0x880
vfs_writev+0x376/0xba0 fs/read_write.c:1050
do_pwritev fs/read_write.c:1146 [inline]
__do_sys_pwritev2 fs/read_write.c:1204 [inline]
__se_sys_pwritev2+0x196/0x2b0 fs/read_write.c:1195
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1281f85d29
RSP: 002b:00007f12819fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffda RBX: 00007f1282176080 RCX: 00007f1281f85d29
RDX: 0000000000000001 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 00007f12819fe090 R08: 0000000000000000 R09: 0000000000000003
R10: 0000000000007000 R11: 0000000000000246 R12: 0000000000000002
R13: 0000000000000000 R14: 00007f1282176080 R15: 00007ffcb9e23328
</TASK>
BTRFS error (device loop0 state A): Transaction aborted (error -12)
BTRFS: error (device loop0 state A) in btrfs_chunk_alloc_add_chunk_item:5745: errno=-12 Out of memory
BTRFS info (device loop0 state EA): forced readonly
assertion failed: !(flags & ~BTRFS_ORDERED_TYPE_FLAGS), in fs/btrfs/ordered-data.c:1234
------------[ cut here ]------------
kernel BUG at fs/btrfs/ordered-data.c:1234!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:btrfs_split_ordered_extent+0xd8d/0xe20 fs/btrfs/ordered-data.c:1234
RSP: 0018:ffffc9000d1df2b8 EFLAGS: 00010246
RAX: 0000000000000057 RBX: 000000000006a000 RCX: 9ce21886c4195300
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: 0000000000000091 R08: ffffffff817f0a3c R09: 1ffff92001a3bdf4
R10: dffffc0000000000 R11: fffff52001a3bdf5 R12: 1ffff1100a45f401
R13: ffff8880522fa018 R14: dffffc0000000000 R15: 000000000006a000
FS: 00007f12819fe6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000557750bd7da8 CR3: 00000000400ea000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
btrfs_extract_ordered_extent fs/btrfs/direct-io.c:702 [inline]
btrfs_dio_submit_io+0x4be/0x6d0 fs/btrfs/direct-io.c:737
iomap_dio_submit_bio fs/iomap/direct-io.c:85 [inline]
iomap_dio_bio_iter+0x1022/0x1740 fs/iomap/direct-io.c:447
__iomap_dio_rw+0x13b7/0x25b0 fs/iomap/direct-io.c:703
btrfs_dio_write fs/btrfs/direct-io.c:775 [inline]
btrfs_direct_write+0x610/0xa30 fs/btrfs/direct-io.c:880
btrfs_do_write_iter+0x2a0/0x760 fs/btrfs/file.c:1397
do_iter_readv_writev+0x600/0x880
vfs_writev+0x376/0xba0 fs/read_write.c:1050
do_pwritev fs/read_write.c:1146 [inline]
__do_sys_pwritev2 fs/read_write.c:1204 [inline]
__se_sys_pwritev2+0x196/0x2b0 fs/read_write.c:1195
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1281f85d29
RSP: 002b:00007f12819fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000148
RAX: ffffffffffffffda RBX: 00007f1282176080 RCX: 00007f1281f85d29
RDX: 0000000000000001 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 00007f12819fe090 R08: 0000000000000000 R09: 0000000000000003
R10: 0000000000007000 R11: 0000000000000246 R12: 0000000000000002
R13: 0000000000000000 R14: 00007f1282176080 R15: 00007ffcb9e23328
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:btrfs_split_ordered_extent+0xd8d/0xe20 fs/btrfs/ordered-data.c:1234
RSP: 0018:ffffc9000d1df2b8 EFLAGS: 00010246
RAX: 0000000000000057 RBX: 000000000006a000 RCX: 9ce21886c4195300
RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
RBP: 0000000000000091 R08: ffffffff817f0a3c R09: 1ffff92001a3bdf4
R10: dffffc0000000000 R11: fffff52001a3bdf5 R12: 1ffff1100a45f401
R13: ffff8880522fa018 R14: dffffc0000000000 R15: 000000000006a000
FS: 00007f12819fe6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000557750bd7da8 CR3: 00000000400ea000 CR4: 0000000000352ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
In this case the transaction abort was due to (an injected) memory
allocation failure when attempting to allocate a new chunk.
Reported-by: syzbot+f60d8337a5c8e8d92a77@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/6777f2dd.050a0220.178762.0045.GAE@google.com/
Fixes: 52b1fdca23ac ("btrfs: handle completed ordered extents in btrfs_split_ordered_extent")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When COWing a relocation tree path, at relocation.c:replace_path(), we
can trigger a lockdep splat while we are in the btrfs_search_slot() call
against the relocation root. This happens in that callchain at
ctree.c:read_block_for_search() when we happen to find a child extent
buffer already loaded through the fs tree with a lockdep class set to
the fs tree. So when we attempt to lock that extent buffer through a
relocation tree we have to reset the lockdep class to the class for a
relocation tree, since a relocation tree has extent buffers that used
to belong to a fs tree and may currently be already loaded (we swap
extent buffers between the two trees at the end of replace_path()).
However we are missing calls to btrfs_maybe_reset_lockdep_class() to reset
the lockdep class at ctree.c:read_block_for_search() before we read lock
an extent buffer, just like we did for btrfs_search_slot() in commit
b40130b23ca4 ("btrfs: fix lockdep splat with reloc root extent buffers").
So add the missing btrfs_maybe_reset_lockdep_class() calls before the
attempts to read lock an extent buffer at ctree.c:read_block_for_search().
The lockdep splat was reported by syzbot and it looks like this:
======================================================
WARNING: possible circular locking dependency detected
6.13.0-rc5-syzkaller-00163-gab75170520d4 #0 Not tainted
------------------------------------------------------
syz.0.0/5335 is trying to acquire lock:
ffff8880545dbc38 (btrfs-tree-01){++++}-{4:4}, at: btrfs_tree_read_lock_nested+0x2f/0x250 fs/btrfs/locking.c:146
but task is already holding lock:
ffff8880545dba58 (btrfs-treloc-02/1){+.+.}-{4:4}, at: btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (btrfs-treloc-02/1){+.+.}-{4:4}:
reacquire_held_locks+0x3eb/0x690 kernel/locking/lockdep.c:5374
__lock_release kernel/locking/lockdep.c:5563 [inline]
lock_release+0x396/0xa30 kernel/locking/lockdep.c:5870
up_write+0x79/0x590 kernel/locking/rwsem.c:1629
btrfs_force_cow_block+0x14b3/0x1fd0 fs/btrfs/ctree.c:660
btrfs_cow_block+0x371/0x830 fs/btrfs/ctree.c:755
btrfs_search_slot+0xc01/0x3180 fs/btrfs/ctree.c:2153
replace_path+0x1243/0x2740 fs/btrfs/relocation.c:1224
merge_reloc_root+0xc46/0x1ad0 fs/btrfs/relocation.c:1692
merge_reloc_roots+0x3b3/0x980 fs/btrfs/relocation.c:1942
relocate_block_group+0xb0a/0xd40 fs/btrfs/relocation.c:3754
btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4087
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3494
__btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4278
btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4655
btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #1 (btrfs-tree-01/1){+.+.}-{4:4}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
down_write_nested+0xa2/0x220 kernel/locking/rwsem.c:1693
btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189
btrfs_init_new_buffer fs/btrfs/extent-tree.c:5052 [inline]
btrfs_alloc_tree_block+0x41c/0x1440 fs/btrfs/extent-tree.c:5132
btrfs_force_cow_block+0x526/0x1fd0 fs/btrfs/ctree.c:573
btrfs_cow_block+0x371/0x830 fs/btrfs/ctree.c:755
btrfs_search_slot+0xc01/0x3180 fs/btrfs/ctree.c:2153
btrfs_insert_empty_items+0x9c/0x1a0 fs/btrfs/ctree.c:4351
btrfs_insert_empty_item fs/btrfs/ctree.h:688 [inline]
btrfs_insert_inode_ref+0x2bb/0xf80 fs/btrfs/inode-item.c:330
btrfs_rename_exchange fs/btrfs/inode.c:7990 [inline]
btrfs_rename2+0xcb7/0x2b90 fs/btrfs/inode.c:8374
vfs_rename+0xbdb/0xf00 fs/namei.c:5067
do_renameat2+0xd94/0x13f0 fs/namei.c:5224
__do_sys_renameat2 fs/namei.c:5258 [inline]
__se_sys_renameat2 fs/namei.c:5255 [inline]
__x64_sys_renameat2+0xce/0xe0 fs/namei.c:5255
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (btrfs-tree-01){++++}-{4:4}:
check_prev_add kernel/locking/lockdep.c:3161 [inline]
check_prevs_add kernel/locking/lockdep.c:3280 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
down_read_nested+0xb5/0xa50 kernel/locking/rwsem.c:1649
btrfs_tree_read_lock_nested+0x2f/0x250 fs/btrfs/locking.c:146
btrfs_tree_read_lock fs/btrfs/locking.h:188 [inline]
read_block_for_search+0x718/0xbb0 fs/btrfs/ctree.c:1610
btrfs_search_slot+0x1274/0x3180 fs/btrfs/ctree.c:2237
replace_path+0x1243/0x2740 fs/btrfs/relocation.c:1224
merge_reloc_root+0xc46/0x1ad0 fs/btrfs/relocation.c:1692
merge_reloc_roots+0x3b3/0x980 fs/btrfs/relocation.c:1942
relocate_block_group+0xb0a/0xd40 fs/btrfs/relocation.c:3754
btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4087
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3494
__btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4278
btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4655
btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Chain exists of:
btrfs-tree-01 --> btrfs-tree-01/1 --> btrfs-treloc-02/1
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(btrfs-treloc-02/1);
lock(btrfs-tree-01/1);
lock(btrfs-treloc-02/1);
rlock(btrfs-tree-01);
*** DEADLOCK ***
8 locks held by syz.0.0/5335:
#0: ffff88801e3ae420 (sb_writers#13){.+.+}-{0:0}, at: mnt_want_write_file+0x5e/0x200 fs/namespace.c:559
#1: ffff888052c760d0 (&fs_info->reclaim_bgs_lock){+.+.}-{4:4}, at: __btrfs_balance+0x4c2/0x26b0 fs/btrfs/volumes.c:4183
#2: ffff888052c74850 (&fs_info->cleaner_mutex){+.+.}-{4:4}, at: btrfs_relocate_block_group+0x775/0xd90 fs/btrfs/relocation.c:4086
#3: ffff88801e3ae610 (sb_internal#2){.+.+}-{0:0}, at: merge_reloc_root+0xf11/0x1ad0 fs/btrfs/relocation.c:1659
#4: ffff888052c76470 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0x405/0xda0 fs/btrfs/transaction.c:288
#5: ffff888052c76498 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0x405/0xda0 fs/btrfs/transaction.c:288
#6: ffff8880545db878 (btrfs-tree-01/1){+.+.}-{4:4}, at: btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189
#7: ffff8880545dba58 (btrfs-treloc-02/1){+.+.}-{4:4}, at: btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189
stack backtrace:
CPU: 0 UID: 0 PID: 5335 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller-00163-gab75170520d4 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206
check_prev_add kernel/locking/lockdep.c:3161 [inline]
check_prevs_add kernel/locking/lockdep.c:3280 [inline]
validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
__lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
down_read_nested+0xb5/0xa50 kernel/locking/rwsem.c:1649
btrfs_tree_read_lock_nested+0x2f/0x250 fs/btrfs/locking.c:146
btrfs_tree_read_lock fs/btrfs/locking.h:188 [inline]
read_block_for_search+0x718/0xbb0 fs/btrfs/ctree.c:1610
btrfs_search_slot+0x1274/0x3180 fs/btrfs/ctree.c:2237
replace_path+0x1243/0x2740 fs/btrfs/relocation.c:1224
merge_reloc_root+0xc46/0x1ad0 fs/btrfs/relocation.c:1692
merge_reloc_roots+0x3b3/0x980 fs/btrfs/relocation.c:1942
relocate_block_group+0xb0a/0xd40 fs/btrfs/relocation.c:3754
btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4087
btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3494
__btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4278
btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4655
btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f1ac6985d29
Code: ff ff c3 (...)
RSP: 002b:00007f1ac63fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f1ac6b76160 RCX: 00007f1ac6985d29
RDX: 0000000020000180 RSI: 00000000c4009420 RDI: 0000000000000007
RBP: 00007f1ac6a01b08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000001 R14: 00007f1ac6b76160 R15: 00007fffda145a88
</TASK>
Reported-by: syzbot+63913e558c084f7f8fdc@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/677b3014.050a0220.3b53b0.0064.GAE@google.com/
Fixes: 99785998ed1c ("btrfs: reduce lock contention when eb cache miss for btree search")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull isofs update from Jan Kara:
"Partial conversion of isofs to folios"
* tag 'fs_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
isofs: Partially convert zisofs_read_folio to use a folio
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull inotify update from Jan Kara:
"A small inotify strcpy() cleanup"
* tag 'fsnotify_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
inotify: Use strscpy() for event->name copies
|
|
Pull XFS updates from Carlos Maiolino:
"This is mostly focused on the implementation of reflink and
reverse-mapping support for XFS's real-time devices.
It also includes several bugfixes.
- Implement reflink support for the realtime device
- Implement reverse-mapping support for the realtime device
- Several bug fixes and cleanups"
* tag 'xfs-merge-6.14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (121 commits)
xfs: fix buffer lookup vs release race
xfs: check for dead buffers in xfs_buf_find_insert
xfs: add a b_iodone callback to struct xfs_buf
xfs: move b_li_list based retry handling to common code
xfs: simplify xfsaild_resubmit_item
xfs: always complete the buffer inline in xfs_buf_submit
xfs: remove the extra buffer reference in xfs_buf_submit
xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend
xfs: simplify buffer I/O submission
xfs: move in-memory buftarg handling out of _xfs_buf_ioapply
xfs: move write verification out of _xfs_buf_ioapply
xfs: remove xfs_buf_delwri_submit_buffers
xfs: simplify xfs_buf_delwri_pushbuf
xfs: move xfs_buf_iowait out of (__)xfs_buf_submit
xfs: remove the incorrect comment about the b_pag field
xfs: remove the incorrect comment above xfs_buf_free_maps
xfs: fix a double completion for buffers on in-memory targets
xfs/libxfs: replace kmalloc() and memcpy() with kmemdup()
xfs: constify feature checks
xfs: refactor xfs_fs_statfs
...
|
|
- Set `compressedblks = 1` directly for non-bigpcluster cases. This
simplifies the logic a bit since lcluster sizes larger than one block
are unsupported and the details remain unclear.
- For Z_EROFS_LCLUSTER_TYPE_PLAIN pclusters, avoid assuming
`compressedblks = 1` by default. Instead, check if
Z_EROFS_ADVISE_BIG_PCLUSTER_2 is set.
It basically has no impact to existing valid images, but it's useful to
find the gap to prepare for large PLAIN pclusters.
Link: https://lore.kernel.org/r/20250123090109.973463-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull AT_EXECVE_CHECK from Kees Cook:
- Implement AT_EXECVE_CHECK flag to execveat(2) (Mickaël Salaün)
- Implement EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
(Mickaël Salaün)
- Add selftests and samples for AT_EXECVE_CHECK (Mickaël Salaün)
* tag 'AT_EXECVE_CHECK-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
ima: instantiate the bprm_creds_for_exec() hook
samples/check-exec: Add an enlighten "inc" interpreter and 28 tests
selftests: ktap_helpers: Fix uninitialized variable
samples/check-exec: Add set-exec
selftests/landlock: Add tests for execveat + AT_EXECVE_CHECK
selftests/exec: Add 32 tests for AT_EXECVE_CHECK and exec securebits
security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits
exec: Add a new AT_EXECVE_CHECK flag to execveat(2)
|