Age | Commit message (Collapse) | Author | Files | Lines |
|
Return a literal instead of 'err' in ksmbd_vfs_kern_path_locked().
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Use kzalloc() instead of __GFP_ZERO.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Remove unused ksmbd_tree_conn_share function.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Remove ->sendpage() and ->sendpage_locked(). sendmsg() with
MSG_SPLICE_PAGES should be used instead. This allows multiple pages and
multipage folios to be passed through.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-afs@lists.infradead.org
cc: mptcp@lists.linux.dev
cc: rds-devel@oss.oracle.com
cc: tipc-discussion@lists.sourceforge.net
cc: virtualization@lists.linux-foundation.org
Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Switch ocfs2 from using sendpage() to using sendmsg() + MSG_SPLICE_PAGES so
that sendpage can be phased out.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Mark Fasheh <mark@fasheh.com>
cc: Joel Becker <jlbec@evilplan.org>
cc: Joseph Qi <joseph.qi@linux.alibaba.com>
cc: ocfs2-devel@oss.oracle.com
Link: https://lore.kernel.org/r/20230623225513.2732256-15-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ocfs2 uses kzalloc() to allocate buffers for o2net_hand, o2net_keep_req and
o2net_keep_resp and then passes these to sendpage. This isn't really
allowed as the lifetime of slab objects is not controlled by page ref -
though in this case it will probably work. sendmsg() with MSG_SPLICE_PAGES
will, however, print a warning and give an error.
Fix it to use folio_alloc() instead to allocate a buffer for the handshake
message, keepalive request and reply messages.
Fixes: 98211489d414 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Mark Fasheh <mark@fasheh.com>
cc: Kurt Hackel <kurt.hackel@oracle.com>
cc: Joel Becker <jlbec@evilplan.org>
cc: Joseph Qi <joseph.qi@linux.alibaba.com>
cc: ocfs2-devel@oss.oracle.com
Link: https://lore.kernel.org/r/20230623225513.2732256-14-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When transmitting data, call down a layer using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather using
sendpage. This allows ->sendpage() to be replaced by something that can
handle multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christine Caulfield <ccaulfie@redhat.com>
cc: David Teigland <teigland@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: cluster-devel@redhat.com
Link: https://lore.kernel.org/r/20230623225513.2732256-7-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a small step towards a model where GUP itself would not expand
the stack, and any user that needs GUP to not look up existing mappings,
but actually expand on them, would have to do so manually before-hand,
and with the mm lock held for writing.
It turns out that execve() already did almost exactly that, except it
didn't take the mm lock at all (it's single-threaded so no locking
technically needed, but it could cause lockdep errors). And it only did
it for the CONFIG_STACK_GROWSUP case, since in that case GUP has
obviously never expanded the stack downwards.
So just make that CONFIG_STACK_GROWSUP case do the right thing with
locking, and enable it generally. This will eventually help GUP, and in
the meantime avoids a special case and the lockdep issue.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Make calls to extend_vma() and find_extend_vma() fail if the write lock
is required.
To avoid making this a flag-day event, this still allows the old
read-locking case for the trivial situations, and passes in a flag to
say "is it write-locked". That way write-lockers can say "yes, I'm
being careful", and legacy users will continue to work in all the common
cases until they have been fully converted to the new world order.
Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Variable bit_off is being assigned a value that is never read, it is being
re-assigned a new value in the following while loop. Remove the
assignment. Cleans up clang scan build warning:
fs/ocfs2/localalloc.c:976:18: warning: Although the value stored to
'bit_off' is used in the enclosing expression, the value is never
actually read from 'bit_off' [deadcode.DeadStores]
Link: https://lkml.kernel.org/r/20230622102736.2831126-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
During the seq_printf,the mmap_sem_read_lock protection is not
required.
Link: https://lkml.kernel.org/r/20230622040152.1173-1-lipeifeng@oppo.com
Signed-off-by: lipeifeng <lipeifeng@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Ackerley Tng reported an issue with hugetlbfs fallocate as noted in the
Closes tag. The issue showed up after the conversion of hugetlb page
cache lookup code to use page_cache_next_miss. User visible effects are:
- hugetlbfs fallocate incorrectly returns -EEXIST if pages are presnet
in the file.
- hugetlb pages will not be included in core dumps if they need to be
brought in via GUP.
- userfaultfd UFFDIO_COPY will not notice pages already present in the
cache. It may try to allocate a new page and potentially return
ENOMEM as opposed to EEXIST.
Revert the use page_cache_next_miss() in hugetlb code.
IMPORTANT NOTE FOR STABLE BACKPORTS:
This patch will apply cleanly to v6.3. However, due to the change of
filemap_get_folio() return values, it will not function correctly. This
patch must be modified for stable backports.
[dan.carpenter@linaro.org: fix hugetlbfs_pagecache_present()]
Link: https://lkml.kernel.org/r/efa86091-6a2c-4064-8f55-9b44e1313015@moroto.mountain
Link: https://lkml.kernel.org/r/20230621212403.174710-2-mike.kravetz@oracle.com
Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reported-by: Ackerley Tng <ackerleytng@google.com>
Closes: https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Remove pagevecs".
Removes a folio->page->folio conversion for each folio that's involved.
More importantly, removes one of the last few uses of a pagevec.
Link: https://lkml.kernel.org/r/20230621164557.3510324-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230621164557.3510324-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"Unfortunately the recent u32 overflow fix was not complete, there was
one conversion left, assertion not triggered by my tests but caught by
Qu's fstests case.
The "cleanup for later" has been promoted to a proper fix and wraps
all uses of the stripe left shift so the diffstat has grown but leaves
no potentially problematic uses.
We should have done it that way before, sorry"
* tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix remaining u32 overflows when left shifting stripe_nr
|
|
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/0b2967c4a4141875c493e835d5a6f8f2d19ae2d6.1687499804.git.baruch@tkos.co.il
|
|
ext4_blkdev_remove() passes a wrong holder pointer to blkdev_put() which
triggers a warning there. Fix it.
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230622165107.13687-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch adds a check for read-only mounted filesystem
in txBegin before starting a transaction potentially saving
from NULL pointer deref.
Signed-off-by: Immad Mir <mirimmad17@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Syzkaller reported an issue where txBegin may be called
on a superblock in a read-only mounted filesystem which leads
to NULL pointer deref. This could be solved by checking if
the filesystem is read-only before calling txBegin, and returning
with appropiate error code.
Reported-By: syzbot+f1faa20eec55e0c8644c@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=be7e52c50c5182cc09a09ea6fc456446b2039de3
Signed-off-by: Immad Mir <mirimmad17@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
The variable 'error' is being assigned a value that is never read,
the assignment and the variable and redundant and can be removed.
Cleans up clang scan build warning:
fs/dax.c:1880:10: warning: Although the value stored to 'error' is
used in the enclosing expression, the value is never actually read
from 'error' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20230621130256.2676126-1-colin.i.king@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
tools/testing/selftests/net/fcnal-test.sh
d7a2fc1437f7 ("selftests: net: fcnal-test: check if FIPS mode is enabled")
dd017c72dde6 ("selftests: fcnal: Test SO_DONTROUTE on TCP sockets.")
https://lore.kernel.org/all/5007b52c-dd16-dbf6-8d64-b9701bfa498b@tessares.net/
https://lore.kernel.org/all/20230619105427.4a0df9b3@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Syzkaller reported the following issue:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:1965:6
index -84 is out of range for type 's8[341]' (aka 'signed char[341]')
CPU: 1 PID: 4995 Comm: syz-executor146 Not tainted 6.4.0-rc6-syzkaller-00037-gb6dad5178cea #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
dbAllocDmapLev+0x3e5/0x430 fs/jfs/jfs_dmap.c:1965
dbAllocCtl+0x113/0x920 fs/jfs/jfs_dmap.c:1809
dbAllocAG+0x28f/0x10b0 fs/jfs/jfs_dmap.c:1350
dbAlloc+0x658/0xca0 fs/jfs/jfs_dmap.c:874
dtSplitUp fs/jfs/jfs_dtree.c:974 [inline]
dtInsert+0xda7/0x6b00 fs/jfs/jfs_dtree.c:863
jfs_create+0x7b6/0xbb0 fs/jfs/namei.c:137
lookup_open fs/namei.c:3492 [inline]
open_last_lookups fs/namei.c:3560 [inline]
path_openat+0x13df/0x3170 fs/namei.c:3788
do_filp_open+0x234/0x490 fs/namei.c:3818
do_sys_openat2+0x13f/0x500 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x247/0x290 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f1f4e33f7e9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc21129578 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1f4e33f7e9
RDX: 000000000000275a RSI: 0000000020000040 RDI: 00000000ffffff9c
RBP: 00007f1f4e2ff080 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f1f4e2ff110
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
The bug occurs when the dbAllocDmapLev()function attempts to access
dp->tree.stree[leafidx + LEAFIND] while the leafidx value is negative.
To rectify this, the patch introduces a safeguard within the
dbAllocDmapLev() function. A check has been added to verify if leafidx is
negative. If it is, the function immediately returns an I/O error, preventing
any further execution that could potentially cause harm.
Tested via syzbot.
Reported-by: syzbot+853a6f4dfa3cf37d3aea@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=ae2f5a27a07ae44b0f17
Signed-off-by: Yogesh <yogi.kernel@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
There was regression caused by a97699d1d610 ("btrfs: replace
map_lookup->stripe_len by BTRFS_STRIPE_LEN") and supposedly fixed by
a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr").
To avoid code churn the fix was open coding the type casts but
unfortunately missed one which was still possible to hit [1].
The missing place was assignment of bioc->full_stripe_logical inside
btrfs_map_block().
Fix it by adding a helper that does the safe calculation of the offset
and use it everywhere even though it may not be strictly necessary due
to already using u64 types. This replaces all remaining
"<< BTRFS_STRIPE_LEN_SHIFT" calls.
[1] https://lore.kernel.org/linux-btrfs/20230622065438.86402-1-wqu@suse.com/
Fixes: a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Several trivial cleanups which aren't quite necessary to split:
- Rename lcluster load functions as well as justify full indexes
since they are typically used for global deduplication for
compressed data;
- Avoid unnecessary lines, comments for simplicity.
No logic changes.
Reviewed-by: Guo Xuenan <guoxuenan@huaweicloud.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230615064421.103178-1-hsiangkao@linux.alibaba.com
|
|
It's redundant, let's remove it.
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230615034539.14286-1-frank.li@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Current check for atomic context is not sufficient as
z_erofs_decompressqueue_endio can be called under rcu lock
from blk_mq_flush_plug_list(). See the stacktrace [1]
In such case we should hand off the decompression work for async
processing rather than trying to do sync decompression in current
context. Patch fixes the detection by checking for
rcu_read_lock_any_held() and while at it use more appropriate
!in_task() check than in_atomic().
Background: Historically erofs would always schedule a kworker for
decompression which would incur the scheduling cost regardless of
the context. But z_erofs_decompressqueue_endio() may not always
be in atomic context and we could actually benefit from doing the
decompression in z_erofs_decompressqueue_endio() if we are in
thread context, for example when running with dm-verity.
This optimization was later added in patch [2] which has shown
improvement in performance benchmarks.
==============================================
[1] Problem stacktrace
[name:core&]BUG: sleeping function called from invalid context at kernel/locking/mutex.c:291
[name:core&]in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1615, name: CpuMonitorServi
[name:core&]preempt_count: 0, expected: 0
[name:core&]RCU nest depth: 1, expected: 0
CPU: 7 PID: 1615 Comm: CpuMonitorServi Tainted: G S W OE 6.1.25-android14-5-maybe-dirty-mainline #1
Hardware name: MT6897 (DT)
Call trace:
dump_backtrace+0x108/0x15c
show_stack+0x20/0x30
dump_stack_lvl+0x6c/0x8c
dump_stack+0x20/0x48
__might_resched+0x1fc/0x308
__might_sleep+0x50/0x88
mutex_lock+0x2c/0x110
z_erofs_decompress_queue+0x11c/0xc10
z_erofs_decompress_kickoff+0x110/0x1a4
z_erofs_decompressqueue_endio+0x154/0x180
bio_endio+0x1b0/0x1d8
__dm_io_complete+0x22c/0x280
clone_endio+0xe4/0x280
bio_endio+0x1b0/0x1d8
blk_update_request+0x138/0x3a4
blk_mq_plug_issue_direct+0xd4/0x19c
blk_mq_flush_plug_list+0x2b0/0x354
__blk_flush_plug+0x110/0x160
blk_finish_plug+0x30/0x4c
read_pages+0x2fc/0x370
page_cache_ra_unbounded+0xa4/0x23c
page_cache_ra_order+0x290/0x320
do_sync_mmap_readahead+0x108/0x2c0
filemap_fault+0x19c/0x52c
__do_fault+0xc4/0x114
handle_mm_fault+0x5b4/0x1168
do_page_fault+0x338/0x4b4
do_translation_fault+0x40/0x60
do_mem_abort+0x60/0xc8
el0_da+0x4c/0xe0
el0t_64_sync_handler+0xd4/0xfc
el0t_64_sync+0x1a0/0x1a4
[2] Link: https://lore.kernel.org/all/20210317035448.13921-1-huangjianan@oppo.com/
Reported-by: Will Shiu <Will.Shiu@mediatek.com>
Suggested-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com>
Link: https://lore.kernel.org/r/20230621220848.3379029-1-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
There are a few assignments to variable len where the value is not
being read and so the assignments are redundant and can be removed.
In one case, the variable len can be removed completely. Cleans up
4 clang scan warnings of the form:
fs/nfsd/export.c:100:7: warning: Although the value stored to 'len'
is used in the enclosing expression, the value is never actually
read from 'len' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
For filenames that begin with . and are between 2 and 5 characters long,
UDF charset conversion code would read uninitialized memory in the
output buffer. The only practical impact is that the name may be prepended a
"unification hash" when it is not actually needed but still it is good
to fix this.
Reported-by: syzbot+cd311b1e43cc25f90d18@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000e2638a05fe9dc8f9@google.com
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Fuse shouldn't return ENOSYS from its ioctl implementation. If userspace
responds with ENOSYS it should be translated to ENOTTY.
There are two ways to return an error from the IOCTL request:
- fuse_out_header.error
- fuse_ioctl_out.result
Commit 02c0cab8e734 ("fuse: ioctl: translate ENOSYS") already fixed this
issue for the first case, but missed the second case. This patch fixes the
second case.
Reported-by: Jonathan Katz <jkatz@eitmlabs.org>
Closes: https://lore.kernel.org/all/CALKgVmcC1VUV_gJVq70n--omMJZUb4HSh_FqvLTHgNBc+HCLFQ@mail.gmail.com/
Fixes: 02c0cab8e734 ("fuse: ioctl: translate ENOSYS")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
One-element arrays are deprecated, and we are replacing them with flexible
array members instead. So, replace one-element arrays with flexible-array
members in multiple structures.
Address the following -Wstringop-overflow warnings seen when built
m68k architecture with m5307c3_defconfig configuration:
In function '__put_user_fn',
inlined from 'fillonedir' at fs/readdir.c:170:2:
include/asm-generic/uaccess.h:49:35: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
49 | *(u8 __force *)to = *(u8 *)from;
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
fs/readdir.c: In function 'fillonedir':
fs/readdir.c:134:25: note: at offset 1 into destination object 'd_name' of size 1
134 | char d_name[1];
| ^~~~~~
In function '__put_user_fn',
inlined from 'filldir' at fs/readdir.c:257:2:
include/asm-generic/uaccess.h:49:35: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
49 | *(u8 __force *)to = *(u8 *)from;
| ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~
fs/readdir.c: In function 'filldir':
fs/readdir.c:211:25: note: at offset 1 into destination object 'd_name' of size 1
211 | char d_name[1];
| ^~~~~~
This helps with the ongoing efforts to globally enable
-Wstringop-overflow.
This results in no differences in binary output.
Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/312
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Message-Id: <ZJHiPJkNKwxkKz1c@work>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
fsverity builtin signatures (CONFIG_FS_VERITY_BUILTIN_SIGNATURES) aren't
the only way to do signatures with fsverity, and they have some major
limitations. Yet, more users have tried to use them, e.g. recently by
https://github.com/ostreedev/ostree/pull/2640. In most cases this seems
to be because users aren't sufficiently familiar with the limitations of
this feature and what the alternatives are.
Therefore, make some updates to the documentation to try to clarify the
properties of this feature and nudge users in the right direction.
Note that the Integrity Policy Enforcement (IPE) LSM, which is not yet
upstream, is planned to use the builtin signatures. (This differs from
IMA, which uses its own signature mechanism.) For that reason, my
earlier patch "fsverity: mark builtin signatures as deprecated"
(https://lore.kernel.org/r/20221208033548.122704-1-ebiggers@kernel.org),
which marked builtin signatures as "deprecated", was controversial.
This patch therefore stops short of marking the feature as deprecated.
I've also revised the language to focus on better explaining the feature
and what its alternatives are.
Link: https://lore.kernel.org/r/20230620041937.5809-1-ebiggers@kernel.org
Reviewed-by: Colin Walters <walters@verbum.org>
Reviewed-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
This fixes the following warning reported by kernel test robot
fs/smb/client/connect.c:2974 generic_ip_connect() error: we
previously assumed 'socket' could be null (see line 2962)
Link: https://lore.kernel.org/all/202306170124.CtQqzf0I-lkp@intel.com/
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This fixes the following warning reported by kernel test robot
fs/smb/client/cifssmb.c:4216 CIFSFindNext() warn: missing error
code? 'rc'
Link: https://lore.kernel.org/all/202306170124.CtQqzf0I-lkp@intel.com/
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This fixes the following warning reported by kernel test robot
fs/smb/client/cifssmb.c:4089 CIFSFindFirst() warn: missing error
code? 'rc'
Link: https://lore.kernel.org/all/202306170124.CtQqzf0I-lkp@intel.com/
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There were cases reported where servers will sometimes return more
credits than requested on oplock break responses, which can lead to
most of the credits being allocated for oplock breaks (instead of
for normal operations like read and write) if number of SMB3 requests
in flight always stays above 0 (the oplock and echo credits are
rebalanced when in flight requests goes down to zero).
If oplock credits gets unexpectedly large (e.g. three is more than it
would ever be expected to be) and in flight requests are greater than
zero, then rebalance the oplock credits and regular credits (go
back to reserving just one oplock credit).
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We had seen cases where cifs_invalidate_mapping was logging:
"Could not invalidate inode ..."
if invalidate_inode_pages2 fails but this message does not show what
the rc is. Update the logged message to also log the return code.
Suggested-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This fixes the following warning reported by kernel test robot
fs/smb/client/cifsfs.c:982 cifs_smb3_do_mount() warn: possible
memory leak of 'cifs_sb'
Link: https://lore.kernel.org/all/202306170124.CtQqzf0I-lkp@intel.com/
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"19 hotfixes. 8 of these are cc:stable.
This includes a wholesale reversion of the post-6.4 series 'make slab
shrink lockless'. After input from Dave Chinner it has been decided
that we should go a different way [1]"
Link: https://lkml.kernel.org/r/ZH6K0McWBeCjaf16@dread.disaster.area [1]
* tag 'mm-hotfixes-stable-2023-06-20-12-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
selftests/mm: fix cross compilation with LLVM
mailmap: add entries for Ben Dooks
nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
Revert "mm: vmscan: make global slab shrink lockless"
Revert "mm: vmscan: make memcg slab shrink lockless"
Revert "mm: vmscan: add shrinker_srcu_generation"
Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless"
Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred"
Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()"
Revert "mm: shrinkers: convert shrinker_rwsem to mutex"
nilfs2: fix buffer corruption due to concurrent device reads
scripts/gdb: fix SB_* constants parsing
scripts: fix the gfp flags header path in gfp-translate
udmabuf: revert 'Add support for mapping hugepages (v4)'
mm/khugepaged: fix iteration in collapse_file
memfd: check for non-NULL file_seals in memfd_create() syscall
mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
mm/mprotect: fix do_mprotect_pkey() limit check
writeback: fix dereferencing NULL mapping->host on writeback_page_template
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more regression fix for an assertion failure that uncovered a
nasty problem with stripe calculations. This is caused by a u32
overflow when there are enough devices. The fstests require 6 so this
hasn't been caught, I was able to hit it with 8.
The fix is minimal and only adds u64 casts, we'll clean that up later.
I did various additional tests to be sure"
* tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix u32 overflows when left shifting stripe_nr
|
|
While trying to fix the jfs UBSAN problem reported in syzkaller,
(https://syzkaller.appspot.com/bug?id=01abadbd6ae6a08b1f1987aa61554c6b3ac19ff2)
I found the typo in the comment of dbInitTree function and fix it.
Signed-off-by: Wonguk Lee <wonguk.lee1023@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Pull smb server fixes from Steve French:
"Four smb3 server fixes, all also for stable:
- fix potential oops in parsing compounded requests
- fix various paths (mkdir, create etc) where mnt_want_write was not
checked first
- fix slab out of bounds in check_message and write"
* tag '6.4-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: validate session id and tree id in the compound request
ksmbd: fix out-of-bound read in smb2_write
ksmbd: add mnt_want_write to ksmbd vfs functions
ksmbd: validate command payload size
|
|
In jfs_dmap.c at line 381, BLKTODMAP is used to get a logical block
number inside dbFree(). db_l2nbperpage, which is the log2 number of
blocks per page, is passed as an argument to BLKTODMAP which uses it
for shifting.
Syzbot reported a shift out-of-bounds crash because db_l2nbperpage is
too big. This happens because the large value is set without any
validation in dbMount() at line 181.
Thus, make sure that db_l2nbperpage is correct while mounting.
Max number of blocks per page = Page size / Min block size
=> log2(Max num_block per page) = log2(Page size / Min block size)
= log2(Page size) - log2(Min block size)
=> Max db_l2nbperpage = L2PSIZE - L2MINBLOCKSIZE
Reported-and-tested-by: syzbot+d2cd27dcf8e04b232eb2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=2a70a453331db32ed491f5cbb07e81bf2d225715
Cc: stable@vger.kernel.org
Suggested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
[BUG]
David reported an ASSERT() get triggered during fio load on 8 devices
with data/raid6 and metadata/raid1c3:
fio --rw=randrw --randrepeat=1 --size=3000m \
--bsrange=512b-64k --bs_unaligned \
--ioengine=libaio --fsync=1024 \
--name=job0 --name=job1 \
The ASSERT() is from rbio_add_bio() of raid56.c:
ASSERT(orig_logical >= full_stripe_start &&
orig_logical + orig_len <= full_stripe_start +
rbio->nr_data * BTRFS_STRIPE_LEN);
Which is checking if the target rbio is crossing the full stripe
boundary.
[100.789] assertion failed: orig_logical >= full_stripe_start && orig_logical + orig_len <= full_stripe_start + rbio->nr_data * BTRFS_STRIPE_LEN, in fs/btrfs/raid56.c:1622
[100.795] ------------[ cut here ]------------
[100.796] kernel BUG at fs/btrfs/raid56.c:1622!
[100.797] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[100.798] CPU: 1 PID: 100 Comm: kworker/u8:4 Not tainted 6.4.0-rc6-default+ #124
[100.799] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[100.802] Workqueue: writeback wb_workfn (flush-btrfs-1)
[100.803] RIP: 0010:rbio_add_bio+0x204/0x210 [btrfs]
[100.806] RSP: 0018:ffff888104a8f300 EFLAGS: 00010246
[100.808] RAX: 00000000000000a1 RBX: ffff8881075907e0 RCX: ffffed1020951e01
[100.809] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
[100.811] RBP: 0000000141d20000 R08: 0000000000000001 R09: ffff888104a8f04f
[100.813] R10: ffffed1020951e09 R11: 0000000000000003 R12: ffff88810e87f400
[100.815] R13: 0000000041d20000 R14: 0000000144529000 R15: ffff888101524000
[100.817] FS: 0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[100.821] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[100.822] CR2: 000055d54e44c270 CR3: 000000010a9a1006 CR4: 00000000003706a0
[100.824] Call Trace:
[100.825] <TASK>
[100.825] ? die+0x32/0x80
[100.826] ? do_trap+0x12d/0x160
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.827] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.829] ? do_error_trap+0x90/0x130
[100.830] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.831] ? handle_invalid_op+0x2c/0x30
[100.833] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.835] ? exc_invalid_op+0x29/0x40
[100.836] ? asm_exc_invalid_op+0x16/0x20
[100.837] ? rbio_add_bio+0x204/0x210 [btrfs]
[100.837] raid56_parity_write+0x64/0x270 [btrfs]
[100.838] btrfs_submit_chunk+0x26e/0x800 [btrfs]
[100.840] ? btrfs_bio_init+0x80/0x80 [btrfs]
[100.841] ? release_pages+0x503/0x6d0
[100.842] ? folio_unlock+0x2f/0x60
[100.844] ? __folio_put+0x60/0x60
[100.845] ? btrfs_do_readpage+0xae0/0xae0 [btrfs]
[100.847] btrfs_submit_bio+0x21/0x60 [btrfs]
[100.847] submit_one_bio+0x6a/0xb0 [btrfs]
[100.849] extent_write_cache_pages+0x395/0x680 [btrfs]
[100.850] ? __extent_writepage+0x520/0x520 [btrfs]
[100.851] ? mark_usage+0x190/0x190
[100.852] extent_writepages+0xdb/0x130 [btrfs]
[100.853] ? extent_write_locked_range+0x480/0x480 [btrfs]
[100.854] ? mark_usage+0x190/0x190
[100.854] ? attach_extent_buffer_page+0x220/0x220 [btrfs]
[100.855] ? reacquire_held_locks+0x178/0x280
[100.856] ? writeback_sb_inodes+0x245/0x7f0
[100.857] do_writepages+0x102/0x2e0
[100.858] ? page_writeback_cpu_online+0x10/0x10
[100.859] ? __lock_release.isra.0+0x14a/0x4d0
[100.860] ? reacquire_held_locks+0x280/0x280
[100.861] ? __lock_acquired+0x1e9/0x3d0
[100.862] ? do_raw_spin_lock+0x1b0/0x1b0
[100.863] __writeback_single_inode+0x94/0x450
[100.864] writeback_sb_inodes+0x372/0x7f0
[100.864] ? lock_sync+0xd0/0xd0
[100.865] ? do_raw_spin_unlock+0x93/0xf0
[100.866] ? sync_inode_metadata+0xc0/0xc0
[100.867] ? rwsem_optimistic_spin+0x340/0x340
[100.868] __writeback_inodes_wb+0x70/0x130
[100.869] wb_writeback+0x2d1/0x530
[100.869] ? __writeback_inodes_wb+0x130/0x130
[100.870] ? lockdep_hardirqs_on_prepare.part.0+0xf1/0x1c0
[100.870] wb_do_writeback+0x3eb/0x480
[100.871] ? wb_writeback+0x530/0x530
[100.871] ? mark_lock_irq+0xcd0/0xcd0
[100.872] wb_workfn+0xe0/0x3f0<
[CAUSE]
Commit a97699d1d610 ("btrfs: replace map_lookup->stripe_len by
BTRFS_STRIPE_LEN") changes how we calculate the map length, to reduce
u64 division.
Function btrfs_max_io_len() is to get the length to the stripe boundary.
It calculates the full stripe start offset (inside the chunk) by the
following code:
*full_stripe_start =
rounddown(*stripe_nr, nr_data_stripes(map)) <<
BTRFS_STRIPE_LEN_SHIFT;
The calculation itself is fine, but the value returned by rounddown() is
dependent on both @stripe_nr (which is u32) and nr_data_stripes() (which
returned int).
Thus the result is also u32, then we do the left shift, which can
overflow u32.
If such overflow happens, @full_stripe_start will be a value way smaller
than @offset, causing later "full_stripe_len - (offset -
*full_stripe_start)" to underflow, thus make later length calculation to
have no stripe boundary limit, resulting a write bio to exceed stripe
boundary.
There are some other locations like this, with a u32 @stripe_nr got left
shift, which can lead to a similar overflow.
[FIX]
Fix all @stripe_nr with left shift with a type cast to u64 before the
left shift.
Those involved @stripe_nr or similar variables are recording the stripe
number inside the chunk, which is small enough to be contained by u32,
but their offset inside the chunk can not fit into u32.
Thus for those specific left shifts, a type cast to u64 is necessary so
this patch does not touch them and the code will be cleaned up in the
future to keep the fix minimal.
Reported-by: David Sterba <dsterba@suse.com>
Fixes: a97699d1d610 ("btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LEN")
Tested-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Using the old mount api to remount an overlayfs superblock via
mount(MS_REMOUNT) all mount options will be silently ignored. For
example, if you create an overlayfs mount:
mount -t overlay overlay -o lowerdir=/mnt/a:/mnt/b,upperdir=/mnt/upper,workdir=/mnt/work /mnt/merged
and then issue a remount via:
# force mount(8) to use mount(2)
export LIBMOUNT_FORCE_MOUNT2=always
mount -t overlay overlay -o remount,WOOTWOOT,lowerdir=/DOESNT-EXIST /mnt/merged
with completely nonsensical mount options whatsoever it will succeed
nonetheless. This prevents us from every changing any mount options we
might introduce in the future that could reasonably be changed during a
remount.
We don't need to carry this issue into the new mount api port. Similar
to FUSE we can use the fs_context::oldapi member to figure out that this
is a request coming through the legacy mount api. If we detect it we
continue silently ignoring all mount options.
But for the new mount api we simply report that mount options cannot
currently be changed. This will allow us to potentially alter mount
properties for new or even old properties. It any case, silently
ignoring everything is not something new apis should do.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
The current way how lowcomms is configured is due configfs entries. Each
comms configfs entry will create a lowcomms connection. Even the local
connection itself will be stored as a lowcomms connection, although most
functionality for a local lowcomms connection struct is not necessary.
Now in some scenarios we will see that dlm_controld reports a -EEXIST
when configure a node via configfs:
... /sys/kernel/config/dlm/cluster/comms/1/addr: write failed: 17 -1
Doing a:
cat /sys/kernel/config/dlm/cluster/comms/1/addr_list
reported nothing. This was being seen on cluster with nodeid 1 and it's
local configuration. To be sure the configfs entries are in sync with
lowcomms connection structures we always call dlm_midcomms_close() to be
sure the lowcomms connection gets removed when the configfs entry gets
dropped.
Before commit 07ee38674a0b ("fs: dlm: filter ourself midcomms calls") it
was just doing this by accident and the filter by doing:
if (nodeid == dlm_our_nodeid())
return 0;
inside dlm_midcomms_close() was never been hit because drop_comm() sets
local_comm to NULL and cause that dlm_our_nodeid() returns always the
invalid nodeid 0.
Fixes: 07ee38674a0b ("fs: dlm: filter ourself midcomms calls")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
|
|
In journal_init_dev(), if super bdev is used as 'j_dev_bd', then
blkdev_get_by_dev() is called with NULL holder, otherwise, holder will
be journal. However, later in release_journal_dev(), blkdev_put() is
called with journal unconditionally, cause following warning:
WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 bd_end_claim block/bdev.c:617 [inline]
WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 blkdev_put+0x562/0x8a0 block/bdev.c:901
RIP: 0010:blkdev_put+0x562/0x8a0 block/bdev.c:901
Call Trace:
<TASK>
release_journal_dev fs/reiserfs/journal.c:2592 [inline]
free_journal_ram+0x421/0x5c0 fs/reiserfs/journal.c:1896
do_journal_release fs/reiserfs/journal.c:1960 [inline]
journal_release+0x276/0x630 fs/reiserfs/journal.c:1971
reiserfs_put_super+0xe4/0x5c0 fs/reiserfs/super.c:616
generic_shutdown_super+0x158/0x480 fs/super.c:499
kill_block_super+0x64/0xb0 fs/super.c:1422
deactivate_locked_super+0x98/0x160 fs/super.c:330
deactivate_super+0xb1/0xd0 fs/super.c:361
cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1247
task_work_run+0x16f/0x270 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xadc/0x2a30 kernel/exit.c:874
do_group_exit+0xd4/0x2a0 kernel/exit.c:1024
__do_sys_exit_group kernel/exit.c:1035 [inline]
__se_sys_exit_group kernel/exit.c:1033 [inline]
__x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fix this problem by passing in NULL holder in this case.
Reported-by: syzbot+04625c80899f4555de39@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=04625c80899f4555de39
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230620111322.1014775-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Provide helpers to set and clear sb->s_readonly_remount including
appropriate memory barriers. Also use this opportunity to document what
the barriers pair with and why they are needed.
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Message-Id: <20230620112832.5158-1-jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.
With this port to the new mount api the following will be possible:
fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);
/* set upper layer */
fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);
/* append "/lower2", "/lower3", and "/lower4" */
fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);
/* turn index feature on */
fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);
/* append "/lower5" */
fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);
Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.
Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.
An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.
It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.
This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.
Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:
fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);
/* append */
fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);
/* append */
fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);
/* clear all layers specified until now */
fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);
This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.
We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.
This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.
The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.
Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:
lowerdir=/lower1:/lower2::/lower3::/lower4
simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:
lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"
would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.
So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().
Link: https://github.com/util-linux/util-linux/issues/2287
Link: https://github.com/util-linux/util-linux/issues/1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
|
|
Support large folios in block_truncate_page() and avoid three hidden calls
to compound_head().
[willy@infradead.org: fix check of filemap_grab_folio() return value]
Link: https://lkml.kernel.org/r/ZItZOt+XxV12HtzL@casper.infradead.org
Link: https://lkml.kernel.org/r/20230612210141.730128-15-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Saves a call to compound_head() and may be needed to support block size >
PAGE_SIZE.
Link: https://lkml.kernel.org/r/20230612210141.730128-14-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|