Age | Commit message (Collapse) | Author | Files | Lines |
|
->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller. We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.
It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable. There is a couple
of exceptions (overlayfs and, to less extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.
It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.
This commit only changes the calling conventions; making use of supplied
values is left to followups.
NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate. This series doesn't
do anything to that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).
One thing to keep in mind when using name is that name->name will normally
point into the pathname being resolved; the filename in question occupies
name->len bytes starting at name->name, and there is NUL somewhere after it,
but it the next byte might very well be '/' rather than '\0'. Do not
ignore name->len.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull jfs updates from Dave Kleikamp:
"A few more patches to add sanity checks in jfs"
* tag 'jfs-6.13' of github.com:kleikamp/linux-shaggy:
jfs: add a check to prevent array-index-out-of-bounds in dbAdjTree
jfs: xattr: check invalid xattr size more strictly
jfs: fix array-index-out-of-bounds in jfs_readdir
jfs: fix shift-out-of-bounds in dbSplit
jfs: array-index-out-of-bounds fix in dtReadFirst
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount api conversions from Christian Brauner:
"Convert adfs, affs, befs, hfs, hfsplus, jfs, and hpfs to the new mount
api"
* tag 'vfs-6.13.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
efs: fix the efs new mount api implementation
ubifs: Convert ubifs to use the new mount API
hpfs: convert hpfs to use the new mount api
jfs: convert jfs to use the new mount api
hfsplus: convert hfsplus to use the new mount api
hfs: convert hfs to use the new mount api
befs: convert befs to use the new mount api
affs: convert affs to use the new mount api
adfs: convert adfs to use the new mount api
|
|
When the value of lp is 0 at the beginning of the for loop, it will
become negative in the next assignment and we should bail out.
Reported-by: syzbot+412dea214d8baa3f7483@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=412dea214d8baa3f7483
Tested-by: syzbot+412dea214d8baa3f7483@syzkaller.appspotmail.com
Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Commit 7c55b78818cf ("jfs: xattr: fix buffer overflow for invalid xattr")
also addresses this issue but it only fixes it for positive values, while
ea_size is an integer type and can take negative values, e.g. in case of
a corrupted filesystem. This still breaks validation and would overflow
because of implicit conversion from int to size_t in print_hex_dump().
Fix this issue by clamping the ea_size value instead.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Cc: stable@vger.kernel.org
Signed-off-by: Artem Sadovnikov <ancowi69@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
The stbl might contain some invalid values. Added a check to
return error code in that case.
Reported-by: syzbot+0315f8fe99120601ba88@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0315f8fe99120601ba88
Signed-off-by: Ghanshyam Agrawal <ghanshyam1898@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
When dmt_budmin is less than zero, it causes errors
in the later stages. Added a check to return an error beforehand
in dbAllocCtl itself.
Reported-by: syzbot+b5ca8a249162c4b9a7d0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b5ca8a249162c4b9a7d0
Signed-off-by: Ghanshyam Agrawal <ghanshyam1898@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
The value of stbl can be sometimes out of bounds due
to a bad filesystem. Added a check with appopriate return
of error code in that case.
Reported-by: syzbot+65fa06e29859e41a83f3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=65fa06e29859e41a83f3
Signed-off-by: Ghanshyam Agrawal <ghanshyam1898@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
MAXAG is a legitimate value for bmp->db_numag
Fixes: e63866a47556 ("jfs: fix out-of-bounds in dbNextAG() and diAlloc()")
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Convert the jfs filesystem to use the new mount API.
Tested by comparing random mount & remount options before and after
the change.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Link: https://lore.kernel.org/r/20240926171947.682881-1-sandeen@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull jfs updates from David Kleikamp:
"A few fixes for jfs"
* tag 'jfs-6.12' of github.com:kleikamp/linux-shaggy:
jfs: Fix uninit-value access of new_ea in ea_buffer
jfs: check if leafidx greater than num leaves per dmap tree
jfs: Fix uaf in dbFreeBits
jfs: fix out-of-bounds in dbNextAG() and diAlloc()
jfs: UBSAN: shift-out-of-bounds in dbFindBits
|
|
syzbot reports that lzo1x_1_do_compress is using uninit-value:
=====================================================
BUG: KMSAN: uninit-value in lzo1x_1_do_compress+0x19f9/0x2510 lib/lzo/lzo1x_compress.c:178
...
Uninit was stored to memory at:
ea_put fs/jfs/xattr.c:639 [inline]
...
Local variable ea_buf created at:
__jfs_setxattr+0x5d/0x1ae0 fs/jfs/xattr.c:662
__jfs_xattr_set+0xe6/0x1f0 fs/jfs/xattr.c:934
=====================================================
The reason is ea_buf->new_ea is not initialized properly.
Fix this by using memset to empty its content at the beginning
in ea_get().
Reported-by: syzbot+02341e0daa42a15ce130@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=02341e0daa42a15ce130
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
syzbot report a out of bounds in dbSplit, it because dmt_leafidx greater
than num leaves per dmap tree, add a checking for dmt_leafidx in dbFindLeaf.
Shaggy:
Modified sanity check to apply to control pages as well as leaf pages.
Reported-and-tested-by: syzbot+dca05492eff41f604890@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=dca05492eff41f604890
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
[syzbot reported]
==================================================================
BUG: KASAN: slab-use-after-free in __mutex_lock_common kernel/locking/mutex.c:587 [inline]
BUG: KASAN: slab-use-after-free in __mutex_lock+0xfe/0xd70 kernel/locking/mutex.c:752
Read of size 8 at addr ffff8880229254b0 by task syz-executor357/5216
CPU: 0 UID: 0 PID: 5216 Comm: syz-executor357 Not tainted 6.11.0-rc3-syzkaller-00156-gd7a5aa4b3c00 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:93 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
__mutex_lock_common kernel/locking/mutex.c:587 [inline]
__mutex_lock+0xfe/0xd70 kernel/locking/mutex.c:752
dbFreeBits+0x7ea/0xd90 fs/jfs/jfs_dmap.c:2390
dbFreeDmap fs/jfs/jfs_dmap.c:2089 [inline]
dbFree+0x35b/0x680 fs/jfs/jfs_dmap.c:409
dbDiscardAG+0x8a9/0xa20 fs/jfs/jfs_dmap.c:1650
jfs_ioc_trim+0x433/0x670 fs/jfs/jfs_discard.c:100
jfs_ioctl+0x2d0/0x3e0 fs/jfs/ioctl.c:131
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
Freed by task 5218:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
__kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2252 [inline]
slab_free mm/slub.c:4473 [inline]
kfree+0x149/0x360 mm/slub.c:4594
dbUnmount+0x11d/0x190 fs/jfs/jfs_dmap.c:278
jfs_mount_rw+0x4ac/0x6a0 fs/jfs/jfs_mount.c:247
jfs_remount+0x3d1/0x6b0 fs/jfs/super.c:454
reconfigure_super+0x445/0x880 fs/super.c:1083
vfs_cmd_reconfigure fs/fsopen.c:263 [inline]
vfs_fsconfig_locked fs/fsopen.c:292 [inline]
__do_sys_fsconfig fs/fsopen.c:473 [inline]
__se_sys_fsconfig+0xb6e/0xf80 fs/fsopen.c:345
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[Analysis]
There are two paths (dbUnmount and jfs_ioc_trim) that generate race
condition when accessing bmap, which leads to the occurrence of uaf.
Use the lock s_umount to synchronize them, in order to avoid uaf caused
by race condition.
Reported-and-tested-by: syzbot+3c010e21296f33a5dc16@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
In dbNextAG() , there is no check for the case where bmp->db_numag is
greater or same than MAXAG due to a polluted image, which causes an
out-of-bounds. Therefore, a bounds check should be added in dbMount().
And in dbNextAG(), a check for the case where agpref is greater than
bmp->db_numag should be added, so an out-of-bounds exception should be
prevented.
Additionally, a check for the case where agno is greater or same than
MAXAG should be added in diAlloc() to prevent out-of-bounds.
Reported-by: Jeongjun Park <aha310510@gmail.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Fix issue with UBSAN throwing shift-out-of-bounds warning.
Reported-by: syzbot+e38d703eeb410b17b473@syzkaller.appspotmail.com
Signed-off-by: Remington Brasga <rbrasga@uci.edu>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Convert all callers from working on a page to working on one page
of a folio (support for working on an entire folio can come later).
Removes a lot of folio->page->folio conversions.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Most callers have a folio, and most implementations operate on a folio,
so remove the conversion from folio->page->folio to fit through this
interface.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull jfs updates from David Kleikamp:
"Folio conversion from Matthew Wilcox and a few various fixes"
* tag 'jfs-6.11' of github.com:kleikamp/linux-shaggy:
jfs: don't walk off the end of ealist
jfs: Fix shift-out-of-bounds in dbDiscardAG
jfs: Fix array-index-out-of-bounds in diFree
jfs: fix null ptr deref in dtInsertEntry
jfs: Remove use of folio error flag
fs: Remove i_blocks_per_page
jfs: Change metapage->page to metapage->folio
jfs: Convert force_metapage to use a folio
jfs: Convert inc_io to take a folio
jfs: Convert page_to_mp to folio_to_mp
jfs; Convert __invalidate_metapages to use a folio
jfs: Convert dec_io to take a folio
jfs: Convert drop_metapage and remove_metapage to take a folio
jfs; Convert release_metapage to use a folio
jfs: Convert insert_metapage() to take a folio
jfs: Convert __get_metapage to use a folio
jfs: Convert metapage_writepage to metapage_write_folio
jfs: Convert metapage_read_folio to use folio APIs
|
|
Add a check before visiting the members of ea to
make sure each ea stays within the ealist.
Signed-off-by: lei lu <llfamsec@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
When searching for the next smaller log2 block, BLKSTOL2() returned 0,
causing shift exponent -1 to be negative.
This patch fixes the issue by exiting the loop directly when negative
shift is found.
Reported-by: syzbot+61be3359d2ee3467e7e4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=61be3359d2ee3467e7e4
Signed-off-by: Pei Li <peili.dev@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Reported-by: syzbot+241c815bda521982cb49@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
[syzbot reported]
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 5061 Comm: syz-executor404 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
RIP: 0010:dtInsertEntry+0xd0c/0x1780 fs/jfs/jfs_dtree.c:3713
...
[Analyze]
In dtInsertEntry(), when the pointer h has the same value as p, after writing
name in UniStrncpy_to_le(), p->header.flag will be cleared. This will cause the
previously true judgment "p->header.flag & BT-LEAF" to change to no after writing
the name operation, this leads to entering an incorrect branch and accessing the
uninitialized object ih when judging this condition for the second time.
[Fix]
After got the page, check freelist first, if freelist == 0 then exit dtInsert()
and return -EINVAL.
Reported-by: syzbot+bba84aef3a26fb93deb9@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
When an xattr size is not what is expected, it is printed out to the
kernel log in hex format as a form of debugging. But when that xattr
size is bigger than the expected size, printing it out can cause an
access off the end of the buffer.
Fix this all up by properly restricting the size of the debug hex dump
in the kernel log.
Reported-by: syzbot+9dfe490c8176301c1d06@syzkaller.appspotmail.com
Cc: Dave Kleikamp <shaggy@kernel.org>
Link: https://lore.kernel.org/r/2024051433-slider-cloning-98f9@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Store the blk_status per folio (if we can have multiple metapages per
folio) instead of setting the folio error flag. This will allow us to
reclaim a precious folio flag shortly.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Convert all the users to operate on a folio. Saves sixteen calls to
compound_head(). We still use sizeof(struct page) in print_hex_dump,
otherwise it will go into the second and third pages of the folio which
won't exist for jfs folios (since they are not large). This needs a
better solution, but finding it can be postponed.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Convert the mp->page to a folio and operate on it. That lets us
convert metapage_write_one() to take a folio. Replaces five calls to
compound_head() with one.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
All their callers now have a folio, so pass it in. Remove mp_anchor()
as inc_io() was the last user. No savings here, just cleaning up some
remnants.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Access folio->private directly instead of testing the page private flag.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Retrieve a folio from the page cache instead of a page. Saves a
couple of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
This means also converting the two handlers to take a folio.
Saves four calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
All callers now have a folio, so pass it in instead of the page.
Removes a couple of calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Convert mp->page to a folio and remove 7 hidden calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Both of its callers now have a folio, so convert this function.
Use folio_attach_private() instead of manually setting folio->private.
This also gets the expected refcount of the folio correct.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Remove four hidden calls to compound_head().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Implement writepages rather than writepage by using write_cache_pages()
to call metapage_write_folio(). Use bio_add_folio_nofail() as we know
we just allocated the bio. Replace the call to SetPageError (which
is never checked) with a call to mapping_set_error (which ... might be
checked somewhere?)
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Use bio_add_folio_nofail() as we just allocated the bio and know
it cannot fail. Other than that, this is a 1:1 conversion from
page APIs to folio APIs.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Currently a device is only really released once the umount returns to
userspace due to how file closing works. That ultimately could cause
an old umount assumption to be violated that concurrent umount and mount
don't fail. So an exclusively held device with a temporary holder should
be yielded before the filesystem is gone. Add a helper that allows
callers to do that. This also allows us to remove the two holder ops
that Linus wasn't excited about.
Link: https://lore.kernel.org/r/20240326-vfs-bdev-end_holder-v1-1-20af85202918@kernel.org
Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, isofs, udf, and quota updates from Jan Kara:
"A lot of material this time:
- removal of a lot of GFP_NOFS usage from ext2, udf, quota (either it
was legacy or replaced with scoped memalloc_nofs_*() API)
- removal of BUG_ONs in quota code
- conversion of UDF to the new mount API
- tightening quota on disk format verification
- fix some potentially unsafe use of RCU pointers in quota code and
annotate everything properly to make sparse happy
- a few other small quota, ext2, udf, and isofs fixes"
* tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (26 commits)
udf: remove SLAB_MEM_SPREAD flag usage
quota: remove SLAB_MEM_SPREAD flag usage
isofs: remove SLAB_MEM_SPREAD flag usage
ext2: remove SLAB_MEM_SPREAD flag usage
ext2: mark as deprecated
udf: convert to new mount API
udf: convert novrs to an option flag
MAINTAINERS: add missing git address for ext2 entry
quota: Detect loops in quota tree
quota: Properly annotate i_dquot arrays with __rcu
quota: Fix rcu annotations of inode dquot pointers
isofs: handle CDs with bad root inode but good Joliet root directory
udf: Avoid invalid LVID used on mount
quota: Fix potential NULL pointer dereference
quota: Drop GFP_NOFS instances under dquot->dq_lock and dqio_sem
quota: Set nofs allocation context when acquiring dqio_sem
ext2: Remove GFP_NOFS use in ext2_xattr_cache_insert()
ext2: Drop GFP_NOFS use in ext2_get_blocks()
ext2: Drop GFP_NOFS allocation from ext2_init_block_alloc_info()
udf: Remove GFP_NOFS allocation in udf_expand_file_adinicb()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull block handle updates from Christian Brauner:
"Last cycle we changed opening of block devices, and opening a block
device would return a bdev_handle. This allowed us to implement
support for restricting and forbidding writes to mounted block
devices. It was accompanied by converting and adding helpers to
operate on bdev_handles instead of plain block devices.
That was already a good step forward but ultimately it isn't necessary
to have special purpose helpers for opening block devices internally
that return a bdev_handle.
Fundamentally, opening a block device internally should just be
equivalent to opening files. So now all internal opens of block
devices return files just as a userspace open would. Instead of
introducing a separate indirection into bdev_open_by_*() via struct
bdev_handle bdev_file_open_by_*() is made to just return a struct
file. Opening and closing a block device just becomes equivalent to
opening and closing a file.
This all works well because internally we already have a pseudo fs for
block devices and so opening block devices is simple. There's a few
places where we needed to be careful such as during boot when the
kernel is supposed to mount the rootfs directly without init doing it.
Here we need to take care to ensure that we flush out any asynchronous
file close. That's what we already do for opening, unpacking, and
closing the initramfs. So nothing new here.
The equivalence of opening and closing block devices to regular files
is a win in and of itself. But it also has various other advantages.
We can remove struct bdev_handle completely. Various low-level helpers
are now private to the block layer. Other helpers were simply
removable completely.
A follow-up series that is already reviewed build on this and makes it
possible to remove bdev->bd_inode and allows various clean ups of the
buffer head code as well. All places where we stashed a bdev_handle
now just stash a file and use simple accessors to get to the actual
block device which was already the case for bdev_handle"
* tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
block: remove bdev_handle completely
block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access
bdev: remove bdev pointer from struct bdev_handle
bdev: make struct bdev_handle private to the block layer
bdev: make bdev_{release, open_by_dev}() private to block layer
bdev: remove bdev_open_by_path()
reiserfs: port block device access to file
ocfs2: port block device access to file
nfs: port block device access to files
jfs: port block device access to file
f2fs: port block device access to files
ext4: port block device access to file
erofs: port device access to file
btrfs: port device access to file
bcachefs: port block device access to file
target: port block device access to file
s390: port block device access to file
nvme: port block device access to file
block2mtd: port device access to files
bcache: port block device access to files
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Misc features, cleanups, and fixes for vfs and individual filesystems.
Features:
- Support idmapped mounts for hugetlbfs.
- Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug
where the passed offset is ignored if the file is O_APPEND. The new
flag allows a caller to enforce that the offset is honored to
conform to posix even if the file was opened in append mode.
- Move i_mmap_rwsem in struct address_space to avoid false sharing
between i_mmap and i_mmap_rwsem.
- Convert efs, qnx4, and coda to use the new mount api.
- Add a generic is_dot_dotdot() helper that's used by various
filesystems and the VFS code instead of open-coding it multiple
times.
- Recently we've added stable offsets which allows stable ordering
when iterating directories exported through NFS on e.g., tmpfs
filesystems. Originally an xarray was used for the offset map but
that caused slab fragmentation issues over time. This switches the
offset map to the maple tree which has a dense mode that handles
this scenario a lot better. Includes tests.
- Finally merge the case-insensitive improvement series Gabriel has
been working on for a long time. This cleanly propagates case
insensitive operations through ->s_d_op which in turn allows us to
remove the quite ugly generic_set_encrypted_ci_d_ops() operations.
It also improves performance by trying a case-sensitive comparison
first and then fallback to case-insensitive lookup if that fails.
This also fixes a bug where overlayfs would be able to be mounted
over a case insensitive directory which would lead to all sort of
odd behaviors.
Cleanups:
- Make file_dentry() a simple accessor now that ->d_real() is
simplified because of the backing file work we did the last two
cycles.
- Use the dedicated file_mnt_idmap helper in ntfs3.
- Use smp_load_acquire/store_release() in the i_size_read/write
helpers and thus remove the hack to handle i_size reads in the
filemap code.
- The SLAB_MEM_SPREAD is a nop now. Remove it from various places in
fs/
- It's no longer necessary to perform a second built-in initramfs
unpack call because we retain the contents of the previous
extraction. Remove it.
- Now that we have removed various allocators kfree_rcu() always
works with kmem caches and kmalloc(). So simplify various places
that only use an rcu callback in order to handle the kmem cache
case.
- Convert the pipe code to use a lockdep comparison function instead
of open-coding the nesting making lockdep validation easier.
- Move code into fs-writeback.c that was located in a header but can
be made static as it's only used in that one file.
- Rewrite the alignment checking iterators for iovec and bvec to be
easier to read, and also significantly more compact in terms of
generated code. This saves 270 bytes of text on x86-64 (with
clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also
saves a bit of time for the same workload.
- Switch various places to use KMEM_CACHE instead of
kmem_cache_create().
- Use inode_set_ctime_to_ts() in inode_set_ctime_current()
- Use kzalloc() in name_to_handle_at() to avoid kernel infoleak.
- Various smaller cleanups for eventfds.
Fixes:
- Fix various comments and typos, and unneeded initializations.
- Fix stack allocation hack for clang in the select code.
- Improve dump_mapping() debug code on a best-effort basis.
- Fix build errors in various selftests.
- Avoid wrap-around instrumentation in various places.
- Don't allow user namespaces without an idmapping to be used for
idmapped mounts.
- Fix sysv sb_read() call.
- Fix fallback implementation of the get_name() export operation"
* tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits)
hugetlbfs: support idmapped mounts
qnx4: convert qnx4 to use the new mount api
fs: use inode_set_ctime_to_ts to set inode ctime to current time
libfs: Drop generic_set_encrypted_ci_d_ops
ubifs: Configure dentry operations at dentry-creation time
f2fs: Configure dentry operations at dentry-creation time
ext4: Configure dentry operations at dentry-creation time
libfs: Add helper to choose dentry operations at mount-time
libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
fscrypt: Drop d_revalidate once the key is added
fscrypt: Drop d_revalidate for valid dentries during lookup
fscrypt: Factor out a helper to configure the lookup dentry
ovl: Always reject mounting over case-insensitive directories
libfs: Attempt exact-match comparison first during casefolded lookup
efs: remove SLAB_MEM_SPREAD flag usage
jfs: remove SLAB_MEM_SPREAD flag usage
minix: remove SLAB_MEM_SPREAD flag usage
openpromfs: remove SLAB_MEM_SPREAD flag usage
proc: remove SLAB_MEM_SPREAD flag usage
qnx6: remove SLAB_MEM_SPREAD flag usage
...
|
|
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was
removed as of v6.8-rc1 (see [1]), so it became a dead flag since the
commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And
the series[1] went on to mark it obsolete explicitly to avoid confusion
for users. Here we can just remove all its users, which has no any
functional change.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1]
Link: https://lore.kernel.org/r/20240224134925.829677-1-chengming.zhou@linux.dev
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-23-adbd023e19cc@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add two new helpers to allow opening block devices as files.
This is not the final infrastructure. This still opens the block device
before opening a struct a file. Until we have removed all references to
struct bdev_handle we can't switch the order:
* Introduce blk_to_file_flags() to translate from block specific to
flags usable to pen a new file.
* Introduce bdev_file_open_by_{dev,path}().
* Introduce temporary sb_bdev_handle() helper to retrieve a struct
bdev_handle from a block device file and update places that directly
reference struct bdev_handle to rely on it.
* Don't count block device openes against the number of open files. A
bdev_file_open_by_{dev,path}() file is never installed into any
file descriptor table.
One idea that came to mind was to use kernel_tmpfile_open() which
would require us to pass a path and it would then call do_dentry_open()
going through the regular fops->open::blkdev_open() path. But then we're
back to the problem of routing block specific flags such as
BLK_OPEN_RESTRICT_WRITES through the open path and would have to waste
FMODE_* flags every time we add a new one. With this we can avoid using
a flag bit and we have more leeway in how we open block devices from
bdev_open_by_{dev,path}().
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Dquots pointed to from i_dquot arrays in inodes are protected by
dquot_srcu. Annotate them as such and change .get_dquots callback to
return properly annotated pointer to make sparse happy.
Fixes: b9ba6f94b238 ("quota: remove dqptr_sem")
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
This reverts commit cca974daeb6c43ea971f8ceff5a7080d7d49ee30.
The added sanity check is incorrect. BUDMIN is not the wrong value and
is too small.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
The kernel thread function jfs_lazycommit() and jfs_sync() invoke the
try_to_freeze() in its loop. But all the kernel threads are no-freezable
by default. So if we want to make a kernel thread to be freezable, we have
to invoke set_freezable() explicitly.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
[Syz report]
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_imap.c:2360:2
index -878706688 is out of range for type 'struct iagctl[128]'
CPU: 1 PID: 5065 Comm: syz-executor282 Not tainted 6.7.0-rc4-syzkaller-00009-gbee0e7762ad2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
diNewExt+0x3cf3/0x4000 fs/jfs/jfs_imap.c:2360
diAllocExt fs/jfs/jfs_imap.c:1949 [inline]
diAllocAG+0xbe8/0x1e50 fs/jfs/jfs_imap.c:1666
diAlloc+0x1d3/0x1760 fs/jfs/jfs_imap.c:1587
ialloc+0x8f/0x900 fs/jfs/jfs_inode.c:56
jfs_mkdir+0x1c5/0xb90 fs/jfs/namei.c:225
vfs_mkdir+0x2f1/0x4b0 fs/namei.c:4106
do_mkdirat+0x264/0x3a0 fs/namei.c:4129
__do_sys_mkdir fs/namei.c:4149 [inline]
__se_sys_mkdir fs/namei.c:4147 [inline]
__x64_sys_mkdir+0x6e/0x80 fs/namei.c:4147
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fcb7e6a0b57
Code: ff ff 77 07 31 c0 c3 0f 1f 40 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd83023038 EFLAGS: 00000286 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fcb7e6a0b57
RDX: 00000000000a1020 RSI: 00000000000001ff RDI: 0000000020000140
RBP: 0000000020000140 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000286 R12: 00007ffd830230d0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Analysis]
When the agstart is too large, it can cause agno overflow.
[Fix]
After obtaining agno, if the value is invalid, exit the subsequent process.
Reported-and-tested-by: syzbot+553d90297e6d2f50dbc7@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Modified the test from agno > MAXAG to agno >= MAXAG based on linux-next
report by kernel test robot (Dan Carpenter).
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
Currently while joining the leaf in a buddy system there is shift out
of bound error in calculation of BUDSIZE. Added the required check
to the BUDSIZE and fixed the documentation as well.
Reported-by: syzbot+411debe54d318eaed386@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=411debe54d318eaed386
Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|
|
When the execution of diMount(ipimap) fails, the object ipimap that has been
released may be accessed in diFreeSpecial(). Asynchronous ipimap release occurs
when rcu_core() calls jfs_free_node().
Therefore, when diMount(ipimap) fails, sbi->ipimap should not be initialized as
ipimap.
Reported-and-tested-by: syzbot+01cf2dbcbe2022454388@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
|