Age | Commit message (Collapse) | Author | Files | Lines |
|
We should convert fs/fuse code to use a newly introduced
invalid_mnt_idmap instead of passing a NULL as idmap pointer.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Let's convert all existing callers properly.
No functional changes intended.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
It is not possible with the current fuse code, but let's protect ourselves
from regressions in the future.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
RENAME_WHITEOUT is a special case of ->rename
and we need to take idmappings into account there.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Need to translate uid and gid in case of chown(2).
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We only cover the case when "default_permissions" flag
is used. A reason for that is that otherwise all the permission
checks are done in the userspace and we have to deal with
VFS idmapping in the userspace (which is bad), alternatively
we have to provide the userspace with idmapped req->in.h.uid/req->in.h.gid
which is also not align with VFS idmaps philosophy.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We have to:
- pass an idmapping to the generic_fillattr()
to properly handle UIG/GID mapping for the userspace.
- pass -/- to fuse_fillattr() (analog of generic_fillattr() in fuse).
Difference between these two is that generic_fillattr() takes all the
stat() data from the inode directly, while fuse_fillattr() codepath takes a
fresh data just from the userspace reply on the FUSE_GETATTR request.
In some cases we can just pass &nop_mnt_idmap, because idmapping won't be
used in these codepaths. For example, when 3rd argument of
fuse_do_getattr() is NULL then idmap argument is not used.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We have all the infrastructure in place, we just need
to pass an idmapping here.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We don't need to remap parent_gid, but have to adjust
group membership checks and take idmapping into account.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If idmap == NULL *and* filesystem daemon declared idmapped mounts
support, then uid/gid values in a fuse header will be -1.
No functional changes intended.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The memory of struct fuse_file is allocated but not freed
when get_create_ext return error.
Fixes: 3e2b6fdbdc9a ("fuse: send security context of inode on file")
Cc: stable@vger.kernel.org # v5.17
Signed-off-by: yangyun <yangyun50@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
FUSE attempts to detect server support for statx by trying it once and
setting no_statx=1 if it fails with ENOSYS, but consider the following
scenario:
- Userspace (e.g. sh) calls stat() on a file
* succeeds
- Userspace (e.g. lsd) calls statx(BTIME) on the same file
- request_mask = STATX_BASIC_STATS | STATX_BTIME
- first pass: sync=true due to differing cache_mask
- statx fails and returns ENOSYS
- set no_statx and retry
- retry sets mask = STATX_BASIC_STATS
- now mask == cache_mask; sync=false (time_before: still valid)
- so we take the "else if (stat)" path
- "err" is still ENOSYS from the failed statx call
Fix this by zeroing "err" before retrying the failed call.
Fixes: d3045530bdd2 ("fuse: implement statx")
Cc: stable@vger.kernel.org # v6.6
Signed-off-by: Danny Lin <danny@orbstack.dev>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Found by chance while working on support for idmapped mounts in fuse.
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The root inode has a fixed nodeid and generation (1, 0).
Prior to the commit 15db16837a35 ("fuse: fix illegal access to inode with
reused nodeid") generation number on lookup was ignored. After this commit
lookup with the wrong generation number resulted in the inode being
unhashed. This is correct for non-root inodes, but replacing the root
inode is wrong and results in weird behavior.
Fix by reverting to the old behavior if ignoring the generation for the
root inode, but issuing a warning in dmesg.
Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
Closes: https://lore.kernel.org/all/CAOQ4uxhek5ytdN8Yz2tNEOg5ea4NkBb4nk0FGPjPk_9nz-VG3g@mail.gmail.com/
Fixes: 15db16837a35 ("fuse: fix illegal access to inode with reused nodeid")
Cc: <stable@vger.kernel.org> # v5.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_do_statx() was added with the wrong helper.
Fixes: d3045530bdd2 ("fuse: implement statx")
Cc: <stable@vger.kernel.org> # v6.6
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In preparation for opening file in passthrough mode, store the
fuse_open_out argument in ff->args to be passed into fuse_file_io_open()
with the optional backing_id member.
This will be used for setting up passthrough to backing file on open
reply with FOPEN_PASSTHROUGH flag and a valid backing_id.
Opening a file in passthrough mode may fail for several reasons, such as
missing capability, conflicting open flags or inode in caching mode.
Return EIO from fuse_file_io_open() in those cases.
The combination of FOPEN_PASSTHROUGH and FOPEN_DIRECT_IO is allowed -
it mean that read/write operations will go directly to the server,
but mmap will be done to the backing file.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In preparation for inode io modes, a server open response could fail due to
conflicting inode io modes.
Allow returning an error from fuse_finish_open() and handle the error in
the callers.
fuse_finish_open() is used as the callback of finish_open(), so that
FMODE_OPENED will not be set if fuse_finish_open() fails.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
fuse_open_common() has a lot of code relevant only for regular files and
O_TRUNC in particular.
Copy the little bit of remaining code into fuse_dir_open() and stop using
this common helper for directory open.
Also split out fuse_dir_finish_open() from fuse_finish_open() before we add
inode io modes to fuse_finish_open().
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This removed the need to pass isdir argument to fuse_put_file().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Convert to using the new inode timestamp accessor functions.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-37-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Revert non-waiting FLUSH due to a regression
- Fix a lookup counter leak in readdirplus
- Add an option to allow shared mmaps in no-cache mode
- Add btime support and statx intrastructure to the protocol
- Invalidate positive/negative dentry on failed create/delete
* tag 'fuse-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: conditionally fill kstat in fuse_do_statx()
fuse: invalidate dentry on EEXIST creates or ENOENT deletes
fuse: cache btime
fuse: implement statx
fuse: add ATTR_TIMEOUT macro
fuse: add STATX request
fuse: handle empty request_mask in statx
fuse: write back dirty pages before direct write in direct_io_relax mode
fuse: add a new fuse init flag to relax restrictions in no cache mode
fuse: invalidate page cache pages before direct write
fuse: nlookup missing decrement in fuse_direntplus_link
Revert "fuse: in fuse_flush only wait if someone wants the return code"
|
|
The code path
fuse_update_attributes
fuse_update_get_attr
fuse_do_statx
has the risk to use a NULL pointer for struct kstat *stat, although current
callers of fuse_update_attributes() only set request_mask to values that
will trigger the call of fuse_do_getattr(), which already handles the NULL
pointer. Future updates might miss that fuse_do_statx() does not handle it
it is safer to add a condition already right now.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Fixes: d3045530bdd2 ("fuse: implement statx")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs timestamp updates from Christian Brauner:
"This adds VFS support for multi-grain timestamps and converts tmpfs,
xfs, ext4, and btrfs to use them. This carries acks from all relevant
filesystems.
The VFS always uses coarse-grained timestamps when updating the ctime
and mtime after a change. This has the benefit of allowing filesystems
to optimize away a lot of metadata updates, down to around 1 per
jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. A lot of changes
can happen in a jiffy, so timestamps aren't sufficient to help the
client decide to invalidate the cache.
Even with NFSv4, a lot of exported filesystems don't properly support
a change attribute and are subject to the same problems with timestamp
granularity. Other applications have similar issues with timestamps
(e.g., backup applications).
If we were to always use fine-grained timestamps, that would improve
the situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.
This introduces fine-grained timestamps that are used when they are
actively queried.
This uses the 31st bit of the ctime tv_nsec field to indicate that
something has queried the inode for the mtime or ctime. When this flag
is set, on the next mtime or ctime update, the kernel will fetch a
fine-grained timestamp instead of the usual coarse-grained one.
As POSIX generally mandates that when the mtime changes, the ctime
must also change the kernel always stores normalized ctime values, so
only the first 30 bits of the tv_nsec field are ever used.
Filesytems can opt into this behavior by setting the FS_MGTIME flag in
the fstype. Filesystems that don't set this flag will continue to use
coarse-grained timestamps.
Various preparatory changes, fixes and cleanups are included:
- Fixup all relevant places where POSIX requires updating ctime
together with mtime. This is a wide-range of places and all
maintainers provided necessary Acks.
- Add new accessors for inode->i_ctime directly and change all
callers to rely on them. Plain accesses to inode->i_ctime are now
gone and it is accordingly rename to inode->__i_ctime and commented
as requiring accessors.
- Extend generic_fillattr() to pass in a request mask mirroring in a
sense the statx() uapi. This allows callers to pass in a request
mask to only get a subset of attributes filled in.
- Rework timestamp updates so it's possible to drop the @now
parameter the update_time() inode operation and associated helpers.
- Add inode_update_timestamps() and convert all filesystems to it
removing a bunch of open-coding"
* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits)
btrfs: convert to multigrain timestamps
ext4: switch to multigrain timestamps
xfs: switch to multigrain timestamps
tmpfs: add support for multigrain timestamps
fs: add infrastructure for multigrain timestamps
fs: drop the timespec64 argument from update_time
xfs: have xfs_vn_update_time gets its own timestamp
fat: make fat_update_time get its own timestamp
fat: remove i_version handling from fat_update_time
ubifs: have ubifs_update_time use inode_update_timestamps
btrfs: have it use inode_update_timestamps
fs: drop the timespec64 arg from generic_update_time
fs: pass the request_mask to generic_fillattr
fs: remove silly warning from current_time
gfs2: fix timestamp handling on quota inodes
fs: rename i_ctime field to __i_ctime
selinux: convert to ctime accessor functions
security: convert to ctime accessor functions
apparmor: convert to ctime accessor functions
sunrpc: convert to ctime accessor functions
...
|
|
The EEXIST errors returned from server are strong sign that a local
negative dentry should be invalidated. Similarly, The ENOENT errors from
server can also be a sign of revalidate failure.
This commit invalidates dentries on EEXIST creates and ENOENT deletes by
calling fuse_invalidate_entry(), which improves the consistency with no
performance degradation.
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Not all inode attributes are supported by all filesystems, but for the
basic stats (which are returned by stat(2) and friends) all of them will
have some value, even if that doesn't reflect a real attribute of the file.
Btime is different, in that filesystems are free to report or not report a
value in statx. If the value is available, then STATX_BTIME bit is set in
stx_mask.
When caching the value of btime, remember the availability of the attribute
as well as the value (if available). This is done by using the
FUSE_I_BTIME bit in fuse_inode->state to indicate availability, while using
fuse_inode->inval_mask & STATX_BTIME to indicate the state of the cache
itself (i.e. set if cache is invalid, and cleared if cache is valid).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Allow querying btime. When btime is requested in mask, then FUSE_STATX
request is sent. Otherwise keep using FUSE_GETATTR.
The userspace interface for statx matches that of the statx(2) API.
However there are limitations on how this interface is used:
- returned basic stats and btime are used, stx_attributes, etc. are
ignored
- always query basic stats and btime, regardless of what was requested
- requested sync type is ignored, the default is passed to the server
- if server returns with some attributes missing from the result_mask,
then no attributes will be cached
- btime is not cached yet (next patch will fix that)
For new inodes initialize fi->inval_mask to "all invalid", instead of "all
valid" as previously. Also only clear basic stats from inval_mask when
caching attributes. This will result in the caching logic not thinking
that btime is cached.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Next patch will introduce yet another type attribute reply. Add a macro
that can handle attribute timeouts for all of the structs.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If no attribute is requested, then don't send request to userspace.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
generic_fillattr just fills in the entire stat struct indiscriminately
today, copying data from the inode. There is at least one attribute
(STATX_CHANGE_COOKIE) that can have side effects when it is reported,
and we're looking at adding more with the addition of multigrain
timestamps.
Add a request_mask argument to generic_fillattr and have most callers
just pass in the value that is passed to getattr. Have other callers
(e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of
STATX_CHANGE_COOKIE into generic_fillattr.
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-44-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If the LOOKUP request triggered from fuse_dentry_revalidate() is
interrupted, then the dentry will be invalidated, possibly resulting in
submounts being unmounted.
Reported-by: Xu Rongbo <xurongbo@baidu.com>
Closes: https://lore.kernel.org/all/CAJfpegswN_CJJ6C3RZiaK6rpFmNyWmXfaEpnQUJ42KCwNF5tWw@mail.gmail.com/
Fixes: 9e6268db496a ("[PATCH] FUSE - read-write operations")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
before this check, the nodeid has already been checked once, so
the check here doesn't make an sense, so remove the check for
nodeid here.
if (err || !outarg->nodeid)
goto out_put_forget;
err = -EIO;
>>> if (!outarg->nodeid)
goto out_put_forget;
Signed-off-by: zyfjeff <zyfjeff@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Fix regression in fileattr permission checking
- Fix possible hang during PID namespace destruction
- Add generic support for request extensions
- Add supplementary group list extension
- Add limited support for supplying supplementary groups in create
requests
- Documentation fixes
* tag 'fuse-update-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: add inode/permission checks to fileattr_get/fileattr_set
fuse: fix all W=1 kernel-doc warnings
fuse: in fuse_flush only wait if someone wants the return code
fuse: optional supplementary group in create requests
fuse: add request extension
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs idmapping updates from Christian Brauner:
- Last cycle we introduced the dedicated struct mnt_idmap type for
mount idmapping and the required infrastucture in 256c8aed2b42 ("fs:
introduce dedicated idmap type for mounts"). As promised in last
cycle's pull request message this converts everything to rely on
struct mnt_idmap.
Currently we still pass around the plain namespace that was attached
to a mount. This is in general pretty convenient but it makes it easy
to conflate namespaces that are relevant on the filesystem with
namespaces that are relevant on the mount level. Especially for
non-vfs developers without detailed knowledge in this area this was a
potential source for bugs.
This finishes the conversion. Instead of passing the plain namespace
around this updates all places that currently take a pointer to a
mnt_userns with a pointer to struct mnt_idmap.
Now that the conversion is done all helpers down to the really
low-level helpers only accept a struct mnt_idmap argument instead of
two namespace arguments.
Conflating mount and other idmappings will now cause the compiler to
complain loudly thus eliminating the possibility of any bugs. This
makes it impossible for filesystem developers to mix up mount and
filesystem idmappings as they are two distinct types and require
distinct helpers that cannot be used interchangeably.
Everything associated with struct mnt_idmap is moved into a single
separate file. With that change no code can poke around in struct
mnt_idmap. It can only be interacted with through dedicated helpers.
That means all filesystems are and all of the vfs is completely
oblivious to the actual implementation of idmappings.
We are now also able to extend struct mnt_idmap as we see fit. For
example, we can decouple it completely from namespaces for users that
don't require or don't want to use them at all. We can also extend
the concept of idmappings so we can cover filesystem specific
requirements.
In combination with the vfs{g,u}id_t work we finished in v6.2 this
makes this feature substantially more robust and thus difficult to
implement wrong by a given filesystem and also protects the vfs.
- Enable idmapped mounts for tmpfs and fulfill a longstanding request.
A long-standing request from users had been to make it possible to
create idmapped mounts for tmpfs. For example, to share the host's
tmpfs mount between multiple sandboxes. This is a prerequisite for
some advanced Kubernetes cases. Systemd also has a range of use-cases
to increase service isolation. And there are more users of this.
However, with all of the other work going on this was way down on the
priority list but luckily someone other than ourselves picked this
up.
As usual the patch is tiny as all the infrastructure work had been
done multiple kernel releases ago. In addition to all the tests that
we already have I requested that Rodrigo add a dedicated tmpfs
testsuite for idmapped mounts to xfstests. It is to be included into
xfstests during the v6.3 development cycle. This should add a slew of
additional tests.
* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
shmem: support idmapped mounts for tmpfs
fs: move mnt_idmap
fs: port vfs{g,u}id helpers to mnt_idmap
fs: port fs{g,u}id helpers to mnt_idmap
fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
fs: port i_{g,u}id_{needs_}update() to mnt_idmap
quota: port to mnt_idmap
fs: port privilege checking helpers to mnt_idmap
fs: port inode_owner_or_capable() to mnt_idmap
fs: port inode_init_owner() to mnt_idmap
fs: port acl to mnt_idmap
fs: port xattr to mnt_idmap
fs: port ->permission() to pass mnt_idmap
fs: port ->fileattr_set() to pass mnt_idmap
fs: port ->set_acl() to pass mnt_idmap
fs: port ->get_acl() to pass mnt_idmap
fs: port ->tmpfile() to pass mnt_idmap
fs: port ->rename() to pass mnt_idmap
fs: port ->mknod() to pass mnt_idmap
fs: port ->mkdir() to pass mnt_idmap
...
|
|
Use correct function name in kernel-doc notation. (1)
Don't use "/**" to begin non-kernel-doc comments. (3)
Fixes these warnings:
fs/fuse/cuse.c:272: warning: expecting prototype for cuse_parse_dev_info(). Prototype was for cuse_parse_devinfo() instead
fs/fuse/dev.c:212: warning: expecting prototype for A new request is available, wake fiq(). Prototype was for fuse_dev_wake_and_unlock() instead
fs/fuse/dir.c:149: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Mark the attributes as stale due to an atime change. Avoid the invalidate if
fs/fuse/file.c:656: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* In case of short read, the caller sets 'pos' to the position of
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Permission to create an object (create, mkdir, symlink, mknod) needs to
take supplementary groups into account.
Add a supplementary group request extension. This can contain an arbitrary
number of group IDs and can be added to any request. This extension is not
added to any request by default.
Add FUSE_CREATE_SUPP_GROUP init flag to enable supplementary group info in
creation requests. This adds just a single supplementary group that
matches the parent group in the case described above. In other cases the
extension is not added.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Will need to add supplementary groups to create messages, so add the
general concept of a request extension. A request extension is appended to
the end of the main request. It has a header indicating the size and type
of the extension.
The create security context (fuse_secctx_*) is similar to the generic
request extension, so include that as well in a backward compatible manner.
Add the total extension length to the request header. The offset of the
extension block within the request can be calculated by:
inh->len - inh->total_extlen * 8
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This cycle we ported all filesystems to the new posix acl api. While
looking at further simplifications in this area to remove the last
remnants of the generic dummy posix acl handlers we realized that we
regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use
of posix acls.
With the change to a dedicated posix acl api interacting with posix acls
doesn't go through the old xattr codepaths anymore and instead only
relies the get acl and set acl inode operations.
Before this change fuse daemons that don't set FUSE_POSIX_ACL were able
to get and set posix acl albeit with two caveats. First, that posix acls
aren't cached. And second, that they aren't used for permission checking
in the vfs.
We regressed that use-case as we currently refuse to retrieve any posix
acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons
would see a change in behavior.
We can restore the old behavior in multiple ways. We could change the
new posix acl api and look for a dedicated xattr handler and if we find
one prefer that over the dedicated posix acl api. That would break the
consistency of the new posix acl api so we would very much prefer not to
do that.
We could introduce a new ACL_*_CACHE sentinel that would instruct the
vfs permission checking codepath to not call into the filesystem and
ignore acls.
But a more straightforward fix for v6.2 is to do the same thing that
Overlayfs does and give fuse a separate get acl method for permission
checking. Overlayfs uses this to express different needs for vfs
permission lookup and acl based retrieval via the regular system call
path as well. Let fuse do the same for now. This way fuse can continue
to refuse to retrieve posix acls for daemons that don't set
FUSE_POSXI_ACL for permission checking while allowing a fuse server to
retrieve it via the usual system calls.
In the future, we could extend the get acl inode operation to not just
pass a simple boolean to indicate rcu lookup but instead make it a flag
argument. Then in addition to passing the information that this is an
rcu lookup to the filesystem we could also introduce a flag that tells
the filesystem that this is a request from the vfs to use these acls for
permission checking. Then fuse could refuse the get acl request for
permission checking when the daemon doesn't have FUSE_POSIX_ACL set in
the same get acl method. This would also help Overlayfs and allow us to
remove the second method for it as well.
But since that change is more invasive as we need to update the get acl
inode operation for multiple filesystems we should not do this as a fix
for v6.2. Instead we will do this for the v6.3 merge window.
Fwiw, since posix acls are now always correctly translated in the new
posix acl api we could also allow them to be used for daemons without
FUSE_POSIX_ACL that are not mounted on the host. But this is behavioral
change and again if dones should be done for v6.3. For now, let's just
restore the original behavior.
A nice side-effect of this change is that for fuse daemons with and
without FUSE_POSIX_ACL the same code is used for posix acls in a
backwards compatible way. This also means we can remove the legacy xattr
handlers completely. We've also added comments to explain the expected
behavior for daemons without FUSE_POSIX_ACL into the code.
Fixes: 318e66856dde ("xattr: use posix acl api")
Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse update from Miklos Szeredi:
- Allow some write requests to proceed in parallel
- Fix a performance problem with allow_sys_admin_access
- Add a special kind of invalidation that doesn't immediately purge
submounts
- On revalidation treat the target of rename(RENAME_NOREPLACE) the same
as open(O_EXCL)
- Use type safe helpers for some mnt_userns transformations
* tag 'fuse-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: Rearrange fuse_allow_current_process checks
fuse: allow non-extending parallel direct writes on the same file
fuse: remove the unneeded result variable
fuse: port to vfs{g,u}id_t and associated helpers
fuse: Remove user_ns check for FUSE_DEV_IOC_CLONE
fuse: always revalidate rename target dentry
fuse: add "expire only" mode to FUSE_NOTIFY_INVAL_ENTRY
fs/fuse: Replace kmap() with kmap_local_page()
|
|
This is a followup to a previous commit of mine [0], which added the
allow_sys_admin_access && capable(CAP_SYS_ADMIN) check. This patch
rearranges the order of checks in fuse_allow_current_process without
changing functionality.
Commit 9ccf47b26b73 ("fuse: Add module param for CAP_SYS_ADMIN access
bypassing allow_other") added allow_sys_admin_access &&
capable(CAP_SYS_ADMIN) check to the beginning of the function, with the
reasoning that allow_sys_admin_access should be an 'escape hatch' for users
with CAP_SYS_ADMIN, allowing them to skip any subsequent checks.
However, placing this new check first results in many capable() calls when
allow_sys_admin_access is set, where another check would've also returned
1. This can be problematic when a BPF program is tracing capable() calls.
At Meta we ran into such a scenario recently. On a host where
allow_sys_admin_access is set but most of the FUSE access is from processes
which would pass other checks - i.e. they don't need CAP_SYS_ADMIN 'escape
hatch' - this results in an unnecessary capable() call for each fs op. We
also have a daemon tracing capable() with BPF and doing some data
collection, so tracing these extraneous capable() calls has the potential
to regress performance for an application doing many FUSE ops.
So rearrange the order of these checks such that CAP_SYS_ADMIN 'escape
hatch' is checked last. Add a small helper, fuse_permissible_uidgid, to
make the logic easier to understand. Previously, if allow_other is set on
the fuse_conn, uid/git checking doesn't happen as current_in_userns result
is returned. These semantics are maintained here: fuse_permissible_uidgid
check only happens if allow_other is not set.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The previous commit df8629af2934 ("fuse: always revalidate if exclusive
create") ensures that the dentries are revalidated on O_EXCL creates. This
commit complements it by also performing revalidation for rename target
dentries. Otherwise, a rename target file that only exists in kernel
dentry cache but not in the filesystem will result in EEXIST if
RENAME_NOREPLACE flag is used.
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|