<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/fuse, branch linux-5.11.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-5.11.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-05-19T08:29:32+00:00</updated>
<entry>
<title>cuse: prevent clone</title>
<updated>2021-05-19T08:29:32+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2021-04-14T08:40:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7aa0d34a47c276dbb7f8571b6eae56cfc5b76958'/>
<id>urn:sha1:7aa0d34a47c276dbb7f8571b6eae56cfc5b76958</id>
<content type='text'>
[ Upstream commit 8217673d07256b22881127bf50dce874d0e51653 ]

For cloned connections cuse_channel_release() will be called more than
once, resulting in use after free.

Prevent device cloning for CUSE, which does not make sense at this point,
and highly unlikely to be used in real life.

Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>virtiofs: fix userns</title>
<updated>2021-05-19T08:29:32+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2021-04-14T08:40:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=067c57c94581799b8672d36ec6a14ec28585bfa9'/>
<id>urn:sha1:067c57c94581799b8672d36ec6a14ec28585bfa9</id>
<content type='text'>
[ Upstream commit 0a7419c68a45d2d066b996be5087aa2d07ce80eb ]

get_user_ns() is done twice (once in virtio_fs_get_tree() and once in
fuse_conn_init()), resulting in a reference leak.

Also looks better to use fsc-&gt;user_ns (which *should* be the
current_user_ns() at this point).

Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fuse: invalidate attrs when page writeback completes</title>
<updated>2021-05-19T08:29:32+00:00</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2021-04-06T14:07:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7e60b26884d6d767549edb1c141f63efd73102f1'/>
<id>urn:sha1:7e60b26884d6d767549edb1c141f63efd73102f1</id>
<content type='text'>
[ Upstream commit 3466958beb31a8e9d3a1441a34228ed088b84f3e ]

In fuse when a direct/write-through write happens we invalidate attrs
because that might have updated mtime/ctime on server and cached
mtime/ctime will be stale.

What about page writeback path.  Looks like we don't invalidate attrs
there.  To be consistent, invalidate attrs in writeback path as well.  Only
exception is when writeback_cache is enabled.  In that case we strust local
mtime/ctime and there is no need to invalidate attrs.

Recently users started experiencing failure of xfstests generic/080,
geneirc/215 and generic/614 on virtiofs.  This happened only newer "stat"
utility and not older one.  This patch fixes the issue.

So what's the root cause of the issue.  Here is detailed explanation.

generic/080 test does mmap write to a file, closes the file and then checks
if mtime has been updated or not.  When file is closed, it leads to
flushing of dirty pages (and that should update mtime/ctime on server).
But we did not explicitly invalidate attrs after writeback finished.  Still
generic/080 passed so far and reason being that we invalidated atime in
fuse_readpages_end().  This is called in fuse_readahead() path and always
seems to trigger before mmaped write.

So after mmaped write when lstat() is called, it sees that atleast one of
the fields being asked for is invalid (atime) and that results in
generating GETATTR to server and mtime/ctime also get updated and test
passes.

But newer /usr/bin/stat seems to have moved to using statx() syscall now
(instead of using lstat()).  And statx() allows it to query only ctime or
mtime (and not rest of the basic stat fields).  That means when querying
for mtime, fuse_update_get_attr() sees that mtime is not invalid (only
atime is invalid).  So it does not generate a new GETATTR and fill stat
with cached mtime/ctime.  And that means updated mtime is not seen by
xfstest and tests start failing.

Invalidating attrs after writeback completion should solve this problem in
a generic manner.

Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fuse: fix write deadlock</title>
<updated>2021-05-12T06:37:34+00:00</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2020-10-21T20:12:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8649ea2b772146666963707ed012166f4fae44b6'/>
<id>urn:sha1:8649ea2b772146666963707ed012166f4fae44b6</id>
<content type='text'>
commit 4f06dd92b5d0a6f8eec6a34b8d6ef3e1f4ac1e10 upstream.

There are two modes for write(2) and friends in fuse:

a) write through (update page cache, send sync WRITE request to userspace)

b) buffered write (update page cache, async writeout later)

The write through method kept all the page cache pages locked that were
used for the request.  Keeping more than one page locked is deadlock prone
and Qian Cai demonstrated this with trinity fuzzing.

The reason for keeping the pages locked is that concurrent mapped reads
shouldn't try to pull possibly stale data into the page cache.

For full page writes, the easy way to fix this is to make the cached page
be the authoritative source by marking the page PG_uptodate immediately.
After this the page can be safely unlocked, since mapped/cached reads will
take the written data from the cache.

Concurrent mapped writes will now cause data in the original WRITE request
to be updated; this however doesn't cause any data inconsistency and this
scenario should be exceedingly rare anyway.

If the WRITE request returns with an error in the above case, currently the
page is not marked uptodate; this means that a concurrent read will always
read consistent data.  After this patch the page is uptodate between
writing to the cache and receiving the error: there's window where a cached
read will read the wrong data.  While theoretically this could be a
regression, it is unlikely to be one in practice, since this is normal for
buffered writes.

In case of a partial page write to an already uptodate page the locking is
also unnecessary, with the above caveats.

Partial write of a not uptodate page still needs to be handled.  One way
would be to read the complete page before doing the write.  This is not
possible, since it might break filesystems that don't expect any READ
requests when the file was opened O_WRONLY.

The other solution is to serialize the synchronous write with reads from
the partial pages.  The easiest way to do this is to keep the partial pages
locked.  The problem is that a write() may involve two such pages (one head
and one tail).  This patch fixes it by only locking the partial tail page.
If there's a partial head page as well, then split that off as a separate
WRITE request.

Reported-by: Qian Cai &lt;cai@lca.pw&gt;
Link: https://lore.kernel.org/linux-fsdevel/4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw/
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Cc: &lt;stable@vger.kernel.org&gt; # v2.6.26
Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>virtiofs: fix memory leak in virtio_fs_probe()</title>
<updated>2021-05-12T06:37:31+00:00</updated>
<author>
<name>Luis Henriques</name>
<email>lhenriques@suse.de</email>
</author>
<published>2021-03-17T08:44:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9b9d60c0eb8ada99cce2a9ab5c15dffc523b01ae'/>
<id>urn:sha1:9b9d60c0eb8ada99cce2a9ab5c15dffc523b01ae</id>
<content type='text'>
commit c79c5e0178922a9e092ec8fed026750f39dcaef4 upstream.

When accidentally passing twice the same tag to qemu, kmemleak ended up
reporting a memory leak in virtiofs.  Also, looking at the log I saw the
following error (that's when I realised the duplicated tag):

  virtiofs: probe of virtio5 failed with error -17

Here's the kmemleak log for reference:

unreferenced object 0xffff888103d47800 (size 1024):
  comm "systemd-udevd", pid 118, jiffies 4294893780 (age 18.340s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff 80 90 02 a0 ff ff ff ff  ................
  backtrace:
    [&lt;000000000ebb87c1&gt;] virtio_fs_probe+0x171/0x7ae [virtiofs]
    [&lt;00000000f8aca419&gt;] virtio_dev_probe+0x15f/0x210
    [&lt;000000004d6baf3c&gt;] really_probe+0xea/0x430
    [&lt;00000000a6ceeac8&gt;] device_driver_attach+0xa8/0xb0
    [&lt;00000000196f47a7&gt;] __driver_attach+0x98/0x140
    [&lt;000000000b20601d&gt;] bus_for_each_dev+0x7b/0xc0
    [&lt;00000000399c7b7f&gt;] bus_add_driver+0x11b/0x1f0
    [&lt;0000000032b09ba7&gt;] driver_register+0x8f/0xe0
    [&lt;00000000cdd55998&gt;] 0xffffffffa002c013
    [&lt;000000000ea196a2&gt;] do_one_initcall+0x64/0x2e0
    [&lt;0000000008f727ce&gt;] do_init_module+0x5c/0x260
    [&lt;000000003cdedab6&gt;] __do_sys_finit_module+0xb5/0x120
    [&lt;00000000ad2f48c6&gt;] do_syscall_64+0x33/0x40
    [&lt;00000000809526b5&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xae

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques &lt;lhenriques@suse.de&gt;
Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Reviewed-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>virtiofs: Fail dax mount if device does not support it</title>
<updated>2021-04-07T13:02:23+00:00</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2021-02-09T22:47:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f2cc0312f7affcb174874ea50c9d6540c2926b5b'/>
<id>urn:sha1:f2cc0312f7affcb174874ea50c9d6540c2926b5b</id>
<content type='text'>
[ Upstream commit 3f9b9efd82a84f27e95d0414f852caf1fa839e83 ]

Right now "mount -t virtiofs -o dax myfs /mnt/virtiofs" succeeds even
if filesystem deivce does not have a cache window and hence DAX can't
be supported.

This gives a false sense to user that they are using DAX with virtiofs
but fact of the matter is that they are not.

Fix this by returning error if dax can't be supported and user has asked
for it.

Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Reviewed-by: Stefan Hajnoczi &lt;stefanha@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fuse: fix live lock in fuse_iget()</title>
<updated>2021-03-20T09:51:14+00:00</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2021-03-04T09:09:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5676df54d7d44f497b8dbf7bff04f2f1b165da93'/>
<id>urn:sha1:5676df54d7d44f497b8dbf7bff04f2f1b165da93</id>
<content type='text'>
commit 775c5033a0d164622d9d10dd0f0a5531639ed3ed upstream.

Commit 5d069dbe8aaf ("fuse: fix bad inode") replaced make_bad_inode()
in fuse_iget() with a private implementation fuse_make_bad().

The private implementation fails to remove the bad inode from inode
cache, so the retry loop with iget5_locked() finds the same bad inode
and marks it bad forever.

kmsg snip:

[ ] rcu: INFO: rcu_sched self-detected stall on CPU
...
[ ]  ? bit_wait_io+0x50/0x50
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ? find_inode.isra.32+0x60/0xb0
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ilookup5_nowait+0x65/0x90
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ilookup5.part.36+0x2e/0x80
[ ]  ? fuse_init_file_inode+0x70/0x70
[ ]  ? fuse_inode_eq+0x20/0x20
[ ]  iget5_locked+0x21/0x80
[ ]  ? fuse_inode_eq+0x20/0x20
[ ]  fuse_iget+0x96/0x1b0

Fixes: 5d069dbe8aaf ("fuse: fix bad inode")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>fuse: fix bad inode</title>
<updated>2020-12-10T14:33:14+00:00</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2020-12-10T14:33:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5d069dbe8aaf2a197142558b6fb2978189ba3454'/>
<id>urn:sha1:5d069dbe8aaf2a197142558b6fb2978189ba3454</id>
<content type='text'>
Jan Kara's analysis of the syzbot report (edited):

  The reproducer opens a directory on FUSE filesystem, it then attaches
  dnotify mark to the open directory.  After that a fuse_do_getattr() call
  finds that attributes returned by the server are inconsistent, and calls
  make_bad_inode() which, among other things does:

          inode-&gt;i_mode = S_IFREG;

  This then confuses dnotify which doesn't tear down its structures
  properly and eventually crashes.

Avoid calling make_bad_inode() on a live inode: switch to a private flag on
the fuse inode.  Also add the test to ops which the bad_inode_ops would
have caught.

This bug goes back to the initial merge of fuse in 2.6.14...

Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Tested-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;
</content>
</entry>
<entry>
<title>fuse: support SB_NOSEC flag to improve write performance</title>
<updated>2020-11-11T16:22:33+00:00</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2020-10-09T18:15:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d769e6aa2524e1762e3b8681e0ed78f8acf6cad'/>
<id>urn:sha1:9d769e6aa2524e1762e3b8681e0ed78f8acf6cad</id>
<content type='text'>
Virtiofs can be slow with small writes if xattr are enabled and we are
doing cached writes (No direct I/O).  Ganesh Mahalingam noticed this.

Some debugging showed that file_remove_privs() is called in cached write
path on every write.  And everytime it calls security_inode_need_killpriv()
which results in call to __vfs_getxattr(XATTR_NAME_CAPS).  And this goes to
file server to fetch xattr.  This extra round trip for every write slows
down writes tremendously.

Normally to avoid paying this penalty on every write, vfs has the notion of
caching this information in inode (S_NOSEC).  So vfs sets S_NOSEC, if
filesystem opted for it using super block flag SB_NOSEC.  And S_NOSEC is
cleared when setuid/setgid bit is set or when security xattr is set on
inode so that next time a write happens, we check inode again for clearing
setuid/setgid bits as well clear any security.capability xattr.

This seems to work well for local file systems but for remote file systems
it is possible that VFS does not have full picture and a different client
sets setuid/setgid bit or security.capability xattr on file and that means
VFS information about S_NOSEC on another client will be stale.  So for
remote filesystems SB_NOSEC was disabled by default.

Commit 9e1f1de02c22 ("more conservative S_NOSEC handling") mentioned that
these filesystems can still make use of SB_NOSEC as long as they clear
S_NOSEC when they are refreshing inode attriutes from server.

So this patch tries to enable SB_NOSEC on fuse (regular fuse as well as
virtiofs).  And clear SB_NOSEC when we are refreshing inode attributes.

This is enabled only if server supports FUSE_HANDLE_KILLPRIV_V2.  This says
that server will clear setuid/setgid/security.capability on
chown/truncate/write as apporpriate.

This should provide tighter coherency because now suid/sgid/
security.capability will be cleared even if fuse client cache has not seen
these attrs.

Basic idea is that fuse client will trigger suid/sgid/security.capability
clearing based on its attr cache.  But even if cache has gone stale, it is
fine because FUSE_HANDLE_KILLPRIV_V2 will make sure WRITE clear
suid/sgid/security.capability.

We make this change only if server supports FUSE_HANDLE_KILLPRIV_V2.  This
should make sure that existing filesystems which might be relying on
seucurity.capability always being queried from server are not impacted.

This tighter coherency relies on WRITE showing up on server (and not being
cached in guest).  So writeback_cache mode will not provide that tight
coherency and it is not recommended to use two together.  Having said that
it might work reasonably well for lot of use cases.

This change improves random write performance very significantly.  Running
virtiofsd with cache=auto and following fio command:

fio --ioengine=libaio --direct=1  --name=test --filename=/mnt/virtiofs/random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randwrite

Bandwidth increases from around 50MB/s to around 250MB/s as a result of
applying this patch.  So improvement is very significant.

Link: https://github.com/kata-containers/runtime/issues/2815
Reported-by: "Mahalingam, Ganesh" &lt;ganesh.mahalingam@intel.com&gt;
Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>fuse: add a flag FUSE_OPEN_KILL_SUIDGID for open() request</title>
<updated>2020-11-11T16:22:33+00:00</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2020-10-09T18:15:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=643a666a89c358ef588d2b3ef9f2dc1efc421e61'/>
<id>urn:sha1:643a666a89c358ef588d2b3ef9f2dc1efc421e61</id>
<content type='text'>
With FUSE_HANDLE_KILLPRIV_V2 support, server will need to kill suid/sgid/
security.capability on open(O_TRUNC), if server supports
FUSE_ATOMIC_O_TRUNC.

But server needs to kill suid/sgid only if caller does not have CAP_FSETID.
Given server does not have this information, client needs to send this info
to server.

So add a flag FUSE_OPEN_KILL_SUIDGID to fuse_open_in request which tells
server to kill suid/sgid (only if group execute is set).

This flag is added to the FUSE_OPEN request, as well as the FUSE_CREATE
request if the create was non-exclusive, since that might result in an
existing file being opened/truncated.

Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
</feed>
