kernel/linux.git/fs/fuse/file.c, branch v6.18.21

writeback: don't block sync for filesystems with no data integrity guarantees

2026-04-02T11:23:20+00:00

commit 76f9377cd2ab7a9220c25d33940d9ca20d368172 upstream. Add a SB_I_NO_DATA_INTEGRITY superblock flag for filesystems that cannot guarantee data persistence on sync (eg fuse). For superblocks with this flag set, sync kicks off writeback of dirty inodes but does not wait for the flusher threads to complete the writeback. This replaces the per-inode AS_NO_DATA_INTEGRITY mapping flag added in commit f9a49aa302a0 ("fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()"). The flag belongs at the superblock level because data integrity is a filesystem-wide property, not a per-inode one. Having this flag at the superblock level also allows us to skip having to iterate every dirty inode in wait_sb_inodes() only to skip each inode individually. Prior to this commit, mappings with no data integrity guarantees skipped waiting on writeback completion but still waited on the flusher threads to finish initiating the writeback. Waiting on the flusher threads is unnecessary. This commit kicks off writeback but does not wait on the flusher threads. This change properly addresses a recent report [1] for a suspend-to-RAM hang seen on fuse-overlayfs that was caused by waiting on the flusher threads to finish: Workqueue: pm_fs_sync pm_fs_sync_work_fn Call Trace: __schedule+0x457/0x1720 schedule+0x27/0xd0 wb_wait_for_completion+0x97/0xe0 sync_inodes_sb+0xf8/0x2e0 __iterate_supers+0xdc/0x160 ksys_sync+0x43/0xb0 pm_fs_sync_work_fn+0x17/0xa0 process_one_work+0x193/0x350 worker_thread+0x1a1/0x310 kthread+0xfc/0x240 ret_from_fork+0x243/0x280 ret_from_fork_asm+0x1a/0x30 On fuse this is problematic because there are paths that may cause the flusher thread to block (eg if systemd freezes the user session cgroups first, which freezes the fuse daemon, before invoking the kernel suspend. The kernel suspend triggers ->write_node() which on fuse issues a synchronous setattr request, which cannot be processed since the daemon is frozen. Or if the daemon is buggy and cannot properly complete writeback, initiating writeback on a dirty folio already under writeback leads to writeback_get_folio() -> folio_prepare_writeback() -> unconditional wait on writeback to finish, which will cause a hang). This commit restores fuse to its prior behavior before tmp folios were removed, where sync was essentially a no-op. [1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a-asuvfrbKXbEwwDSctvemF+6zfhdnuzO65Pt8HsFSRw@mail.gmail.com/T/#m632c4648e9cafc4239299887109ebd880ac6c5c1 Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") Reported-by: John Cc: stable@vger.kernel.org Signed-off-by: Joanne Koong Link: https://patch.msgid.link/20260320005145.2483161-2-joannelkoong@gmail.com Reviewed-by: Jan Kara Reviewed-by: David Hildenbrand (Arm) Signed-off-by: Christian Brauner Signed-off-by: Greg Kroah-Hartman

fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()

2026-01-30T09:32:15+00:00

commit f9a49aa302a05e91ca01f69031cb79a0ea33031f upstream. Above the while() loop in wait_sb_inodes(), we document that we must wait for all pages under writeback for data integrity. Consequently, if a mapping, like fuse, traditionally does not have data integrity semantics, there is no need to wait at all; we can simply skip these inodes. This restores fuse back to prior behavior where syncs are no-ops. This fixes a user regression where if a system is running a faulty fuse server that does not reply to issued write requests, this causes wait_sb_inodes() to wait forever. Link: https://lkml.kernel.org/r/20260105211737.4105620-2-joannelkoong@gmail.com Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") Signed-off-by: Joanne Koong Reported-by: Athul Krishna Reported-by: J. Neuschäfer Reviewed-by: Bernd Schubert Tested-by: J. Neuschäfer Cc: Alexander Viro Cc: Bernd Schubert Cc: Bonaccorso Salvatore Cc: Christian Brauner Cc: David Hildenbrand Cc: Jan Kara Cc: "Liam R. Howlett" Cc: Lorenzo Stoakes Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Rapoport Cc: Miklos Szeredi Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman

fuse: fix readahead reclaim deadlock

2026-01-02T11:57:31+00:00

commit bd5603eaae0aabf527bfb3ce1bb07e979ce5bd50 upstream. Commit e26ee4efbc79 ("fuse: allocate ff->release_args only if release is needed") skips allocating ff->release_args if the server does not implement open. However in doing so, fuse_prepare_release() now skips grabbing the reference on the inode, which makes it possible for an inode to be evicted from the dcache while there are inflight readahead requests. This causes a deadlock if the server triggers reclaim while servicing the readahead request and reclaim attempts to evict the inode of the file being read ahead. Since the folio is locked during readahead, when reclaim evicts the fuse inode and fuse_evict_inode() attempts to remove all folios associated with the inode from the page cache (truncate_inode_pages_range()), reclaim will block forever waiting for the lock since readahead cannot relinquish the lock because it is itself blocked in reclaim: >>> stack_trace(1504735) folio_wait_bit_common (mm/filemap.c:1308:4) folio_lock (./include/linux/pagemap.h:1052:3) truncate_inode_pages_range (mm/truncate.c:336:10) fuse_evict_inode (fs/fuse/inode.c:161:2) evict (fs/inode.c:704:3) dentry_unlink_inode (fs/dcache.c:412:3) __dentry_kill (fs/dcache.c:615:3) shrink_kill (fs/dcache.c:1060:12) shrink_dentry_list (fs/dcache.c:1087:3) prune_dcache_sb (fs/dcache.c:1168:2) super_cache_scan (fs/super.c:221:10) do_shrink_slab (mm/shrinker.c:435:9) shrink_slab (mm/shrinker.c:626:10) shrink_node (mm/vmscan.c:5951:2) shrink_zones (mm/vmscan.c:6195:3) do_try_to_free_pages (mm/vmscan.c:6257:3) do_swap_page (mm/memory.c:4136:11) handle_pte_fault (mm/memory.c:5562:10) handle_mm_fault (mm/memory.c:5870:9) do_user_addr_fault (arch/x86/mm/fault.c:1338:10) handle_page_fault (arch/x86/mm/fault.c:1481:3) exc_page_fault (arch/x86/mm/fault.c:1539:2) asm_exc_page_fault+0x22/0x27 Fix this deadlock by allocating ff->release_args and grabbing the reference on the inode when preparing the file for release even if the server does not implement open. The inode reference will be dropped when the last reference on the fuse file is dropped (see fuse_file_put() -> fuse_release_end()). Fixes: e26ee4efbc79 ("fuse: allocate ff->release_args only if release is needed") Cc: stable@vger.kernel.org Signed-off-by: Joanne Koong Reported-by: Omar Sandoval Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman

fuse: Invalidate the page cache after FOPEN_DIRECT_IO write

2026-01-02T11:57:01+00:00

[ Upstream commit b359af8275a982a458e8df6c6beab1415be1f795 ] generic_file_direct_write() also does this and has a large comment about. Reproducer here is xfstest's generic/209, which is exactly to have competing DIO write and cached IO read. Signed-off-by: Bernd Schubert Signed-off-by: Miklos Szeredi Signed-off-by: Sasha Levin

fuse: Always flush the page cache before FOPEN_DIRECT_IO write

2026-01-02T11:57:01+00:00

[ Upstream commit 1ce120dcefc056ce8af2486cebbb77a458aad4c3 ] This was done as condition on direct_io_allow_mmap, but I believe this is not right, as a file might be open two times - once with write-back enabled another time with FOPEN_DIRECT_IO. Signed-off-by: Bernd Schubert Signed-off-by: Miklos Szeredi Signed-off-by: Sasha Levin

fuse: fix livelock in synchronous file put from fuseblk workers

2025-09-23T09:32:17+00:00

I observed a hang when running generic/323 against a fuseblk server. This test opens a file, initiates a lot of AIO writes to that file descriptor, and closes the file descriptor before the writes complete. Unsurprisingly, the AIO exerciser threads are mostly stuck waiting for responses from the fuseblk server: # cat /proc/372265/task/372313/stack [<0>] request_wait_answer+0x1fe/0x2a0 [fuse] [<0>] __fuse_simple_request+0xd3/0x2b0 [fuse] [<0>] fuse_do_getattr+0xfc/0x1f0 [fuse] [<0>] fuse_file_read_iter+0xbe/0x1c0 [fuse] [<0>] aio_read+0x130/0x1e0 [<0>] io_submit_one+0x542/0x860 [<0>] __x64_sys_io_submit+0x98/0x1a0 [<0>] do_syscall_64+0x37/0xf0 [<0>] entry_SYSCALL_64_after_hwframe+0x4b/0x53 But the /weird/ part is that the fuseblk server threads are waiting for responses from itself: # cat /proc/372210/task/372232/stack [<0>] request_wait_answer+0x1fe/0x2a0 [fuse] [<0>] __fuse_simple_request+0xd3/0x2b0 [fuse] [<0>] fuse_file_put+0x9a/0xd0 [fuse] [<0>] fuse_release+0x36/0x50 [fuse] [<0>] __fput+0xec/0x2b0 [<0>] task_work_run+0x55/0x90 [<0>] syscall_exit_to_user_mode+0xe9/0x100 [<0>] do_syscall_64+0x43/0xf0 [<0>] entry_SYSCALL_64_after_hwframe+0x4b/0x53 The fuseblk server is fuse2fs so there's nothing all that exciting in the server itself. So why is the fuse server calling fuse_file_put? The commit message for the fstest sheds some light on that: "By closing the file descriptor before calling io_destroy, you pretty much guarantee that the last put on the ioctx will be done in interrupt context (during I/O completion). Aha. AIO fgets a new struct file from the fd when it queues the ioctx. The completion of the FUSE_WRITE command from userspace causes the fuse server to call the AIO completion function. The completion puts the struct file, queuing a delayed fput to the fuse server task. When the fuse server task returns to userspace, it has to run the delayed fput, which in the case of a fuseblk server, it does synchronously. Sending the FUSE_RELEASE command sychronously from fuse server threads is a bad idea because a client program can initiate enough simultaneous AIOs such that all the fuse server threads end up in delayed_fput, and now there aren't any threads left to handle the queued fuse commands. Fix this by only using asynchronous fputs when closing files, and leave a comment explaining why. Cc: stable@vger.kernel.org # v2.6.38 Fixes: 5a18ec176c934c ("fuse: fix hang of single threaded fuseblk filesystem") Signed-off-by: Darrick J. Wong Signed-off-by: Miklos Szeredi

fuse: remove fuse_readpages_end() null mapping check

2025-09-02T09:14:15+00:00

Remove extra logic in fuse_readpages_end() that checks against null folio mappings. This was added in commit ce534fb05292 ("fuse: allow splice to move pages"): "Since the remove_from_page_cache() + add_to_page_cache_locked() are non-atomic it is possible that the page cache is repopulated in between the two and add_to_page_cache_locked() will fail. This could be fixed by creating a new atomic replace_page_cache_page() function. fuse_readpages_end() needed to be reworked so it works even if page->mapping is NULL for some or all pages which can happen if the add_to_page_cache_locked() failed." Commit ef6a3c63112e ("mm: add replace_page_cache_page() function") added atomic page cache replacement, which means the check against null mappings can be removed. Signed-off-by: Joanne Koong Signed-off-by: Miklos Szeredi

fuse: use default writeback accounting

2025-08-27T12:29:43+00:00

commit 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") removed temp folios for dirty page writeback. Consequently, fuse can now use the default writeback accounting. With switching fuse to use default writeback accounting, there are some added benefits. This updates wb->writeback_inodes tracking as well now and updates writeback throughput estimates after writeback completion. This commit also removes inc_wb_stat() and dec_wb_stat(). These have no callers anymore now that fuse does not call them. Signed-off-by: Joanne Koong Reviewed-by: David Hildenbrand Reviewed-by: Bernd Schubert Signed-off-by: Miklos Szeredi

fuse: remove unneeded offset assignment when filling write pages

2025-08-27T12:29:43+00:00

With the change in aee03ea7ff98 ("fuse: support large folios for writethrough writes"), this old line for setting ap->descs[0].offset is now obsolete and unneeded. This should have been removed as part of aee03ea7ff98. Signed-off-by: Joanne Koong Fixes: aee03ea7ff98 ("fuse: support large folios for writethrough writes") Signed-off-by: Miklos Szeredi

fuse: add COPY_FILE_RANGE_64 that allows large copies

2025-08-27T12:29:43+00:00

The FUSE protocol uses struct fuse_write_out to convey the return value of copy_file_range, which is restricted to uint32_t. But the COPY_FILE_RANGE interface supports a 64-bit size copies and there's no reason why copies should be limited to 32-bit. Introduce a new op COPY_FILE_RANGE_64, which is identical, except the number of bytes copied is returned in a 64-bit value. If the fuse server does not support COPY_FILE_RANGE_64, fall back to COPY_FILE_RANGE. Reported-by: Florian Weimer Closes: https://lore.kernel.org/all/lhuh5ynl8z5.fsf@oldenburg.str.redhat.com/ Signed-off-by: Miklos Szeredi