<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/netfs/write_issue.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-08-28T14:31:04+00:00</updated>
<entry>
<title>netfs: Fix unbuffered write error handling</title>
<updated>2025-08-28T14:31:04+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2025-08-22T03:08:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f08c80af3c9a9849cd178b4843b7c01d103506a1'/>
<id>urn:sha1:f08c80af3c9a9849cd178b4843b7c01d103506a1</id>
<content type='text'>
[ Upstream commit a3de58b12ce074ec05b8741fa28d62ccb1070468 ]

If all the subrequests in an unbuffered write stream fail, the subrequest
collector doesn't update the stream-&gt;transferred value and it retains its
initial LONG_MAX value.  Unfortunately, if all active streams fail, then we
take the smallest value of { LONG_MAX, LONG_MAX, ... } as the value to set
in wreq-&gt;transferred - which is then returned from -&gt;write_iter().

LONG_MAX was chosen as the initial value so that all the streams can be
quickly assessed by taking the smallest value of all stream-&gt;transferred -
but this only works if at least one of them has actually been set.

Fix this by adding a flag to indicate whether the value in
stream-&gt;transferred is valid and checking that when we integrate the
values.  stream-&gt;transferred can then be initialised to zero.
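
A minimal sketch of the idea (field and helper names here are
illustrative rather than quoted from the patch):

    /* Per-stream state: nothing counted until the collector records it. */
    stream-&gt;transferred = 0;
    stream-&gt;transferred_valid = false;      /* the new flag */

    /* Folding the streams into the request total: */
    unsigned long long transferred = LONG_MAX;
    bool any_valid = false;

    for_each_write_stream(wreq, stream) {   /* iteration helper assumed */
        if (!stream-&gt;active || !stream-&gt;transferred_valid)
            continue;
        any_valid = true;
        transferred = min(transferred, stream-&gt;transferred);
    }
    wreq-&gt;transferred = any_valid ? transferred : 0;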

This was found by running the generic/750 xfstest against cifs with
cache=none.  It splices data to the target file.  Once (if) it has used up
all the available scratch space, the writes start failing with ENOSPC.
This causes -&gt;write_iter() to fail.  However, it was returning
wreq-&gt;transferred, i.e. LONG_MAX, rather than an error (because it thought
the amount transferred was non-zero) and iter_file_splice_write() would
then try to clean up that amount of pipe bufferage - leading to an oops
when it overran.  The kernel log showed:

    CIFS: VFS: Send error in write = -28

followed by:

    BUG: kernel NULL pointer dereference, address: 0000000000000008

with:

    RIP: 0010:iter_file_splice_write+0x3a4/0x520
    do_splice+0x197/0x4e0

or:

    RIP: 0010:pipe_buf_release (include/linux/pipe_fs_i.h:282)
    iter_file_splice_write (fs/splice.c:755)

Also put a warning check into splice to announce if -&gt;write_iter() returned
that it had written more than it was asked to.
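
The added check is roughly of this shape (a sketch only; the bound used
here is a placeholder, not the exact splice code):

    ret = vfs_iter_write(out, &amp;from, &amp;sd.pos, 0);    /* invokes -&gt;write_iter() */
    if (ret &gt; 0)
        WARN_ON_ONCE(ret &gt; bytes_requested);         /* wrote more than asked for? */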

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Reported-by: Xiaoli Feng &lt;fengxiaoli0714@gmail.com&gt;
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220445
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Link: https://lore.kernel.org/915443.1755207950@warthog.procyon.org.uk
cc: Paulo Alcantara &lt;pc@manguebit.org&gt;
cc: Steve French &lt;sfrench@samba.org&gt;
cc: Shyam Prasad N &lt;sprasad@microsoft.com&gt;
cc: netfs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
[ Dropped read_collect.c hunk ]
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>netfs: Fix missing wakeup after issuing writes</title>
<updated>2024-10-02T14:56:15+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-10-02T14:45:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1ca4169c391c370e0f3a92938df2862900575096'/>
<id>urn:sha1:1ca4169c391c370e0f3a92938df2862900575096</id>
<content type='text'>
After dividing up a proposed write into subrequests, netfslib sets
NETFS_RREQ_ALL_QUEUED to indicate to the collector that it can move on to
the final cleanup once it has emptied the subrequest queues.

Now, whilst the collector will normally end up running at least once after
this bit is set just because it takes a while to process all the write
subrequests before the collector runs out of subrequests, there exists the
possibility that the issuing thread will be forced to sleep and the
collector thread will clean up all the subrequests before ALL_QUEUED gets
set.

In such a case, the collector thread will not get triggered again and will
never clear NETFS_RREQ_IN_PROGRESS thus leaving a request uncompleted and
causing a potential future hang.

Fix this by scheduling the write collector if all the subrequest queues are
empty (and thus no writes pending issuance).

Note that we'd do this ideally before queuing the subrequest, but in the
case of buffered writeback, at least, we can't find out that we've run out
of folios until after we've called writeback_iter() and it has returned
NULL - at which point we might not actually have any subrequests still
under construction.
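
In outline (a sketch; the helper name is assumed rather than quoted from
the patch), the end-of-issue path becomes:

    /* Everything for this write request has now been queued. */
    set_bit(NETFS_RREQ_ALL_QUEUED, &amp;wreq-&gt;flags);

    /* If both streams' subrequest lists are already empty, there is
     * nothing left in flight to wake the collector, so kick it here. */
    if (list_empty(&amp;wreq-&gt;io_streams[0].subrequests) &amp;&amp;
        list_empty(&amp;wreq-&gt;io_streams[1].subrequests))
        netfs_wake_write_collector(wreq, false);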

Fixes: 288ace2f57c9 ("netfs: New writeback implementation")
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Link: https://lore.kernel.org/r/3317784.1727880350@warthog.procyon.org.uk
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'vfs-6.12-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2024-09-30T17:59:44+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-09-30T17:59:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a5f24c795513ff098dc8e350e5733aec8796fbf8'/>
<id>urn:sha1:a5f24c795513ff098dc8e350e5733aec8796fbf8</id>
<content type='text'>
Pull vfs fixes from Christian Brauner:
 "afs:

   - Fix setting of the server responding flag

   - Remove unused struct afs_address_list and afs_put_address_list()
     function

   - Fix infinite loop because of unresponsive servers

   - Ensure that afs_retry_request() function is correctly added to the
     afs_req_ops netfs operations table

  netfs:

   - Fix netfs_folio tracepoint handling to handle NULL mappings

   - Add a missing folio_queue API documentation

   - Ensure that netfs_write_folio() correctly advances the iterator via
     iov_iter_advance()

   - Fix a dentry leak during concurrent cull and cookie lookup
     operations in cachefiles

  pidfs:

   - Correctly handle accessing another task's pid namespace"

* tag 'vfs-6.12-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  netfs: Fix the netfs_folio tracepoint to handle NULL mapping
  netfs: Add folio_queue API documentation
  netfs: Advance iterator correctly rather than jumping it
  afs: Fix the setting of the server responding flag
  afs: Remove unused struct and function prototype
  afs: Fix possible infinite loop with unresponsive servers
  pidfs: check for valid pid namespace
  afs: Fix missing wire-up of afs_retry_request()
  cachefiles: fix dentry leak in cachefiles_open_file()
</content>
</entry>
<entry>
<title>netfs: Advance iterator correctly rather than jumping it</title>
<updated>2024-09-27T16:29:20+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-09-27T08:08:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9fffa4e9b3b158f63334e603e610da7d529a0f9a'/>
<id>urn:sha1:9fffa4e9b3b158f63334e603e610da7d529a0f9a</id>
<content type='text'>
In netfs_write_folio(), use iov_iter_advance() to advance the folio as we
split bits of it off to subrequests rather than manually jumping the
-&gt;iov_offset value around.  This becomes more problematic when we use a
bounce buffer made out of single-page folios to cover a multipage pagecache
folio.
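
Roughly (a sketch, with the surrounding variable names assumed):

    /* Hand 'part' bytes of the folio to the subrequest, then move the
     * source iterator forward by the same amount... */
    iov_iter_advance(&amp;wreq-&gt;io_iter, part);

    /* ...rather than repositioning it by hand, e.g.:
     *     wreq-&gt;io_iter.iov_offset = iter_off + part;  (the old approach) */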

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Link: https://lore.kernel.org/r/2238548.1727424522@warthog.procyon.org.uk
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfs: Fix write oops in generic/346 (9p) and generic/074 (cifs)</title>
<updated>2024-09-26T22:45:20+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-09-26T13:58:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=df9b455633aee0bad3e5c3dc9fc1c860b13c96d2'/>
<id>urn:sha1:df9b455633aee0bad3e5c3dc9fc1c860b13c96d2</id>
<content type='text'>
In netfslib, a buffered writeback operation has a 'write queue' of folios
that are being written, held in a linear sequence of folio_queue structs.
The 'issuer' adds new folio_queues on the leading edge of the queue and
populates each one progressively; the 'collector' pops them off the
trailing edge and discards them and the folios they point to as they are
consumed.

The queue is required to always retain at least one folio_queue structure.
This allows the queue to be accessed without locking and with just a bit of
barriering.

When a new subrequest is prepared, its -&gt;io_iter iterator is pointed at the
current end of the write queue and then the iterator is extended as more
data is added to the queue until the subrequest is committed.

Now, the problem is that the folio_queue at the leading edge of the write
queue when a subrequest is prepared might have been entirely consumed - but
not yet removed from the queue as it is the only remaining one and is
preventing the queue from collapsing.

So, what happens is that subreq-&gt;io_iter is pointed at the spent
folio_queue, then a new folio_queue is added, and, at that point, the
collector is then entirely at liberty to immediately delete the spent
folio_queue.

This leaves the subreq-&gt;io_iter pointing at a freed object.  If the system
is lucky, iterate_folioq() sees -&gt;io_iter, sees the as-yet uncorrupted
freed object and advances to the next folio_queue in the queue.

In the case seen, however, the freed object gets recycled and put back onto
the queue at the tail and filled to the end.  This confuses
iterate_folioq() and it tries to step -&gt;next, which may be NULL - resulting
in an oops.

Fix this by the following means:

 (1) When preparing a write subrequest, make sure there's a folio_queue
     struct with space in it at the leading edge of the queue.  A function
     to make space is split out of the function to append a folio so that
     it can be called for this purpose.

 (2) If the request struct iterator is pointing to a completely spent
     folio_queue when we make space, then advance the iterator to the newly
     allocated folio_queue.  The subrequest's iterator will then be set
     from this.
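
A rough sketch of (1) and (2) together (the helper name is assumed, not
quoted from the patch):

    /* (1) Guarantee a folio_queue with free slots at the leading edge of
     *     the write queue before the subrequest is prepared; the helper
     *     is split out of the folio-append function. */
    if (netfs_buffer_make_space(wreq) &lt; 0)
        return -ENOMEM;

    /* (2) If wreq-&gt;io_iter is parked on a completely spent folio_queue,
     *     advance it onto the newly allocated one; the subrequest's
     *     io_iter is then initialised from wreq-&gt;io_iter as before. */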

The oops could be triggered using the generic/346 xfstest with a filesystem
on 9P over TCP with cache=loose.  The oops looked something like:

 BUG: kernel NULL pointer dereference, address: 0000000000000008
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 ...
 RIP: 0010:_copy_from_iter+0x2db/0x530
 ...
 Call Trace:
  &lt;TASK&gt;
 ...
  p9pdu_vwritef+0x3d8/0x5d0
  p9_client_prepare_req+0xa8/0x140
  p9_client_rpc+0x81/0x280
  p9_client_write+0xcf/0x1c0
  v9fs_issue_write+0x87/0xc0
  netfs_advance_write+0xa0/0xb0
  netfs_write_folio.isra.0+0x42d/0x500
  netfs_writepages+0x15a/0x1f0
  do_writepages+0xd1/0x220
  filemap_fdatawrite_wbc+0x5c/0x80
  v9fs_mmap_vm_close+0x7d/0xb0
  remove_vma+0x35/0x70
  vms_complete_munmap_vmas+0x11a/0x170
  do_vmi_align_munmap+0x17d/0x1c0
  do_vmi_munmap+0x13e/0x150
  __vm_munmap+0x92/0xd0
  __x64_sys_munmap+0x17/0x20
  do_syscall_64+0x80/0xe0
  entry_SYSCALL_64_after_hwframe+0x71/0x79

This also fixed a similar-looking issue with cifs and generic/074.

Fixes: cd0277ed0c18 ("netfs: Use new folio_queue data type and iterator instead of xarray iter")
Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202409180928.f20b5a08-oliver.sang@intel.com
Closes: https://lore.kernel.org/oe-lkp/202409131438.3f225fbf-oliver.sang@intel.com
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-by: kernel test robot &lt;oliver.sang@intel.com&gt;
cc: Eric Van Hensbergen &lt;ericvh@kernel.org&gt;
cc: Latchesar Ionkov &lt;lucho@ionkov.net&gt;
cc: Dominique Martinet &lt;asmadeus@codewreck.org&gt;
cc: Christian Schoenebeck &lt;linux_oss@crudebyte.com&gt;
cc: Paulo Alcantara &lt;pc@manguebit.com&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: v9fs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French &lt;stfrench@microsoft.com&gt;
</content>
</entry>
<entry>
<title>netfs: Cancel dirty folios that have no storage destination</title>
<updated>2024-09-12T10:20:41+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-07-29T11:23:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8f246b7c0a1be0882374f2ff831a61f0dbe77678'/>
<id>urn:sha1:8f246b7c0a1be0882374f2ff831a61f0dbe77678</id>
<content type='text'>
Kafs wants to be able to cache the contents of directories (and symlinks),
but whilst these are downloaded from the server with the FS.FetchData RPC
op and similar, the same as for regular files, they can't be updated by
FS.StoreData, but rather have special operations (FS.MakeDir, etc.).

Now, rather than redownloading a directory's content after each change made
to that directory, kafs modifies the local blob.  This blob can be saved
out to the cache, and since it's using netfslib, kafs just marks the folios
dirty and lets -&gt;writepages() on the directory take care of it, as for a
regular file.

This is fine as long as there's a cache as although the upload stream is
disabled, there's a cache stream to drive the procedure.  But if the cache
goes away in the meantime, suddenly there's no way to do any writes and the
code gets confused, complains "R=%x: No submit" to dmesg and leaves the
dirty folio hanging.

Fix this by just cancelling the store of the folio if neither stream is
active.  (If there's no cache at the time of dirtying, we should just not
mark the folio dirty).
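
In outline (a sketch; the helper name is assumed), the check in the
folio-write path looks like:

    /* Neither an upload stream nor a cache stream is active, so there is
     * nowhere for this dirty folio to go - cancel its store. */
    if (!upload-&gt;active &amp;&amp; !cache-&gt;active) {
        netfs_folio_written_back(folio);    /* helper name assumed: undo the writeback state */
        return 0;
    }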

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-23-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>cachefiles, netfs: Fix write to partial block at EOF</title>
<updated>2024-09-12T10:20:41+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-07-12T11:44:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c4f1450ecccc5311db87f806998eda1c824c4e35'/>
<id>urn:sha1:c4f1450ecccc5311db87f806998eda1c824c4e35</id>
<content type='text'>
Because it uses DIO writes, cachefiles is unable to make a write to the
backing file if that write is not aligned to and sized according to the
backing file's DIO block alignment.  This makes it tricky to handle a write
to the cache where the EOF on the network file is not correctly aligned.

To get around this, netfslib attempts to tell the driver it is calling how
much more data there is available beyond the EOF that it can use to pad the
write (netfslib preclears the part of the folio above the EOF).  However,
it tries to tell the cache what the maximum length is, but doesn't
calculate this correctly; and, in any case, cachefiles actually ignores the
value and just skips the block.

Fix this by:

 (1) Change the value passed to indicate the amount of extra data that can
     be added to the operation (now -&gt;submit_extendable_to).  This is much
     simpler to calculate as it's just the end of the folio minus the top
     of the data within the folio - rather than having to account for data
     spread over multiple folios (see the sketch after this list).

 (2) Make cachefiles add some of this data if the subrequest it is given
     ends at the network file's i_size if the extra data is sufficient to
     pad out to a whole block.
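
As a rough worked form of the calculation in (1) (names other than
-&gt;submit_extendable_to are placeholders):

    /* Preclearable space past the data, confined to this folio. */
    size_t extendable = folio_pos(folio) + folio_size(folio) - data_end;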

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-22-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfs: Speed up buffered reading</title>
<updated>2024-09-12T10:20:41+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-07-01T23:40:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ee4cdf7ba857a894ad1650d6ab77669cbbfa329e'/>
<id>urn:sha1:ee4cdf7ba857a894ad1650d6ab77669cbbfa329e</id>
<content type='text'>
Improve the efficiency of buffered reads in a number of ways:

 (1) Overhaul the algorithm in general so that it's a lot more compact and
     split the read submission code between buffered and unbuffered
     versions.  The unbuffered version can be vastly simplified.

 (2) Read-result collection is handed off to a work queue rather than being
     done in the I/O thread (see the sketch after this list).  Multiple
     subrequests can be processed simultaneously.

 (3) When a subrequest is collected, any folios it fully spans are
     collected and "spare" data on either side is donated to either the
     previous or the next subrequest in the sequence.
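
A generic illustration of the handoff in (2) (not the patch's exact code;
the worker function name is assumed):

    /* Queue the read-result collection as a work item instead of doing
     * it inline in the I/O completion path. */
    INIT_WORK(&amp;rreq-&gt;work, netfs_read_collection_worker);
    queue_work(system_unbound_wq, &amp;rreq-&gt;work);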

Notes:

 (*) Readahead expansion massively slows down fio, presumably because it
     causes a load of extra allocations, both folio and xarray, up front
     before RPC requests can be transmitted.

 (*) RDMA with cifs does appear to work, both with SIW and RXE.

 (*) PG_private_2-based reading and copy-to-cache is split out into its own
     file and altered to use folio_queue.  Note that the copy to the cache
     now creates a new write transaction against the cache and adds the
     folios to be copied into it.  This allows it to use part of the
     writeback I/O code.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-20-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfs: Simplify the writeback code</title>
<updated>2024-09-12T10:20:40+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-06-06T06:48:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=983cdcf8fe141b0ce16bc71959a5dc55bcb0764d'/>
<id>urn:sha1:983cdcf8fe141b0ce16bc71959a5dc55bcb0764d</id>
<content type='text'>
Use the new folio_queue structures to simplify the writeback code.  The
problem with referring to the i_pages xarray directly is that we may have
gaps in the sequence of folios we're writing from that we need to skip when
we're removing the writeback mark from the folios we're writing back from.

At the moment the code tries to deal with this by carefully tracking the
gaps in each writeback stream (eg. write to server and write to cache) and
divining when there's a gap that spans folios (something that's not helped
by folios not being a consistent size).

Instead, the folio_queue buffer contains pointers only to the folios we're
dealing with, has them in ascending order and indicates a gap by placing
non-consecutive folios next to each other.  This makes it possible to
track where we need to clean up to by just keeping track of where we've
processed to on each stream and taking the minimum.
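
For instance (a sketch; field names such as -&gt;collected_to are assumed):

    /* The point up to which every active stream has been processed, and
     * thus up to which writeback marks can safely be cleared. */
    unsigned long long cleaned_to = ULLONG_MAX;
    int s;

    for (s = 0; s &lt; ARRAY_SIZE(wreq-&gt;io_streams); s++)
        if (wreq-&gt;io_streams[s].active)
            cleaned_to = min(cleaned_to, wreq-&gt;io_streams[s].collected_to);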

Note that the I/O iterator is always rounded up to the end of the folio,
even if that is beyond the EOF position, so that the cache can do DIO from
the page.  The excess space is cleared, though mmapped writes clobber it.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-18-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfs: Provide an iterator-reset function</title>
<updated>2024-09-12T10:20:40+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-07-08T13:49:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bfaa33b8ba196f9506a45e5a36e968f087c8cd16'/>
<id>urn:sha1:bfaa33b8ba196f9506a45e5a36e968f087c8cd16</id>
<content type='text'>
Provide a function to reset the iterator on a subrequest.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20240814203850.2240469-17-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
</feed>
