<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/pipe.c, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-08-29T13:08:07+00:00</updated>
<entry>
<title>Add RWF_NOSIGNAL flag for pwritev2</title>
<updated>2025-08-29T13:08:07+00:00</updated>
<author>
<name>Lauri Vasama</name>
<email>git@vasama.org</email>
</author>
<published>2025-08-27T13:39:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=db2ab24a341ce89351a1bede37a96a3e3ce1726a'/>
<id>urn:sha1:db2ab24a341ce89351a1bede37a96a3e3ce1726a</id>
<content type='text'>
For a user mode library to avoid generating SIGPIPE signals (e.g.
because this behaviour is not portable across operating systems) is
cumbersome. It is generally bad form to change the process-wide signal
mask in a library, so a local solution is needed instead.

For I/O performed directly using system calls (synchronous or readiness
based asynchronous) this currently involves applying a thread-specific
signal mask before the operation and reverting it afterwards. This can be
avoided when it is known that the file descriptor refers to neither a
pipe nor a socket, but a conservative implementation must always apply
the mask. This incurs the cost of two additional system calls. In the
case of sockets, the existing MSG_NOSIGNAL flag can be used with send.

For asynchronous I/O performed using io_uring, currently the only option
(apart from MSG_NOSIGNAL for sockets), is to mask SIGPIPE entirely in the
call to io_uring_enter. Thankfully io_uring_enter takes a signal mask, so
only a single syscall is needed. However, copying the signal mask on
every call incurs a non-zero performance penalty. Furthermore, this mask
applies to all completions, meaning that if the non-signaling behaviour
is desired only for some subset of operations, the desired signals must
be raised manually from user-mode depending on the completed operation.

Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE signal
from being raised when writing on disconnected pipes or sockets. The flag
is handled directly by the pipe filesystem and converted to the existing
MSG_NOSIGNAL flag for sockets.

Signed-off-by: Lauri Vasama &lt;git@vasama.org&gt;
Link: https://lore.kernel.org/20250827133901.1820771-1-git@vasama.org
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs/pipe: set FMODE_NOWAIT in create_pipe_files()</title>
<updated>2025-06-10T11:16:19+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-05-30T11:25:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dd765ba8723958514eab2fc742bef69019a21069'/>
<id>urn:sha1:dd765ba8723958514eab2fc742bef69019a21069</id>
<content type='text'>
Rather than have the caller set the FMODE_NOWAIT flags for both output
files, move it to create_pipe_files() where other f_mode flags are set
anyway with stream_open(). With that, both __do_pipe_flags() and
io_pipe() can remove the manual setting of the NOWAIT flags.

No intended functional changes, just a code cleanup.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Link: https://lore.kernel.org/1f0473f8-69f3-4eb1-aa77-3334c6a71d24@kernel.dk
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>sort.h: hoist cmp_int() into generic header file</title>
<updated>2025-05-12T00:54:12+00:00</updated>
<author>
<name>Fedor Pchelkin</name>
<email>pchelkin@ispras.ru</email>
</author>
<published>2025-04-27T20:14:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f3def8270c67217efb0e23dcd54714f74de3b6b2'/>
<id>urn:sha1:f3def8270c67217efb0e23dcd54714f74de3b6b2</id>
<content type='text'>
Deduplicate the same functionality implemented in several places by
moving the cmp_int() helper macro into linux/sort.h.

The macro performs a three-way comparison of the arguments mostly useful
in different sorting strategies and algorithms.

Link: https://lkml.kernel.org/r/20250427201451.900730-1-pchelkin@ispras.ru
Signed-off-by: Fedor Pchelkin &lt;pchelkin@ispras.ru&gt;
Suggested-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Acked-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Acked-by: Coly Li &lt;colyli@kernel.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Carlos Maiolino &lt;cem@kernel.org&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Coly Li &lt;colyli@kernel.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge patch series "pipe: Trivial cleanups"</title>
<updated>2025-03-10T07:55:13+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-03-10T07:55:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3732d8f16531ddc591622bc64ce4d4c160c34bb4'/>
<id>urn:sha1:3732d8f16531ddc591622bc64ce4d4c160c34bb4</id>
<content type='text'>
K Prateek Nayak &lt;kprateek.nayak@amd.com&gt; says:

Based on the suggestion on the RFC, the treewide conversion of
references to pipe-&gt;{head,tail} from unsigned int to pipe_index_t has
been dropped for now. The series contains trivial cleanup suggested to
limit the nr_slots in pipe_resize_ring() to be covered between
pipe_index_t limits of pipe-&gt;{head,tail} and using pipe_buf() to remove
the open-coded usage of masks to access pipe buffer building on Linus'
cleanup of fs/fuse/dev.c in commit ebb0f38bb47f ("fs/pipe: fix pipe
buffer index use in FUSE")

* patches from https://lore.kernel.org/r/20250307052919.34542-1-kprateek.nayak@amd.com:
  fs/splice: Use pipe_buf() helper to retrieve pipe buffer
  fs/pipe: Use pipe_buf() helper to retrieve pipe buffer
  kernel/watch_queue: Use pipe_buf() to retrieve the pipe buffer
  fs/pipe: Limit the slots in pipe_resize_ring()

Link: https://lore.kernel.org/r/20250307052919.34542-1-kprateek.nayak@amd.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs/pipe: Use pipe_buf() helper to retrieve pipe buffer</title>
<updated>2025-03-10T07:55:05+00:00</updated>
<author>
<name>K Prateek Nayak</name>
<email>kprateek.nayak@amd.com</email>
</author>
<published>2025-03-07T05:29:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ba0822021c3c5aa8029c16078c7091d35b4979bc'/>
<id>urn:sha1:ba0822021c3c5aa8029c16078c7091d35b4979bc</id>
<content type='text'>
Use pipe_buf() helper to retrieve the pipe buffer throughout the file
replacing the open-coded the logic.

Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Link: https://lore.kernel.org/r/20250307052919.34542-4-kprateek.nayak@amd.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs/pipe: Limit the slots in pipe_resize_ring()</title>
<updated>2025-03-10T07:55:05+00:00</updated>
<author>
<name>K Prateek Nayak</name>
<email>kprateek.nayak@amd.com</email>
</author>
<published>2025-03-07T05:29:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cf3d0c54b21c4a351d4f94cf188e9715dbd1ef5b'/>
<id>urn:sha1:cf3d0c54b21c4a351d4f94cf188e9715dbd1ef5b</id>
<content type='text'>
Limit the number of slots in pipe_resize_ring() to the maximum value
representable by pipe-&gt;{head,tail}. Values beyond the max limit can
lead to incorrect pipe occupancy related calculations where the pipe
will never appear full.

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Link: https://lore.kernel.org/r/20250307052919.34542-2-kprateek.nayak@amd.com
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge mainline pipe changes</title>
<updated>2025-03-10T07:53:40+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-03-10T07:53:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=38962d9b15ce3d50f5c61dc441b7c53f4e0d2e33'/>
<id>urn:sha1:38962d9b15ce3d50f5c61dc441b7c53f4e0d2e33</id>
<content type='text'>
Mainline now contains various changes to pipes that are relevant for
other pipe work this cycle. So merge them into the respective VFS tree.

Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs/pipe: add simpler helpers for common cases</title>
<updated>2025-03-07T04:25:35+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-03-07T04:25:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=00a7d39898c8010bfd5ff62af31ca5db34421b38'/>
<id>urn:sha1:00a7d39898c8010bfd5ff62af31ca5db34421b38</id>
<content type='text'>
The fix to atomically read the pipe head and tail state when not holding
the pipe mutex has caused a number of headaches due to the size change
of the involved types.

It turns out that we don't have _that_ many places that access these
fields directly and were affected, but we have more than we strictly
should have, because our low-level helper functions have been designed
to have intimate knowledge of how the pipes work.

And as a result, that random noise of direct 'pipe-&gt;head' and
'pipe-&gt;tail' accesses makes it harder to pinpoint any actual potential
problem spots remaining.

For example, we didn't have a "is the pipe full" helper function, but
instead had a "given these pipe buffer indexes and this pipe size, is
the pipe full".  That's because some low-level pipe code does actually
want that much more complicated interface.

But most other places literally just want a "is the pipe full" helper,
and not having it meant that those places ended up being unnecessarily
much too aware of this all.

It would have been much better if only the very core pipe code that
cared had been the one aware of this all.

So let's fix it - better late than never.  This just introduces the
trivial wrappers for "is this pipe full or empty" and to get how many
pipe buffers are used, so that instead of writing

        if (pipe_full(pipe-&gt;head, pipe-&gt;tail, pipe-&gt;max_usage))

the places that literally just want to know if a pipe is full can just
say

        if (pipe_is_full(pipe))

instead.  The existing trivial cases were converted with a 'sed' script.

This cuts down on the places that access pipe-&gt;head and pipe-&gt;tail
directly outside of the pipe code (and core splice code) quite a lot.

The splice code in particular still revels in doing the direct low-level
accesses, and the fuse fuse_dev_splice_write() code also seems a bit
unnecessarily eager to go very low-level, but it's at least a bit better
than it used to be.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/pipe: do not open-code pipe head/tail logic in FIONREAD</title>
<updated>2025-03-06T17:33:58+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-03-06T17:33:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d810d4c27bf34c719243bab9feb0d843edc09fd7'/>
<id>urn:sha1:d810d4c27bf34c719243bab9feb0d843edc09fd7</id>
<content type='text'>
Rasmus points out that we do indeed have other cases of breakage from
the type changes that were introduced on 32-bit targets in order to read
the pipe head and tail values atomically (commit 3d252160b818: "fs/pipe:
Read pipe-&gt;{head,tail} atomically outside pipe-&gt;mutex").

Fix it up by using the proper helper functions that now deal with the
pipe buffer index types properly.  This makes the code simpler and more
obvious.

The compiler does the CSE and loop hoisting of the pipe ring size
masking that we used to do manually, so open-coding this was never a
good idea.

Reported-by: Rasmus Villemoes &lt;ravi@prevas.dk&gt;
Link: https://lore.kernel.org/all/87cyeu5zgk.fsf@prevas.dk/
Fixes: 3d252160b818 ("fs/pipe: Read pipe-&gt;{head,tail} atomically outside pipe-&gt;mutex")Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Cc: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Cc: Swapnil Sapkal &lt;swapnil.sapkal@amd.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs/pipe: Read pipe-&gt;{head,tail} atomically outside pipe-&gt;mutex</title>
<updated>2025-03-04T18:51:48+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-03-04T13:51:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3d252160b818045f3a152b13756f6f37ca34639d'/>
<id>urn:sha1:3d252160b818045f3a152b13756f6f37ca34639d</id>
<content type='text'>
pipe_readable(), pipe_writable(), and pipe_poll() can read "pipe-&gt;head"
and "pipe-&gt;tail" outside of "pipe-&gt;mutex" critical section. When the
head and the tail are read individually in that order, there is a window
for interruption between the two reads in which both the head and the
tail can be updated by concurrent readers and writers.

One of the problematic scenarios observed with hackbench running
multiple groups on a large server on a particular pipe inode is as
follows:

    pipe-&gt;head = 36
    pipe-&gt;tail = 36

    hackbench-118762  [057] .....  1029.550548: pipe_write: *wakes up: pipe not full*
    hackbench-118762  [057] .....  1029.550548: pipe_write: head: 36 -&gt; 37 [tail: 36]
    hackbench-118762  [057] .....  1029.550548: pipe_write: *wake up next reader 118740*
    hackbench-118762  [057] .....  1029.550548: pipe_write: *wake up next writer 118768*

    hackbench-118768  [206] .....  1029.55055X: pipe_write: *writer wakes up*
    hackbench-118768  [206] .....  1029.55055X: pipe_write: head = READ_ONCE(pipe-&gt;head) [37]
    ... CPU 206 interrupted (exact wakeup was not traced but 118768 did read head at 37 in traces)

    hackbench-118740  [057] .....  1029.550558: pipe_read:  *reader wakes up: pipe is not empty*
    hackbench-118740  [057] .....  1029.550558: pipe_read:  tail: 36 -&gt; 37 [head = 37]
    hackbench-118740  [057] .....  1029.550559: pipe_read:  *pipe is empty; wakeup writer 118768*
    hackbench-118740  [057] .....  1029.550559: pipe_read:  *sleeps*

    hackbench-118766  [185] .....  1029.550592: pipe_write: *New writer comes in*
    hackbench-118766  [185] .....  1029.550592: pipe_write: head: 37 -&gt; 38 [tail: 37]
    hackbench-118766  [185] .....  1029.550592: pipe_write: *wakes up reader 118766*

    hackbench-118740  [185] .....  1029.550598: pipe_read:  *reader wakes up; pipe not empty*
    hackbench-118740  [185] .....  1029.550599: pipe_read:  tail: 37 -&gt; 38 [head: 38]
    hackbench-118740  [185] .....  1029.550599: pipe_read:  *pipe is empty*
    hackbench-118740  [185] .....  1029.550599: pipe_read:  *reader sleeps; wakeup writer 118768*

    ... CPU 206 switches back to writer
    hackbench-118768  [206] .....  1029.550601: pipe_write: tail = READ_ONCE(pipe-&gt;tail) [38]
    hackbench-118768  [206] .....  1029.550601: pipe_write: pipe_full()? (u32)(37 - 38) &gt;= 16? Yes
    hackbench-118768  [206] .....  1029.550601: pipe_write: *writer goes back to sleep*

    [ Tasks 118740 and 118768 can then indefinitely wait on each other. ]

The unsigned arithmetic in pipe_occupancy() wraps around when
"pipe-&gt;tail &gt; pipe-&gt;head" leading to pipe_full() returning true despite
the pipe being empty.

The case of genuine wraparound of "pipe-&gt;head" is handled since pipe
buffer has data allowing readers to make progress until the pipe-&gt;tail
wraps too after which the reader will wakeup a sleeping writer, however,
mistaking the pipe to be full when it is in fact empty can lead to
readers and writers waiting on each other indefinitely.

This issue became more problematic and surfaced as a hang in hackbench
after the optimization in commit aaec5a95d596 ("pipe_read: don't wake up
the writer if the pipe is still full") significantly reduced the number
of spurious wakeups of writers that had previously helped mask the
issue.

To avoid missing any updates between the reads of "pipe-&gt;head" and
"pipe-&gt;write", unionize the two with a single unsigned long
"pipe-&gt;head_tail" member that can be loaded atomically.

Using "pipe-&gt;head_tail" to read the head and the tail ensures the
lockless checks do not miss any updates to the head or the tail and
since those two are only updated under "pipe-&gt;mutex", it ensures that
the head is always ahead of, or equal to the tail resulting in correct
calculations.

  [ prateek: commit log, testing on x86 platforms. ]

Reported-and-debugged-by: Swapnil Sapkal &lt;swapnil.sapkal@amd.com&gt;
Closes: https://lore.kernel.org/lkml/e813814e-7094-4673-bc69-731af065a0eb@amd.com/
Reported-by: Alexey Gladkov &lt;legion@kernel.org&gt;
Closes: https://lore.kernel.org/all/Z8Wn0nTvevLRG_4m@example.org/
Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length")
Tested-by: Swapnil Sapkal &lt;swapnil.sapkal@amd.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Alexey Gladkov &lt;legion@kernel.org&gt;
Signed-off-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
