<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/eventfd.c, branch v6.6.132</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-07-11T09:41:34+00:00</updated>
<entry>
<title>eventfd: prevent underflow for eventfd semaphores</title>
<updated>2023-07-11T09:41:34+00:00</updated>
<author>
<name>Wen Yang</name>
<email>wenyang.linux@foxmail.com</email>
</author>
<published>2023-07-09T06:54:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=758b492047816a3158d027e9fca660bc5bcf20bf'/>
<id>urn:sha1:758b492047816a3158d027e9fca660bc5bcf20bf</id>
<content type='text'>
For eventfd with flag EFD_SEMAPHORE, when its ctx-&gt;count is 0, calling
eventfd_ctx_do_read will cause ctx-&gt;count to overflow to ULLONG_MAX.

An underflow can happen with EFD_SEMAPHORE eventfds in at least the
following three subsystems:

(1) virt/kvm/eventfd.c
(2) drivers/vfio/virqfd.c
(3) drivers/virt/acrn/irqfd.c

where (2) and (3) are just modeled after (1). An eventfd must be
specified for use with the KVM_IRQFD ioctl(). This can also be an
EFD_SEMAPHORE eventfd. When the eventfd count is zero or has been
decremented to zero an underflow can be triggered when the irqfd is shut
down by raising the KVM_IRQFD_FLAG_DEASSIGN flag in the KVM_IRQFD
ioctl():

        // ctx-&gt;count == 0
        kvm_vm_ioctl()
        -&gt; kvm_irqfd()
           -&gt; kvm_irqfd_deassign()
              -&gt; irqfd_deactivate()
                 -&gt; irqfd_shutdown()
                    -&gt; eventfd_ctx_remove_wait_queue(&amp;cnt)
                       -&gt; eventfd_ctx_do_read(&amp;cnt)

Userspace polling on the eventfd wouldn't notice the underflow because 1
is always returned as the value from eventfd_read() while ctx-&gt;count
would've underflowed. It's not a huge deal because this should only be
happening when the irqfd is shutdown but we should still fix it and
avoid the spurious wakeup.

Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove")
Signed-off-by: Wen Yang &lt;wenyang.linux@foxmail.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dylan Yudaken &lt;dylany@fb.com&gt;
Cc: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: &lt;tencent_7588DFD1F365950A757310D764517A14B306@qq.com&gt;
[brauner: rewrite commit message and add explanation how this underflow can happen]
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>eventfd: show the EFD_SEMAPHORE flag in fdinfo</title>
<updated>2023-06-15T07:22:23+00:00</updated>
<author>
<name>Wen Yang</name>
<email>wenyang.linux@foxmail.com</email>
</author>
<published>2023-06-13T17:01:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=33d8b5d7824c7175ed968b8e89e6db3566e9c177'/>
<id>urn:sha1:33d8b5d7824c7175ed968b8e89e6db3566e9c177</id>
<content type='text'>
The EFD_SEMAPHORE flag should be displayed in fdinfo,
as different value could affect the behavior of eventfd.

Suggested-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Wen Yang &lt;wenyang.linux@foxmail.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dylan Yudaken &lt;dylany@fb.com&gt;
Cc: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Eric Biggers &lt;ebiggers@google.com&gt;
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: &lt;tencent_05B9CFEFE6B9BC2A9B3A27886A122A7D9205@qq.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: use correct __poll_t type</title>
<updated>2023-05-15T07:33:09+00:00</updated>
<author>
<name>Min-Hua Chen</name>
<email>minhuadotchen@gmail.com</email>
</author>
<published>2023-05-11T16:46:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=38f1755a3e59a3f88e33030f8e4ee0421de2f05a'/>
<id>urn:sha1:38f1755a3e59a3f88e33030f8e4ee0421de2f05a</id>
<content type='text'>
Fix the following sparse warnings by using __poll_t instead
of unsigned type.

fs/eventpoll.c:541:9: sparse: warning: restricted __poll_t degrades to integer
fs/eventfd.c:67:17: sparse: warning: restricted __poll_t degrades to integer

Signed-off-by: Min-Hua Chen &lt;minhuadotchen@gmail.com&gt;
Message-Id: &lt;20230511164628.336586-1-minhuadotchen@gmail.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>eventfd: use wait_event_interruptible_locked_irq() helper</title>
<updated>2023-04-06T08:01:50+00:00</updated>
<author>
<name>Wen Yang</name>
<email>wenyang.linux@foxmail.com</email>
</author>
<published>2023-04-05T19:20:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=113348a44b8622b497fb884f41c8659481ad0b04'/>
<id>urn:sha1:113348a44b8622b497fb884f41c8659481ad0b04</id>
<content type='text'>
wait_event_interruptible_locked_irq was introduced by commit 22c43c81a51e
("wait_event_interruptible_locked() interface"), but older code such as
eventfd_{write,read} still uses the open code implementation.
Inspired by commit 8120a8aadb20
("fs/timerfd.c: make use of wait_event_interruptible_locked_irq()"), this
patch replaces the open code implementation with a single macro call.

No functional change intended.

Signed-off-by: Wen Yang &lt;wenyang.linux@foxmail.com&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dylan Yudaken &lt;dylany@fb.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: Fu Wei &lt;wefu@redhat.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Michal Nazarewicz &lt;m.nazarewicz@samsung.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: &lt;tencent_16F9553E8354D950D704214D6EA407315F0A@qq.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>eventfd: provide a eventfd_signal_mask() helper</title>
<updated>2022-11-22T13:07:55+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2022-11-20T17:13:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=03e02acda8e267a8183e1e0ed289ff1ef9cd7ed8'/>
<id>urn:sha1:03e02acda8e267a8183e1e0ed289ff1ef9cd7ed8</id>
<content type='text'>
This is identical to eventfd_signal(), but it allows the caller to pass
in a mask to be used for the poll wakeup key. The use case is avoiding
repeated multishot triggers if we have a dependency between eventfd and
io_uring.

If we setup an eventfd context and register that as the io_uring eventfd,
and at the same time queue a multishot poll request for the eventfd
context, then any CQE posted will repeatedly trigger the multishot request
until it terminates when the CQ ring overflows.

In preparation for io_uring detecting this circular dependency, add the
mentioned helper so that io_uring can pass in EPOLL_URING as part of the
poll wakeup key.

Cc: stable@vger.kernel.org # 6.0
[axboe: fold in !CONFIG_EVENTFD fix from Zhang Qilong]
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>eventfd: guard wake_up in eventfd fs calls as well</title>
<updated>2022-09-21T16:30:42+00:00</updated>
<author>
<name>Dylan Yudaken</name>
<email>dylany@fb.com</email>
</author>
<published>2022-08-16T13:59:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9f0deaa12d832f488500a5afe9b912e9b3cfc432'/>
<id>urn:sha1:9f0deaa12d832f488500a5afe9b912e9b3cfc432</id>
<content type='text'>
Guard wakeups that the user can trigger, and that may end up triggering a
call back into eventfd_signal. This is in addition to the current approach
that only guards in eventfd_signal.

Rename in_eventfd_signal -&gt; in_eventfd at the same time to reflect this.

Without this there would be a deadlock in the following code using libaio:

int main()
{
	struct io_context *ctx = NULL;
	struct iocb iocb;
	struct iocb *iocbs[] = { &amp;iocb };
	int evfd;
        uint64_t val = 1;

	evfd = eventfd(0, EFD_CLOEXEC);
	assert(!io_setup(2, &amp;ctx));
	io_prep_poll(&amp;iocb, evfd, POLLIN);
	io_set_eventfd(&amp;iocb, evfd);
	assert(1 == io_submit(ctx, 1, iocbs));
        write(evfd, &amp;val, 8);
}

Signed-off-by: Dylan Yudaken &lt;dylany@fb.com&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Link: https://lore.kernel.org/r/20220816135959.1490641-1-dylany@fb.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>eventfd: Make signal recursion protection a task bit</title>
<updated>2021-08-27T23:33:02+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-07-29T11:01:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b542e383d8c005f06a131e2b40d5889b812f19c6'/>
<id>urn:sha1:b542e383d8c005f06a131e2b40d5889b812f19c6</id>
<content type='text'>
The recursion protection for eventfd_signal() is based on a per CPU
variable and relies on the !RT semantics of spin_lock_irqsave() for
protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither
disables preemption nor interrupts which allows the spin lock held section
to be preempted. If the preempting task invokes eventfd_signal() as well,
then the recursion warning triggers.

Paolo suggested to protect the per CPU variable with a local lock, but
that's heavyweight and actually not necessary. The goal of this protection
is to prevent the task stack from overflowing, which can be achieved with a
per task recursion protection as well.

Replace the per CPU variable with a per task bit similar to other recursion
protection bits like task_struct::in_page_owner. This works on both !RT and
RT kernels and removes as a side effect the extra per CPU storage.

No functional change for !RT kernels.

Reported-by: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Acked-by: Jason Wang &lt;jasowang@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx

</content>
</entry>
<entry>
<title>eventfd: Export eventfd_ctx_do_read()</title>
<updated>2020-11-15T14:49:10+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2020-10-27T13:55:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=28f1326710555bbe666f64452d08f2d7dd657cae'/>
<id>urn:sha1:28f1326710555bbe666f64452d08f2d7dd657cae</id>
<content type='text'>
Where events are consumed in the kernel, for example by KVM's
irqfd_wakeup() and VFIO's virqfd_wakeup(), they currently lack a
mechanism to drain the eventfd's counter.

Since the wait queue is already locked while the wakeup functions are
invoked, all they really need to do is call eventfd_ctx_do_read().

Add a check for the lock, and export it for them.

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Message-Id: &lt;20201027135523.646811-2-dwmw2@infradead.org&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>eventfd: convert to f_op-&gt;read_iter()</title>
<updated>2020-05-07T02:33:43+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2020-05-01T19:11:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=12aceb89b0bce19eb89735f9de7a9983e4f0adae'/>
<id>urn:sha1:12aceb89b0bce19eb89735f9de7a9983e4f0adae</id>
<content type='text'>
eventfd is using -&gt;read() as it's file_operations read handler, but
this prevents passing in information about whether a given IO operation
is blocking or not. We can only use the file flags for that. To support
async (-EAGAIN/poll based) retries for io_uring, we need -&gt;read_iter()
support. Convert eventfd to using -&gt;read_iter().

With -&gt;read_iter(), we can support IOCB_NOWAIT. Ensure the fd setup
is done such that we set file-&gt;f_mode with FMODE_NOWAIT.

[missing include added]

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>eventfd: track eventfd_signal() recursion depth</title>
<updated>2020-02-04T00:27:38+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2020-02-02T15:23:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b5e683d5cab8cd433b06ae178621f083cabd4f63'/>
<id>urn:sha1:b5e683d5cab8cd433b06ae178621f083cabd4f63</id>
<content type='text'>
eventfd use cases from aio and io_uring can deadlock due to circular
or resursive calling, when eventfd_signal() tries to grab the waitqueue
lock. On top of that, it's also possible to construct notification
chains that are deep enough that we could blow the stack.

Add a percpu counter that tracks the percpu recursion depth, warn if we
exceed it. The counter is also exposed so that users of eventfd_signal()
can do the right thing if it's non-zero in the context where it is
called.

Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
