kernel/linux.git/fs/eventfd.c, branch v6.6.132

eventfd: prevent underflow for eventfd semaphores

2023-07-11T09:41:34+00:00

For eventfd with flag EFD_SEMAPHORE, when its ctx->count is 0, calling eventfd_ctx_do_read will cause ctx->count to overflow to ULLONG_MAX. An underflow can happen with EFD_SEMAPHORE eventfds in at least the following three subsystems: (1) virt/kvm/eventfd.c (2) drivers/vfio/virqfd.c (3) drivers/virt/acrn/irqfd.c where (2) and (3) are just modeled after (1). An eventfd must be specified for use with the KVM_IRQFD ioctl(). This can also be an EFD_SEMAPHORE eventfd. When the eventfd count is zero or has been decremented to zero an underflow can be triggered when the irqfd is shut down by raising the KVM_IRQFD_FLAG_DEASSIGN flag in the KVM_IRQFD ioctl(): // ctx->count == 0 kvm_vm_ioctl() -> kvm_irqfd() -> kvm_irqfd_deassign() -> irqfd_deactivate() -> irqfd_shutdown() -> eventfd_ctx_remove_wait_queue(&cnt) -> eventfd_ctx_do_read(&cnt) Userspace polling on the eventfd wouldn't notice the underflow because 1 is always returned as the value from eventfd_read() while ctx->count would've underflowed. It's not a huge deal because this should only be happening when the irqfd is shutdown but we should still fix it and avoid the spurious wakeup. Fixes: cb289d6244a3 ("eventfd - allow atomic read and waitqueue remove") Signed-off-by: Wen Yang Cc: Alexander Viro Cc: Jens Axboe Cc: Christian Brauner Cc: Christoph Hellwig Cc: Dylan Yudaken Cc: David Woodhouse Cc: Matthew Wilcox Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: [brauner: rewrite commit message and add explanation how this underflow can happen] Signed-off-by: Christian Brauner

eventfd: show the EFD_SEMAPHORE flag in fdinfo

2023-06-15T07:22:23+00:00

The EFD_SEMAPHORE flag should be displayed in fdinfo, as different value could affect the behavior of eventfd. Suggested-by: Christian Brauner Signed-off-by: Wen Yang Cc: Alexander Viro Cc: Jens Axboe Cc: Christian Brauner Cc: Christoph Hellwig Cc: Dylan Yudaken Cc: David Woodhouse Cc: Matthew Wilcox Cc: Eric Biggers Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: Signed-off-by: Christian Brauner

fs: use correct __poll_t type

2023-05-15T07:33:09+00:00

Fix the following sparse warnings by using __poll_t instead of unsigned type. fs/eventpoll.c:541:9: sparse: warning: restricted __poll_t degrades to integer fs/eventfd.c:67:17: sparse: warning: restricted __poll_t degrades to integer Signed-off-by: Min-Hua Chen Message-Id: <20230511164628.336586-1-minhuadotchen@gmail.com> Signed-off-by: Christian Brauner

eventfd: use wait_event_interruptible_locked_irq() helper

2023-04-06T08:01:50+00:00

wait_event_interruptible_locked_irq was introduced by commit 22c43c81a51e ("wait_event_interruptible_locked() interface"), but older code such as eventfd_{write,read} still uses the open code implementation. Inspired by commit 8120a8aadb20 ("fs/timerfd.c: make use of wait_event_interruptible_locked_irq()"), this patch replaces the open code implementation with a single macro call. No functional change intended. Signed-off-by: Wen Yang Reviewed-by: Eric Biggers Reviewed-by: Jens Axboe Cc: Alexander Viro Cc: Christoph Hellwig Cc: Dylan Yudaken Cc: Jens Axboe Cc: David Woodhouse Cc: Fu Wei Cc: Paolo Bonzini Cc: Michal Nazarewicz Cc: Matthew Wilcox Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Message-Id: Signed-off-by: Christian Brauner

eventfd: provide a eventfd_signal_mask() helper

2022-11-22T13:07:55+00:00

This is identical to eventfd_signal(), but it allows the caller to pass in a mask to be used for the poll wakeup key. The use case is avoiding repeated multishot triggers if we have a dependency between eventfd and io_uring. If we setup an eventfd context and register that as the io_uring eventfd, and at the same time queue a multishot poll request for the eventfd context, then any CQE posted will repeatedly trigger the multishot request until it terminates when the CQ ring overflows. In preparation for io_uring detecting this circular dependency, add the mentioned helper so that io_uring can pass in EPOLL_URING as part of the poll wakeup key. Cc: stable@vger.kernel.org # 6.0 [axboe: fold in !CONFIG_EVENTFD fix from Zhang Qilong] Signed-off-by: Jens Axboe

eventfd: guard wake_up in eventfd fs calls as well

2022-09-21T16:30:42+00:00

Guard wakeups that the user can trigger, and that may end up triggering a call back into eventfd_signal. This is in addition to the current approach that only guards in eventfd_signal. Rename in_eventfd_signal -> in_eventfd at the same time to reflect this. Without this there would be a deadlock in the following code using libaio: int main() { struct io_context *ctx = NULL; struct iocb iocb; struct iocb *iocbs[] = { &iocb }; int evfd; uint64_t val = 1; evfd = eventfd(0, EFD_CLOEXEC); assert(!io_setup(2, &ctx)); io_prep_poll(&iocb, evfd, POLLIN); io_set_eventfd(&iocb, evfd); assert(1 == io_submit(ctx, 1, iocbs)); write(evfd, &val, 8); } Signed-off-by: Dylan Yudaken Reviewed-by: Jens Axboe Link: https://lore.kernel.org/r/20220816135959.1490641-1-dylany@fb.com Signed-off-by: Jens Axboe

eventfd: Make signal recursion protection a task bit

2021-08-27T23:33:02+00:00

The recursion protection for eventfd_signal() is based on a per CPU variable and relies on the !RT semantics of spin_lock_irqsave() for protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither disables preemption nor interrupts which allows the spin lock held section to be preempted. If the preempting task invokes eventfd_signal() as well, then the recursion warning triggers. Paolo suggested to protect the per CPU variable with a local lock, but that's heavyweight and actually not necessary. The goal of this protection is to prevent the task stack from overflowing, which can be achieved with a per task recursion protection as well. Replace the per CPU variable with a per task bit similar to other recursion protection bits like task_struct::in_page_owner. This works on both !RT and RT kernels and removes as a side effect the extra per CPU storage. No functional change for !RT kernels. Reported-by: Daniel Bristot de Oliveira Signed-off-by: Thomas Gleixner Tested-by: Daniel Bristot de Oliveira Acked-by: Jason Wang Cc: Al Viro Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx

eventfd: Export eventfd_ctx_do_read()

2020-11-15T14:49:10+00:00

Where events are consumed in the kernel, for example by KVM's irqfd_wakeup() and VFIO's virqfd_wakeup(), they currently lack a mechanism to drain the eventfd's counter. Since the wait queue is already locked while the wakeup functions are invoked, all they really need to do is call eventfd_ctx_do_read(). Add a check for the lock, and export it for them. Signed-off-by: David Woodhouse Message-Id: <20201027135523.646811-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini

eventfd: convert to f_op->read_iter()

2020-05-07T02:33:43+00:00

eventfd is using ->read() as it's file_operations read handler, but this prevents passing in information about whether a given IO operation is blocking or not. We can only use the file flags for that. To support async (-EAGAIN/poll based) retries for io_uring, we need ->read_iter() support. Convert eventfd to using ->read_iter(). With ->read_iter(), we can support IOCB_NOWAIT. Ensure the fd setup is done such that we set file->f_mode with FMODE_NOWAIT. [missing include added] Signed-off-by: Jens Axboe Signed-off-by: Al Viro

eventfd: track eventfd_signal() recursion depth

2020-02-04T00:27:38+00:00

eventfd use cases from aio and io_uring can deadlock due to circular or resursive calling, when eventfd_signal() tries to grab the waitqueue lock. On top of that, it's also possible to construct notification chains that are deep enough that we could blow the stack. Add a percpu counter that tracks the percpu recursion depth, warn if we exceed it. The counter is also exposed so that users of eventfd_signal() can do the right thing if it's non-zero in the context where it is called. Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Jens Axboe