kernel/linux.git/drivers/vhost/net.c, branch v5.15.208

vhost/net: Protect ubufs with rcu read lock in vhost_net_ubuf_put()

2025-09-04T12:28:44+00:00

commit dd54bcf86c91a4455b1f95cbc8e9ac91205f3193 upstream. When operating on struct vhost_net_ubuf_ref, the following execution sequence is theoretically possible: CPU0 is finalizing DMA operation CPU1 is doing VHOST_NET_SET_BACKEND // ubufs->refcount == 2 vhost_net_ubuf_put() vhost_net_ubuf_put_wait_and_free(oldubufs) vhost_net_ubuf_put_and_wait() vhost_net_ubuf_put() int r = atomic_sub_return(1, &ubufs->refcount); // r = 1 int r = atomic_sub_return(1, &ubufs->refcount); // r = 0 wait_event(ubufs->wait, !atomic_read(&ubufs->refcount)); // no wait occurs here because condition is already true kfree(ubufs); if (unlikely(!r)) wake_up(&ubufs->wait); // use-after-free This leads to use-after-free on ubufs access. This happens because CPU1 skips waiting for wake_up() when refcount is already zero. To prevent that use a read-side RCU critical section in vhost_net_ubuf_put(), as suggested by Hillf Danton. For this lock to take effect, free ubufs with kfree_rcu(). Cc: stable@vger.kernel.org Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock") Reported-by: Andrey Ryabinin Suggested-by: Hillf Danton Signed-off-by: Nikolay Kuratov Message-Id: <20250805130917.727332-1-kniv@yandex-team.ru> Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman

vhost_net: revert upend_idx only on retriable error

2023-06-28T08:29:53+00:00

[ Upstream commit 1f5d2e3bab16369d5d4b4020a25db4ab1f4f082c ] Fix possible virtqueue used buffers leak and corresponding stuck in case of temporary -EIO from sendmsg() which is produced by tun driver while backend device is not up. In case of no-retriable error and zcopy do not revert upend_idx to pass packet data (that is update used_idx in corresponding vhost_zerocopy_signal_used()) as if packet data has been transferred successfully. v2: set vq->heads[ubuf->desc].len equal to VHOST_DMA_DONE_LEN in case of fake successful transmit. Signed-off-by: Andrey Smetanin Message-Id: <20230424204411.24888-1-asmetanin@yandex-team.ru> Signed-off-by: Michael S. Tsirkin Signed-off-by: Andrey Smetanin Acked-by: Jason Wang Signed-off-by: Sasha Levin

vhost/net: Clear the pending messages when the backend is removed

2023-02-09T10:26:34+00:00

[ Upstream commit 9526f9a2b762af16be94a72aca5d65c677d28f50 ] When the vhost iotlb is used along with a guest virtual iommu and the guest gets rebooted, some MISS messages may have been recorded just before the reboot and spuriously executed by the virtual iommu after the reboot. As vhost does not have any explicit reset user API, VHOST_NET_SET_BACKEND looks a reasonable point where to clear the pending messages, in case the backend is removed. Export vhost_clear_msg() and call it in vhost_net_set_backend() when fd == -1. Signed-off-by: Eric Auger Suggested-by: Jason Wang Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API") Message-Id: <20230117151518.44725-3-eric.auger@redhat.com> Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin

Fix double fget() in vhost_net_set_backend()

2022-05-25T07:57:27+00:00

commit fb4554c2232e44d595920f4d5c66cf8f7d13f9bc upstream. Descriptor table is a shared resource; two fget() on the same descriptor may return different struct file references. get_tap_ptr_ring() is called after we'd found (and pinned) the socket we'll be using and it tries to find the private tun/tap data structures associated with it. Redoing the lookup by the same file descriptor we'd used to get the socket is racy - we need to same struct file. Thanks to Jason for spotting a braino in the original variant of patch - I'd missed the use of fd == -1 for disabling backend, and in that case we can end up with sock == NULL and sock != oldsock. Cc: stable@kernel.org Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman

tuntap: add sanity checks about msg_controllen in sendmsg

2022-04-13T18:59:07+00:00

[ Upstream commit 74a335a07a17d131b9263bfdbdcb5e40673ca9ca ] In patch [1], tun_msg_ctl was added to allow pass batched xdp buffers to tun_sendmsg. Although we donot use msg_controllen in this path, we should check msg_controllen to make sure the caller pass a valid msg_ctl. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe8dd45bb7556246c6b76277b1ba4296c91c2505 Reported-by: Eric Dumazet Suggested-by: Jason Wang Signed-off-by: Harold Huang Acked-by: Jason Wang Link: https://lore.kernel.org/r/20220303022441.383865-1-baymaxhuang@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin

vhost_net: fix OoB on sendmsg() failure.

2021-09-09T09:52:12+00:00

If the sendmsg() call in vhost_tx_batch() fails, both the 'batched_xdp' and 'done_idx' indexes are left unchanged. If such failure happens when batched_xdp == VHOST_NET_BATCH, the next call to vhost_net_build_xdp() will access and write memory outside the xdp buffers area. Since sendmsg() can only error with EBADFD, this change addresses the issue explicitly freeing the XDP buffers batch on error. Fixes: 0a0be13b8fe2 ("vhost_net: batch submitting XDP buffers to underlayer sockets") Suggested-by: Jason Wang Signed-off-by: Paolo Abeni Acked-by: Jason Wang Signed-off-by: David S. Miller

sock: remove one redundant SKB_FRAG_PAGE_ORDER macro

2021-08-26T09:46:20+00:00

Both SKB_FRAG_PAGE_ORDER are defined to the same value in net/core/sock.c and drivers/vhost/net.c. Move the SKB_FRAG_PAGE_ORDER definition to net/core/sock.h, as both net/core/sock.c and drivers/vhost/net.c include it, and it seems a reasonable file to put the macro. Signed-off-by: Yunsheng Lin Acked-by: Jason Wang Signed-off-by: David S. Miller

vhost_net: use XDP helpers

2021-05-14T22:20:10+00:00

Make use of the xdp_{init,prepare}_buff() helpers instead of an open-coded version. Also, the field xdp->rxq was never set, so pass NULL to xdp_init_buff() to clear it. Signed-off-by: Matteo Croce Signed-off-by: David S. Miller

vhost_net: avoid tx queue stuck when sendmsg fails

2021-01-19T19:13:30+00:00

Currently the driver doesn't drop a packet which can't be sent by tun (e.g bad packet). In this case, the driver will always process the same packet lead to the tx queue stuck. To fix this issue: 1. in the case of persistent failure (e.g bad packet), the driver can skip this descriptor by ignoring the error. 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM), the driver schedules the worker to try again. Signed-off-by: Yunjian Wang Acked-by: Jason Wang Acked-by: Willem de Bruijn Acked-by: Michael S. Tsirkin Link: https://lore.kernel.org/r/1610685980-38608-1-git-send-email-wangyunjian@huawei.com Signed-off-by: Jakub Kicinski

tap/tun: add skb_zcopy_init() helper for initialization.

2021-01-08T00:08:37+00:00

Replace direct assignments with skb_zcopy_init() for zerocopy cases where a new skb is initialized, without changing the reference counts. Signed-off-by: Jonathan Lemon Signed-off-by: Jakub Kicinski