kernel/linux.git/drivers/vhost/net.c, branch v6.12.80

vhost/net: Protect ubufs with rcu read lock in vhost_net_ubuf_put()

2025-09-04T13:31:44+00:00

commit dd54bcf86c91a4455b1f95cbc8e9ac91205f3193 upstream. When operating on struct vhost_net_ubuf_ref, the following execution sequence is theoretically possible: CPU0 is finalizing DMA operation CPU1 is doing VHOST_NET_SET_BACKEND // ubufs->refcount == 2 vhost_net_ubuf_put() vhost_net_ubuf_put_wait_and_free(oldubufs) vhost_net_ubuf_put_and_wait() vhost_net_ubuf_put() int r = atomic_sub_return(1, &ubufs->refcount); // r = 1 int r = atomic_sub_return(1, &ubufs->refcount); // r = 0 wait_event(ubufs->wait, !atomic_read(&ubufs->refcount)); // no wait occurs here because condition is already true kfree(ubufs); if (unlikely(!r)) wake_up(&ubufs->wait); // use-after-free This leads to use-after-free on ubufs access. This happens because CPU1 skips waiting for wake_up() when refcount is already zero. To prevent that use a read-side RCU critical section in vhost_net_ubuf_put(), as suggested by Hillf Danton. For this lock to take effect, free ubufs with kfree_rcu(). Cc: stable@vger.kernel.org Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock") Reported-by: Andrey Ryabinin Suggested-by: Hillf Danton Signed-off-by: Nikolay Kuratov Message-Id: <20250805130917.727332-1-kniv@yandex-team.ru> Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman

net: extend ubuf_info callback to ops structure

2024-04-22T23:21:35+00:00

We'll need to associate additional callbacks with ubuf_info, introduce a structure holding ubuf_info callbacks. Apart from a more smarter io_uring notification management introduced in next patches, it can be used to generalise msg_zerocopy_put_abort() and also store ->sg_from_iter, which is currently passed in struct msghdr. Reviewed-by: Jens Axboe Reviewed-by: David Ahern Signed-off-by: Pavel Begunkov Reviewed-by: Willem de Bruijn Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

2024-03-19T15:57:39+00:00

Pull virtio updates from Michael Tsirkin: - Per vq sizes in vdpa - Info query for block devices support in vdpa - DMA sync callbacks in vduse - Fixes, cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (35 commits) virtio_net: rename free_old_xmit_skbs to free_old_xmit virtio_net: unify the code for recycling the xmit ptr virtio-net: add cond_resched() to the command waiting loop virtio-net: convert rx mode setting to use workqueue virtio: packed: fix unmap leak for indirect desc table vDPA: report virtio-blk flush info to user space vDPA: report virtio-block read-only info to user space vDPA: report virtio-block write zeroes configuration to user space vDPA: report virtio-block discarding configuration to user space vDPA: report virtio-block topology info to user space vDPA: report virtio-block MQ info to user space vDPA: report virtio-block max segments in a request to user space vDPA: report virtio-block block-size to user space vDPA: report virtio-block max segment size to user space vDPA: report virtio-block capacity to user space virtio: make virtio_bus const vdpa: make vdpa_bus const vDPA/ifcvf: implement vdpa_config_ops.get_vq_num_min vDPA/ifcvf: get_max_vq_size to return max size virtio_vdpa: create vqs with the actual size ...

vhost: Added pad cleanup if vnet_hdr is not present.

2024-03-19T06:45:49+00:00

When the Qemu launched with vhost but without tap vnet_hdr, vhost tries to copy vnet_hdr from socket iter with size 0 to the page that may contain some trash. That trash can be interpreted as unpredictable values for vnet_hdr. That leads to dropping some packets and in some cases to stalling vhost routine when the vhost_net tries to process packets and fails in a loop. Qemu options: -netdev tap,vhost=on,vnet_hdr=off,... Signed-off-by: Andrew Melnychenko Message-Id: <20240115194840.1183077-1-andrew@daynix.com> Signed-off-by: Michael S. Tsirkin

vhost/net: remove vhost_net_page_frag_refill()

2024-03-05T10:38:14+00:00

The page frag in vhost_net_page_frag_refill() uses the 'struct page_frag' from skb_page_frag_refill(), but it's implementation is similar to page_frag_alloc_align() now. This patch removes vhost_net_page_frag_refill() by using 'struct page_frag_cache' instead of 'struct page_frag', and allocating frag using page_frag_alloc_align(). The added benefit is that not only unifying the page frag implementation a little, but also having about 0.5% performance boost testing by using the vhost_net_test introduced in the last patch. Signed-off-by: Yunsheng Lin Acked-by: Jason Wang Acked-by: Michael S. Tsirkin Signed-off-by: Paolo Abeni

page_frag: unify gfp bits for order 3 page allocation

2024-03-05T10:38:14+00:00

Currently there seems to be three page frag implementations which all try to allocate order 3 page, if that fails, it then fail back to allocate order 0 page, and each of them all allow order 3 page allocation to fail under certain condition by using specific gfp bits. The gfp bits for order 3 page allocation are different between different implementation, __GFP_NOMEMALLOC is or'd to forbid access to emergency reserves memory for __page_frag_cache_refill(), but it is not or'd in other implementions, __GFP_DIRECT_RECLAIM is masked off to avoid direct reclaim in vhost_net_page_frag_refill(), but it is not masked off in __page_frag_cache_refill(). This patch unifies the gfp bits used between different implementions by or'ing __GFP_NOMEMALLOC and masking off __GFP_DIRECT_RECLAIM for order 3 page allocation to avoid possible pressure for mm. Leave the gfp unifying for page frag implementation in sock.c for now as suggested by Paolo Abeni. Signed-off-by: Yunsheng Lin Reviewed-by: Alexander Duyck CC: Alexander Duyck Acked-by: Michael S. Tsirkin Signed-off-by: Paolo Abeni

vhost: convert poll work to be vq based

2023-07-03T16:15:13+00:00

This has the drivers pass in their poll to vq mapping and then converts the core poll code to use the vq based helpers. In the next patches we will allow vqs to be handled by different workers, so to allow drivers to execute operations like queue, stop, flush, etc on specific polls/vqs we need to know the mappings. Signed-off-by: Mike Christie Message-Id: <20230626232307.97930-8-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin

vhost, vhost_net: add helper to check if vq has work

2023-07-03T16:15:13+00:00

In the next patches each vq might have different workers so one could have work but others do not. For net, we only want to check specific vqs, so this adds a helper to check if a vq has work pending and converts vhost-net to use it. Signed-off-by: Mike Christie Acked-by: Jason Wang Message-Id: <20230626232307.97930-5-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin

vhost_net: revert upend_idx only on retriable error

2023-06-08T19:43:08+00:00

Fix possible virtqueue used buffers leak and corresponding stuck in case of temporary -EIO from sendmsg() which is produced by tun driver while backend device is not up. In case of no-retriable error and zcopy do not revert upend_idx to pass packet data (that is update used_idx in corresponding vhost_zerocopy_signal_used()) as if packet data has been transferred successfully. v2: set vq->heads[ubuf->desc].len equal to VHOST_DMA_DONE_LEN in case of fake successful transmit. Signed-off-by: Andrey Smetanin Message-Id: <20230424204411.24888-1-asmetanin@yandex-team.ru> Signed-off-by: Michael S. Tsirkin Signed-off-by: Andrey Smetanin Acked-by: Jason Wang

vhost-net: support VIRTIO_F_RING_RESET

2023-02-21T00:26:59+00:00

Add VIRTIO_F_RING_RESET, which indicates that the driver can reset a queue individually. VIRTIO_F_RING_RESET feature is added to virtio-spec 1.2. The relevant information is in oasis-tcs/virtio-spec#124 oasis-tcs/virtio-spec#139 The implementation only adds the feature bit in supported features. It does not require any other changes because we reuse the existing vhost protocol. The virtqueue reset process can be concluded as two parts: 1. The driver can reset a virtqueue. When it is triggered, we use the set_backend to disable the virtqueue. 2. After the virtqueue is disabled, the driver may optionally re-enable it. The process is basically similar to when the device is started, except that the restart process does not need to set features and set mem table since they do not change. QEMU will send messages containing size, base, addr, kickfd and callfd of the virtqueue in order. Specifically, the host kernel will receive these messages in order: a. VHOST_SET_VRING_NUM b. VHOST_SET_VRING_BASE c. VHOST_SET_VRING_ADDR d. VHOST_SET_VRING_KICK e. VHOST_SET_VRING_CALL f. VHOST_NET_SET_BACKEND Finally, after we use set_backend to attach the virtqueue, the virtqueue will be enabled and start to work. Signed-off-by: Kangjie Xu Signed-off-by: Xuan Zhuo Message-Id: <20220825085610.80315-1-kangjie.xu@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang