kernel/linux.git/drivers/vhost/net.c, branch v6.19.11

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

2025-12-05T02:59:21+00:00

Pull virtio updates from Michael Tsirkin:

"Just a bunch of fixes and cleanups, mostly very simple. Several features were merged through net-next this time around"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_pci: drop kernel.h
  vhost: switch to arrays of feature bits
  vhost/test: add test specific macro for features
  virtio: clean up features qword/dword terms
  vduse: add WQ_PERCPU to alloc_workqueue users
  virtio_balloon: add WQ_PERCPU to alloc_workqueue users
  vdpa/pds: use %pe for ERR_PTR() in event handler registration
  vhost: Fix kthread worker cgroup failure handling
  virtio: vdpa: Fix reference count leak in octep_sriov_enable()
  vdpa/mlx5: Fix incorrect error code reporting in query_virtqueues
  virtio: fix map ops comment
  virtio: fix virtqueue_set_affinity() docs
  virtio: standardize Returns documentation style
  virtio: fix grammar in virtio_map_ops docs
  virtio: fix grammar in virtio_queue_info docs
  virtio: fix whitespace in virtio_config_ops
  virtio: fix typo in virtio_device_ready() comment
  virtio: fix kernel-doc for mapping/free_coherent functions
  virtio_vdpa: fix misleading return in void function

vhost: switch to arrays of feature bits

2025-11-30T23:02:43+00:00

The current interface, where the caller has to know which 64-bit chunk each bit is in, is inelegant and fragile. Let's simply use arrays of bits. By using unroll macros, text size grows only slightly.

Message-ID: <637e182e139980e5930d50b928ba5ac072d628a9.1764225384.git.mst@redhat.com>
Signed-off-by: Michael S. Tsirkin
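As a rough user-space illustration of the idea (not the code from this commit), features can be listed as plain bit numbers and folded into an array of 64-bit chunks by a helper, so the caller never computes a chunk index. Every name below (`FEATURE_WORDS`, `features_from_bits()`, `feature_test()`) is hypothetical, and a simple loop stands in for the unroll macros the commit mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the vhost API. */
#define FEATURE_WORDS 2 /* 2 x 64 = 128 feature bits */

/* Fold a list of bit numbers into a zeroed array of 64-bit chunks. */
static void features_from_bits(uint64_t out[FEATURE_WORDS],
                               const unsigned int *bits, size_t nbits)
{
    for (size_t w = 0; w < FEATURE_WORDS; w++)
        out[w] = 0;
    for (size_t i = 0; i < nbits; i++) {
        unsigned int bit = bits[i];

        if (bit / 64 < FEATURE_WORDS) /* silently skip out-of-range bits */
            out[bit / 64] |= UINT64_C(1) << (bit % 64);
    }
}

/* Test one bit; the chunk lookup is hidden from the caller. */
static int feature_test(const uint64_t f[FEATURE_WORDS], unsigned int bit)
{
    if (bit / 64 >= FEATURE_WORDS)
        return 0;
    return (int)((f[bit / 64] >> (bit % 64)) & 1);
}
```

With this shape, adding a feature above bit 63 means appending one number to the list rather than editing a second mask.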

virtio: clean up features qword/dword terms

2025-11-27T07:03:07+00:00

virtio pci uses "word" to mean "16 bits"; mmio uses it to mean "32 bits". To avoid confusion, let's avoid the term in core virtio altogether. Just say U64 to mean "64 bit".

Fixes: e7d4c1c5a546 ("virtio: introduce extended features")
Cc: Paolo Abeni
Acked-by: Jason Wang
Message-ID:
Signed-off-by: Michael S. Tsirkin

vhost: rewind next_avail_head while discarding descriptors

2025-11-26T22:44:58+00:00

When discarding descriptors with IN_ORDER, we should rewind next_avail_head; otherwise it would run out of sync with last_avail_idx. This would cause the driver to report "id X is not a head". Fix this by returning the number of descriptors used for each buffer via vhost_get_vq_desc_n(), so the caller can use the value while discarding descriptors.

Fixes: 67a873df0c41 ("vhost: basic in order support")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Wang
Acked-by: Michael S. Tsirkin
Link: https://patch.msgid.link/20251120022950.10117-1-jasowang@redhat.com
Signed-off-by: Jakub Kicinski
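The bookkeeping can be sketched with a toy model. The two field names come from the commit message; the struct, the helper names, and the assumption that next_avail_head advances by the number of descriptors in each buffer are illustrative only and are not taken from the patch.

```c
#include <stdint.h>

/* Toy model; only the two field names come from the commit message. */
struct toy_vq {
    uint16_t last_avail_idx;  /* next avail-ring entry to consume */
    uint16_t next_avail_head; /* next descriptor index expected in order */
};

/* Consume one buffer made of ndesc chained descriptors. */
static void toy_get_buf(struct toy_vq *vq, unsigned int ndesc)
{
    vq->last_avail_idx = (uint16_t)(vq->last_avail_idx + 1);
    vq->next_avail_head = (uint16_t)(vq->next_avail_head + ndesc);
}

/* Discard nheads buffers that together used ndesc descriptors. */
static void toy_discard(struct toy_vq *vq, unsigned int nheads,
                        unsigned int ndesc)
{
    /* Rewinding only last_avail_idx would leave the two out of sync. */
    vq->next_avail_head = (uint16_t)(vq->next_avail_head - ndesc);
    vq->last_avail_idx = (uint16_t)(vq->last_avail_idx - nheads);
}
```

The caller has to remember how many descriptors each buffer used, which is why the fix returns that count when the buffer is fetched.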

vhost-net: flush batched before enabling notifications

2025-09-19T08:15:26+00:00

Commit 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg") tried to defer enabling notifications by moving the logic out of the loop, after vhost_tx_batch(), when nothing new is spotted. This caused unexpected side effects, as the new logic is reused for several other error conditions. A previous patch reverted 8c2e6b26ffe2. Now, bring the performance back up by flushing batched buffers before enabling notifications.

Reported-by: Jon Kohler
Cc: stable@vger.kernel.org
Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
Signed-off-by: Jason Wang
Signed-off-by: Michael S. Tsirkin
Message-Id: <20250917063045.2042-3-jasowang@redhat.com>

Revert "vhost/net: Defer TX queue re-enable until after sendmsg"

2025-09-19T08:15:26+00:00

This reverts commit 8c2e6b26ffe243be1e78f5a4bfb1a857d6e6f6d6. That commit tries to defer enabling notifications by moving the logic out of the loop, after vhost_tx_batch(), when nothing new is spotted. This has side effects, because the new logic is reused for several other error conditions. One example is the IOTLB: on an IOTLB miss, get_tx_bufs() might return -EAGAIN and exit the loop; the code then sees that buffers are still available, so it queues the tx work again until userspace feeds the IOTLB entry correctly. This slows down tx processing and triggers the TX watchdog in the guest, as reported in https://lkml.org/lkml/2025/9/10/1596. To fix this, revert the change. A follow-up patch will bring the performance back in a safe way.

Reported-by: Jon Kohler
Cc: stable@vger.kernel.org
Fixes: 8c2e6b26ffe2 ("vhost/net: Defer TX queue re-enable until after sendmsg")
Signed-off-by: Jason Wang
Signed-off-by: Michael S. Tsirkin
Message-Id: <20250917063045.2042-2-jasowang@redhat.com>

vhost-net: unbreak busy polling

2025-09-19T08:15:26+00:00

Commit 67a873df0c41 ("vhost: basic in order support") passes the number of used elements to vhost_net_rx_peek_head_len() to make sure it can signal the used buffers correctly before trying to busy poll. But it forgets to clear the count, which causes the count to run out of sync with handle_rx() and breaks busy polling. Fix this by passing a pointer to the count and clearing it after signaling the used buffers.

Acked-by: Michael S. Tsirkin
Cc: stable@vger.kernel.org
Fixes: 67a873df0c41 ("vhost: basic in order support")
Signed-off-by: Jason Wang
Message-Id: <20250917063045.2042-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin
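The pattern behind the fix can be shown in isolation: when a helper consumes a pending count on the caller's behalf, it needs a pointer so it can zero the caller's copy. The functions below are hypothetical stand-ins written for this sketch, not the vhost-net functions, and `signaled_total` exists only to make the effect observable.

```c
/* Hypothetical stand-ins; not the vhost-net functions themselves. */
static unsigned int signaled_total;

/* Signal *count used buffers, then clear the caller's counter. */
static void toy_signal_used(unsigned int *count)
{
    if (!*count)
        return;
    signaled_total += *count;
    *count = 0; /* impossible if count had been passed by value */
}

/* Peek helper: flush pending used buffers before it would busy poll. */
static void toy_rx_peek(unsigned int *count)
{
    toy_signal_used(count);
    /* busy polling would start here */
}
```

Had the count been passed by value, a second call with the caller's stale value would signal the same buffers again.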

vhost/net: Protect ubufs with rcu read lock in vhost_net_ubuf_put()

2025-08-26T07:38:10+00:00

When operating on struct vhost_net_ubuf_ref, the following execution sequence is theoretically possible:

CPU0 is finalizing DMA operation                   CPU1 is doing VHOST_NET_SET_BACKEND
                             // ubufs->refcount == 2
vhost_net_ubuf_put()                               vhost_net_ubuf_put_wait_and_free(oldubufs)
                                                     vhost_net_ubuf_put_and_wait()
                                                       vhost_net_ubuf_put()
                                                         int r = atomic_sub_return(1, &ubufs->refcount);
                                                         // r = 1
int r = atomic_sub_return(1, &ubufs->refcount);
// r = 0
                                                       wait_event(ubufs->wait, !atomic_read(&ubufs->refcount));
                                                       // no wait occurs here because condition is already true
                                                     kfree(ubufs);
if (unlikely(!r))
  wake_up(&ubufs->wait);  // use-after-free

This leads to use-after-free on ubufs access. This happens because CPU1 skips waiting for wake_up() when refcount is already zero.

To prevent that use a read-side RCU critical section in vhost_net_ubuf_put(), as suggested by Hillf Danton. For this lock to take effect, free ubufs with kfree_rcu().

Cc: stable@vger.kernel.org
Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock")
Reported-by: Andrey Ryabinin
Suggested-by: Hillf Danton
Signed-off-by: Nikolay Kuratov
Message-Id: <20250805130917.727332-1-kniv@yandex-team.ru>
Signed-off-by: Michael S. Tsirkin

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

2025-08-01T21:17:48+00:00

Pull virtio updates from Michael Tsirkin:

 - vhost can now support legacy threading if enabled in Kconfig

 - vsock memory allocation strategies for large buffers have been improved, reducing pressure on kmalloc

 - vhost now supports the in-order feature. guest bits missed the merge window.

 - fixes, cleanups all over the place

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (30 commits)
  vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers
  vsock/virtio: Rename virtio_vsock_skb_rx_put()
  vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers
  vsock/virtio: Move SKB allocation lower-bound check to callers
  vsock/virtio: Rename virtio_vsock_alloc_skb()
  vsock/virtio: Resize receive buffers so that each SKB fits in a 4K page
  vsock/virtio: Move length check to callers of virtio_vsock_skb_rx_put()
  vsock/virtio: Validate length in packet header before skb_put()
  vhost/vsock: Avoid allocating arbitrarily-sized SKBs
  vhost_net: basic in_order support
  vhost: basic in order support
  vhost: fail early when __vhost_add_used() fails
  vhost: Reintroduce kthread API and add mode selection
  vdpa: Fix IDR memory leak in VDUSE module exit
  vdpa/mlx5: Fix release of uninitialized resources on error path
  vhost-scsi: Fix check for inline_sg_cnt exceeding preallocated limit
  virtio: virtio_dma_buf: fix missing parameter documentation
  vhost: Fix typos
  vhost: vringh: Remove unused functions
  vhost: vringh: Remove unused iotlb functions
  ...

vhost_net: basic in_order support

2025-08-01T13:11:09+00:00

This patch introduces basic in-order support for vhost-net. By recording the number of batched buffers in an array when calling `vhost_add_used_and_signal_n()`, we can reduce the number of userspace accesses. Note that the vhost-net batching logic is kept as we still count the number of buffers there.

Testing Results:

With testpmd:

- TX: txonly mode + vhost_net with XDP_DROP on TAP shows a 17.5% improvement, from 4.75 Mpps to 5.35 Mpps.
- RX: No obvious improvements were observed.

With virtio-ring in-order experimental code in the guest:

- TX: pktgen in the guest + XDP_DROP on TAP shows a 19% improvement, from 5.2 Mpps to 6.2 Mpps.
- RX: pktgen on TAP with vhost_net + XDP_DROP in the guest achieves a 6.1% improvement, from 3.47 Mpps to 3.61 Mpps.

Acked-by: Jonah Palmer
Acked-by: Eugenio Pérez
Signed-off-by: Jason Wang
Message-Id: <20250714084755.11921-4-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin
Tested-by: Lei Yang