kernel/linux.git/drivers/vhost/vhost.c, branch v7.2-rc1

vhost: remove unnecessary module_init/exit functions

2026-06-10T06:17:00+00:00

The vhost driver has unnecessary empty module_init and module_exit functions. Remove them. Note that if a module_init function exists, a module_exit function must also exist; otherwise, the module cannot be unloaded. Signed-off-by: Ethan Nelson-Moore Signed-off-by: Michael S. Tsirkin Message-ID: <20260131020010.45647-1-enelsonmoore@gmail.com>

vhost: fix vhost_get_avail_idx for a non empty ring

2026-06-04T04:25:54+00:00

vhost_get_avail_idx is supposed to report whether it has updated vq->avail_idx. Instead, it returns whether all entries have been consumed, which is usually the same. But not always - in drivers/vhost/net.c and when mergeable buffers have been enabled, the driver checks whether the combined entries are big enough to store an incoming packet. If not, the driver re-enables notifications with available entries still in the ring. The incorrect return value from vhost_get_avail_idx propagates through vhost_enable_notify and causes the host to livelock if the guest is not making progress, as vhost will immediately disable notifications and retry using the available entries. This goes back to commit d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()") which changed vhost_enable_notify() to compare the freshly read avail index against vq->last_avail_idx instead of the previously cached vq->avail_idx. Commit 7ad472397667 ("vhost: move smp_rmb() into vhost_get_avail_idx()") then carried over the same comparison when refactoring vhost_enable_notify() to call the unified vhost_get_avail_idx(). The obvious fix is to make vhost_get_avail_idx do what the comment says it does and report whether new entries have been added. Reported-by: ShuangYu Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()") Cc: Stefan Hajnoczi Acked-by: Jason Wang Reviewed-by: Stefano Garzarella Signed-off-by: Michael S. Tsirkin Message-Id: <559b04ae6ce52973c535dc47e461638b7f4c3d63.1772441455.git.mst@redhat.com>

Convert more 'alloc_obj' cases to default GFP_KERNEL arguments

2026-02-22T04:03:00+00:00

This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds

Convert 'alloc_flex' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook

vhost: use "checked" versions of get_user() and put_user()

2025-12-26T20:00:01+00:00

vhost_get_user and vhost_put_user leverage __get_user and __put_user, respectively, which were both added in 2016 by commit 6b1e6cc7855b ("vhost: new device IOTLB API"). In a heavy UDP transmit workload on a vhost-net backed tap device, these functions showed up as ~11.6% of samples in a flamegraph of the underlying vhost worker thread. Quoting Linus from [1]: Anyway, every single __get_user() call I looked at looked like historical garbage. [...] End result: I get the feeling that we should just do a global search-and-replace of the __get_user/ __put_user users, replace them with plain get_user/put_user instead, and then fix up any fallout (eg the coco code). Switch to plain get_user/put_user in vhost, which results in a slight throughput speedup. get_user now about ~8.4% of samples in flamegraph. Basic iperf3 test on a Intel 5416S CPU with Ubuntu 25.10 guest: TX: taskset -c 2 iperf3 -c -t 60 -p 5200 -b 0 -u -i 5 RX: taskset -c 2 iperf3 -s -p 5200 -D Before: 6.08 Gbits/sec After: 6.32 Gbits/sec As to what drives the speedup, Sean's patch [2] explains: Use the normal, checked versions for get_user() and put_user() instead of the double-underscore versions that omit range checks, as the checked versions are actually measurably faster on modern CPUs (12%+ on Intel, 25%+ on AMD). The performance hit on the unchecked versions is almost entirely due to the added LFENCE on CPUs where LFENCE is serializing (which is effectively all modern CPUs), which was added by commit 304ec1b05031 ("x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec"). The small optimizations done by commit b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()") likely shave a few cycles off, but the bulk of the extra latency comes from the LFENCE. [1] https://lore.kernel.org/all/CAHk-=wiJiDSPZJTV7z3Q-u4DfLgQTNWqUqqrwSBHp0+Dh016FA@mail.gmail.com/ [2] https://lore.kernel.org/all/20251106210206.221558-1-seanjc@google.com/ Suggested-by: Linus Torvalds Cc: Borislav Petkov Cc: Sean Christopherson Signed-off-by: Jon Kohler Message-Id: <20251113005529.2494066-1-jon@nutanix.com> Acked-by: Jason Wang Signed-off-by: Michael S. Tsirkin

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

2025-12-05T02:59:21+00:00

Pull virtio updates from Michael Tsirkin: "Just a bunch of fixes and cleanups, mostly very simple. Several features were merged through net-next this time around" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_pci: drop kernel.h vhost: switch to arrays of feature bits vhost/test: add test specific macro for features virtio: clean up features qword/dword terms vduse: add WQ_PERCPU to alloc_workqueue users virtio_balloon: add WQ_PERCPU to alloc_workqueue users vdpa/pds: use %pe for ERR_PTR() in event handler registration vhost: Fix kthread worker cgroup failure handling virtio: vdpa: Fix reference count leak in octep_sriov_enable() vdpa/mlx5: Fix incorrect error code reporting in query_virtqueues virtio: fix map ops comment virtio: fix virtqueue_set_affinity() docs virtio: standardize Returns documentation style virtio: fix grammar in virtio_map_ops docs virtio: fix grammar in virtio_queue_info docs virtio: fix whitespace in virtio_config_ops virtio: fix typo in virtio_device_ready() comment virtio: fix kernel-doc for mapping/free_coherent functions virtio_vdpa: fix misleading return in void function

vhost: Fix kthread worker cgroup failure handling

2025-11-27T07:03:06+00:00

If we fail to attach to a cgroup we are leaking the id. This adds a new goto to free the id. Fixes: 7d9896e9f6d0 ("vhost: Reintroduce kthread API and add mode selection") Signed-off-by: Mike Christie Acked-by: Jason Wang Reviewed-by: Chaitanya Kulkarni Signed-off-by: Michael S. Tsirkin Message-Id: <20251101194358.13605-1-michael.christie@oracle.com>

vhost: rewind next_avail_head while discarding descriptors

2025-11-26T22:44:58+00:00

When discarding descriptors with IN_ORDER, we should rewind next_avail_head otherwise it would run out of sync with last_avail_idx. This would cause driver to report "id X is not a head". Fixing this by returning the number of descriptors that is used for each buffer via vhost_get_vq_desc_n() so caller can use the value while discarding descriptors. Fixes: 67a873df0c41 ("vhost: basic in order support") Cc: stable@vger.kernel.org Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin Link: https://patch.msgid.link/20251120022950.10117-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski