kernel/linux.git/drivers/vhost/vhost.c, branch v6.6.132

vhost: fail early when __vhost_add_used() fails

2025-08-28T14:28:25+00:00

[ Upstream commit b4ba1207d45adaafa2982c035898b36af2d3e518 ] This patch fails vhost_add_used_n() early when __vhost_add_used() fails to make sure used idx is not updated with stale used ring information. Reported-by: Eugenio Pérez Signed-off-by: Jason Wang Message-Id: <20250714084755.11921-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin Tested-by: Lei Yang Signed-off-by: Sasha Levin

vhost_task: Handle SIGKILL by flushing work and exiting

2024-07-11T10:49:10+00:00

[ Upstream commit db5247d9bf5c6ade9fd70b4e4897441e0269b233 ] Instead of lingering until the device is closed, this has us handle SIGKILL by: 1. marking the worker as killed so we no longer try to use it with new virtqueues and new flush operations. 2. setting the virtqueue to worker mapping so no new works are queued. 3. running all the exiting works. Suggested-by: Edward Adam Davis Reported-and-tested-by: syzbot+98edc2df894917b3431f@syzkaller.appspotmail.com Message-Id: Signed-off-by: Mike Christie Message-Id: <20240316004707.45557-9-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin

vhost: Release worker mutex during flushes

2024-07-11T10:49:10+00:00

[ Upstream commit ba704ff4e142fd3cfaf3379dd3b3b946754e06e3 ] In the next patches where the worker can be killed while in use, we need to be able to take the worker mutex and kill queued works for new IO and flushes, and set some new flags to prevent new __vhost_vq_attach_worker calls from swapping in/out killed workers. If we are holding the worker mutex during a flush and the flush's work is still in the queue, the worker code that will handle the SIGKILL cleanup won't be able to take the mutex and perform it's cleanup. So this patch has us drop the worker mutex while waiting for the flush to complete. Signed-off-by: Mike Christie Message-Id: <20240316004707.45557-8-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin

vhost: Use virtqueue mutex for swapping worker

2024-07-11T10:49:10+00:00

[ Upstream commit 34cf9ba5f00a222dddd9fc71de7c68fdaac7fb97 ] __vhost_vq_attach_worker uses the vhost_dev mutex to serialize the swapping of a virtqueue's worker. This was done for simplicity because we are already holding that mutex. In the next patches where the worker can be killed while in use, we need finer grained locking because some drivers will hold the vhost_dev mutex while flushing. However in the SIGKILL handler in the next patches, we will need to be able to swap workers (set current one to NULL), kill queued works and stop new flushes while flushes are in progress. To prepare us, this has us use the virtqueue mutex for swapping workers instead of the vhost_dev one. Signed-off-by: Mike Christie Message-Id: <20240316004707.45557-7-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin

vhost: Add smp_rmb() in vhost_enable_notify()

2024-04-17T09:19:35+00:00

commit df9ace7647d4123209395bb9967e998d5758c645 upstream. A smp_rmb() has been missed in vhost_enable_notify(), inspired by Will. Otherwise, it's not ensured the available ring entries pushed by guest can be observed by vhost in time, leading to stale available ring entries fetched by vhost in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's grace-hopper (ARM64) platform. /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host -cpu host \ -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \ -m 4096M,slots=16,maxmem=64G \ -object memory-backend-ram,id=mem0,size=4096M \ : \ -netdev tap,id=vnet0,vhost=true \ -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0 : guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM virtio_net virtio0: output.0:id 100 is not a head! Add the missed smp_rmb() in vhost_enable_notify(). When it returns true, it means there's still pending tx buffers. Since it might read indices, so it still can bypass the smp_rmb() in vhost_get_vq_desc(). Note that it should be safe until vq->avail_idx is changed by commit d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()"). Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()") Cc: # v5.18+ Reported-by: Yihuang Yu Suggested-by: Will Deacon Signed-off-by: Gavin Shan Acked-by: Jason Wang Message-Id: <20240328002149.1141302-3-gshan@redhat.com> Signed-off-by: Michael S. Tsirkin Reviewed-by: Stefano Garzarella Signed-off-by: Greg Kroah-Hartman

vhost: Add smp_rmb() in vhost_vq_avail_empty()

2024-04-17T09:19:35+00:00

commit 22e1992cf7b034db5325660e98c41ca5afa5f519 upstream. A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by Will. Otherwise, it's not ensured the available ring entries pushed by guest can be observed by vhost in time, leading to stale available ring entries fetched by vhost in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's grace-hopper (ARM64) platform. /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host -cpu host \ -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \ -m 4096M,slots=16,maxmem=64G \ -object memory-backend-ram,id=mem0,size=4096M \ : \ -netdev tap,id=vnet0,vhost=true \ -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0 : guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM virtio_net virtio0: output.0:id 100 is not a head! Add the missed smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch() returns true, it means there's still pending tx buffers. Since it might read indices, so it still can bypass the smp_rmb() in vhost_get_vq_desc(). Note that it should be safe until vq->avail_idx is changed by commit 275bf960ac697 ("vhost: better detection of available buffers"). Fixes: 275bf960ac69 ("vhost: better detection of available buffers") Cc: # v4.11+ Reported-by: Yihuang Yu Suggested-by: Will Deacon Signed-off-by: Gavin Shan Acked-by: Jason Wang Message-Id: <20240328002149.1141302-2-gshan@redhat.com> Signed-off-by: Michael S. Tsirkin Reviewed-by: Stefano Garzarella Signed-off-by: Greg Kroah-Hartman

vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATE

2023-10-18T15:29:56+00:00

Commit e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Forbade vhost iotlb msg with null size to prevent entries with size = start = 0 and last = ULONG_MAX to end up in the iotlb. Then commit 95932ab2ea07 ("vhost: allow batching hint without size") only applied the check for VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE message types to fix a regression observed with batching hit. Still, the introduction of that check introduced a regression for some users attempting to invalidate the whole ULONG_MAX range by setting the size to 0. This is the case with qemu/smmuv3/vhost integration which does not work anymore. It Looks safe to partially revert the original commit and allow VHOST_IOTLB_INVALIDATE messages with null size. vhost_iotlb_del_range() will compute a correct end iova. Same for vhost_vdpa_iotlb_unmap(). Signed-off-by: Eric Auger Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Cc: stable@vger.kernel.org # v5.17+ Acked-by: Jason Wang Message-Id: <20230927140544.205088-1-eric.auger@redhat.com> Signed-off-by: Michael S. Tsirkin

vhost: Allow worker switching while work is queueing

2023-07-03T16:15:14+00:00

This patch drops the requirement that we can only switch workers if work has not been queued by using RCU for the vq based queueing paths and a mutex for the device wide flush. We can also use this to support SIGKILL properly in the future where we should exit almost immediately after getting that signal. With this patch, when get_signal returns true, we can set the vq->worker to NULL and do a synchronize_rcu to prevent new work from being queued to the vhost_task that has been killed. Signed-off-by: Mike Christie Message-Id: <20230626232307.97930-18-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin

vhost: allow userspace to create workers

2023-07-03T16:15:14+00:00

For vhost-scsi with 3 vqs or more and a workload that tries to use them in parallel like: fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \ --ioengine=libaio --iodepth=128 --numjobs=3 the single vhost worker thread will become a bottlneck and we are stuck at around 500K IOPs no matter how many jobs, virtqueues, and CPUs are used. To better utilize virtqueues and available CPUs, this patch allows userspace to create workers and bind them to vqs. You can have N workers per dev and also share N workers with M vqs on that dev. This patch adds the interface related code and the next patch will hook vhost-scsi into it. The patches do not try to hook net and vsock into the interface because: 1. multiple workers don't seem to help vsock. The problem is that with only 2 virtqueues we never fully use the existing worker when doing bidirectional tests. This seems to match vhost-scsi where we don't see the worker as a bottleneck until 3 virtqueues are used. 2. net already has a way to use multiple workers. Signed-off-by: Mike Christie Message-Id: <20230626232307.97930-16-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin

vhost: replace single worker pointer with xarray

2023-07-03T16:15:14+00:00

The next patch allows userspace to create multiple workers per device, so this patch replaces the vhost_worker pointer with an xarray so we can store mupltiple workers and look them up. Signed-off-by: Mike Christie Message-Id: <20230626232307.97930-15-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin