kernel/linux.git/drivers/vhost/vsock.c, branch v6.1.168

vhost/vsock: improve RCU read sections around vhost_vsock_get()

2026-01-11T14:18:50+00:00

[ Upstream commit d8ee3cfdc89b75dc059dc21c27bef2c1440f67eb ] vhost_vsock_get() uses hash_for_each_possible_rcu() to find the `vhost_vsock` associated with the `guest_cid`. hash_for_each_possible_rcu() should only be called within an RCU read section, as mentioned in the following comment in include/linux/rculist.h: /** * hlist_for_each_entry_rcu - iterate over rcu list of given type * @pos: the type * to use as a loop cursor. * @head: the head for your list. * @member: the name of the hlist_node within the struct. * @cond: optional lockdep expression if called from non-RCU protection. * * This list-traversal primitive may safely run concurrently with * the _rcu list-mutation primitives such as hlist_add_head_rcu() * as long as the traversal is guarded by rcu_read_lock(). */ Currently, all calls to vhost_vsock_get() are between rcu_read_lock() and rcu_read_unlock() except for calls in vhost_vsock_set_cid() and vhost_vsock_reset_orphans(). In both cases, the current code is safe, but we can make improvements to make it more robust. About vhost_vsock_set_cid(), when building the kernel with CONFIG_PROVE_RCU_LIST enabled, we get the following RCU warning when the user space issues `ioctl(dev, VHOST_VSOCK_SET_GUEST_CID, ...)` : WARNING: suspicious RCU usage 6.18.0-rc7 #62 Not tainted ----------------------------- drivers/vhost/vsock.c:74 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by rpc-libvirtd/3443: #0: ffffffffc05032a8 (vhost_vsock_mutex){+.+.}-{4:4}, at: vhost_vsock_dev_ioctl+0x2ff/0x530 [vhost_vsock] stack backtrace: CPU: 2 UID: 0 PID: 3443 Comm: rpc-libvirtd Not tainted 6.18.0-rc7 #62 PREEMPT(none) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-7.fc42 06/10/2025 Call Trace: dump_stack_lvl+0x75/0xb0 dump_stack+0x14/0x1a lockdep_rcu_suspicious.cold+0x4e/0x97 vhost_vsock_get+0x8f/0xa0 [vhost_vsock] vhost_vsock_dev_ioctl+0x307/0x530 [vhost_vsock] __x64_sys_ioctl+0x4f2/0xa00 x64_sys_call+0xed0/0x1da0 do_syscall_64+0x73/0xfa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e ... This is not a real problem, because the vhost_vsock_get() caller, i.e. vhost_vsock_set_cid(), holds the `vhost_vsock_mutex` used by the hash table writers. Anyway, to prevent that warning, add lockdep_is_held() condition to hash_for_each_possible_rcu() to verify that either the caller is in an RCU read section or `vhost_vsock_mutex` is held when CONFIG_PROVE_RCU_LIST is enabled; and also clarify the comment for vhost_vsock_get() to better describe the locking requirements and the scope of the returned pointer validity. About vhost_vsock_reset_orphans(), currently this function is only called via vsock_for_each_connected_socket(), which holds the `vsock_table_lock` spinlock (which is also an RCU read-side critical section). However, add an explicit RCU read lock there to make the code more robust and explicit about the RCU requirements, and to prevent issues if the calling context changes in the future or if vhost_vsock_reset_orphans() is called from other contexts. Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers") Cc: stefanha@redhat.com Signed-off-by: Stefano Garzarella Reviewed-by: Stefan Hajnoczi Message-Id: <20251126133826.142496-1-sgarzare@redhat.com> Message-ID: <20251126210313.GA499503@fedora> Acked-by: Jason Wang Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin

vhost/vsock: Avoid allocating arbitrarily-sized SKBs

2025-08-28T14:26:07+00:00

commit 10a886aaed293c4db3417951f396827216299e3d upstream. vhost_vsock_alloc_skb() returns NULL for packets advertising a length larger than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE in the packet header. However, this is only checked once the SKB has been allocated and, if the length in the packet header is zero, the SKB may not be freed immediately. Hoist the size check before the SKB allocation so that an iovec larger than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + the header size is rejected outright. The subsequent check on the length field in the header can then simply check that the allocated SKB is indeed large enough to hold the packet. Cc: Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Reviewed-by: Stefano Garzarella Signed-off-by: Will Deacon Message-Id: <20250717090116.11987-2-will@kernel.org> Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman

vhost/vsock: always initialize seqpacket_allow

2024-08-03T06:49:21+00:00

[ Upstream commit 1e1fdcbdde3b7663e5d8faeb2245b9b151417d22 ] There are two issues around seqpacket_allow: 1. seqpacket_allow is not initialized when socket is created. Thus if features are never set, it will be read uninitialized. 2. if VIRTIO_VSOCK_F_SEQPACKET is set and then cleared, then seqpacket_allow will not be cleared appropriately (existing apps I know about don't usually do this but it's legal and there's no way to be sure no one relies on this). To fix: - initialize seqpacket_allow after allocation - set it unconditionally in set_features Reported-by: syzbot+6c21aeb59d0e82eb2782@syzkaller.appspotmail.com Reported-by: Jeongjun Park Fixes: ced7b713711f ("vhost/vsock: support SEQPACKET for transport"). Tested-by: Arseny Krasnov Cc: David S. Miller Cc: Stefan Hajnoczi Message-ID: <20240422100010-mutt-send-email-mst@kernel.org> Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang Reviewed-by: Stefano Garzarella Reviewed-by: Eugenio Pérez Acked-by: Jakub Kicinski Signed-off-by: Sasha Levin

virtio/vsock: replace virtio_vsock_pkt with sk_buff

2023-11-20T10:52:17+00:00

[ Upstream commit 71dc9ec9ac7d3eee785cdc986c3daeb821381e20 ] This commit changes virtio/vsock to use sk_buff instead of virtio_vsock_pkt. Beyond better conforming to other net code, using sk_buff allows vsock to use sk_buff-dependent features in the future (such as sockmap) and improves throughput. This patch introduces the following performance changes: Tool: Uperf Env: Phys Host + L1 Guest Payload: 64k Threads: 16 Test Runs: 10 Type: SOCK_STREAM Before: commit b7bfaa761d760 ("Linux 6.2-rc3") Before ------ g2h: 16.77Gb/s h2g: 10.56Gb/s After ----- g2h: 21.04Gb/s h2g: 10.76Gb/s Signed-off-by: Bobby Eshleman Reviewed-by: Stefano Garzarella Signed-off-by: David S. Miller Stable-dep-of: 3a5cc90a4d17 ("vsock/virtio: remove socket from connected/bound list on shutdown") Signed-off-by: Sasha Levin

use less confusing names for iov_iter direction initializers

2023-02-09T10:28:04+00:00

[ Upstream commit de4eda9de2d957ef2d6a8365a01e26a435e958cb ] READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro Stable-dep-of: 6dd88fd59da8 ("vhost-scsi: unbreak any layout for response") Signed-off-by: Sasha Levin

vhost/vsock: Fix error handling in vhost_vsock_init()

2023-01-12T11:02:06+00:00

[ Upstream commit 7a4efe182ca61fb3e5307e69b261c57cbf434cd4 ] A problem about modprobe vhost_vsock failed is triggered with the following log given: modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy The reason is that vhost_vsock_init() returns misc_register() directly without checking its return value, if misc_register() failed, it returns without calling vsock_core_unregister() on vhost_transport, resulting the vhost_vsock can never be installed later. A simple call graph is shown as below: vhost_vsock_init() vsock_core_register() # register vhost_transport misc_register() device_create_with_groups() device_create_groups_vargs() dev = kzalloc(...) # OOM happened # return without unregister vhost_transport Fix by calling vsock_core_unregister() when misc_register() returns error. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Yuan Can Message-Id: <20221108101705.45981-1-yuancan@huawei.com> Signed-off-by: Michael S. Tsirkin Reviewed-by: Stefano Garzarella Acked-by: Jason Wang Signed-off-by: Sasha Levin

vhost/vsock: Use kvmalloc/kvfree for larger packets.

2022-09-30T01:34:08+00:00

When copying a large file over sftp over vsock, data size is usually 32kB, and kmalloc seems to fail to try to allocate 32 32kB regions. vhost-5837: page allocation failure: order:4, mode:0x24040c0 Call Trace: [] dump_stack+0x97/0xdb [] warn_alloc_failed+0x10f/0x138 [] ? __alloc_pages_direct_compact+0x38/0xc8 [] __alloc_pages_nodemask+0x84c/0x90d [] alloc_kmem_pages+0x17/0x19 [] kmalloc_order_trace+0x2b/0xdb [] __kmalloc+0x177/0x1f7 [] ? copy_from_iter+0x8d/0x31d [] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock] [] vhost_worker+0xf7/0x157 [vhost] [] kthread+0xfd/0x105 [] ? vhost_dev_set_owner+0x22e/0x22e [vhost] [] ? flush_kthread_worker+0xf3/0xf3 [] ret_from_fork+0x4e/0x80 [] ? flush_kthread_worker+0xf3/0xf3 Work around by doing kvmalloc instead. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Junichi Uekawa Reviewed-by: Stefano Garzarella Acked-by: Michael S. Tsirkin Link: https://lore.kernel.org/r/20220928064538.667678-1-uekawa@chromium.org Signed-off-by: Jakub Kicinski

vhost: rename vhost_work_dev_flush

2022-05-31T16:45:10+00:00

This patch renames vhost_work_dev_flush to just vhost_dev_flush to relfect that it flushes everything on the device and that drivers don't know/care that polls are based on vhost_works. Drivers just flush the entire device and polls, and works for vhost-scsi management TMFs and IO net virtqueues, etc all are flushed. Signed-off-by: Mike Christie Acked-by: Jason Wang Reviewed-by: Stefano Garzarella Message-Id: <20220517180850.198915-9-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin

vhost_vsock: simplify vhost_vsock_flush()

2022-05-31T16:45:10+00:00

vhost_vsock_flush() calls vhost_work_dev_flush(vsock->vqs[i].poll.dev) before vhost_work_dev_flush(&vsock->dev). This seems pointless as vsock->vqs[i].poll.dev is the same as &vsock->dev and several flushes in a row doesn't do anything useful, one is just enough. Signed-off-by: Andrey Ryabinin Reviewed-by: Stefano Garzarella Signed-off-by: Mike Christie Acked-by: Jason Wang Message-Id: <20220517180850.198915-6-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin

vhost: get rid of vhost_poll_flush() wrapper

2022-05-31T16:45:10+00:00

vhost_poll_flush() is a simple wrapper around vhost_work_dev_flush(). It gives wrong impression that we are doing some work over vhost_poll, while in fact it flushes vhost_poll->dev. It only complicate understanding of the code and leads to mistakes like flushing the same vhost_dev several times in a row. Just remove vhost_poll_flush() and call vhost_work_dev_flush() directly. Signed-off-by: Andrey Ryabinin [merge vhost_poll_flush removal from Stefano Garzarella] Signed-off-by: Mike Christie Reviewed-by: Chaitanya Kulkarni Acked-by: Jason Wang Reviewed-by: Stefano Garzarella Message-Id: <20220517180850.198915-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin