summaryrefslogtreecommitdiff
path: root/drivers/vhost
AgeCommit message (Collapse)AuthorFilesLines
2026-01-19vhost/vsock: improve RCU read sections around vhost_vsock_get()Stefano Garzarella1-4/+11
[ Upstream commit d8ee3cfdc89b75dc059dc21c27bef2c1440f67eb ] vhost_vsock_get() uses hash_for_each_possible_rcu() to find the `vhost_vsock` associated with the `guest_cid`. hash_for_each_possible_rcu() should only be called within an RCU read section, as mentioned in the following comment in include/linux/rculist.h: /** * hlist_for_each_entry_rcu - iterate over rcu list of given type * @pos: the type * to use as a loop cursor. * @head: the head for your list. * @member: the name of the hlist_node within the struct. * @cond: optional lockdep expression if called from non-RCU protection. * * This list-traversal primitive may safely run concurrently with * the _rcu list-mutation primitives such as hlist_add_head_rcu() * as long as the traversal is guarded by rcu_read_lock(). */ Currently, all calls to vhost_vsock_get() are between rcu_read_lock() and rcu_read_unlock() except for calls in vhost_vsock_set_cid() and vhost_vsock_reset_orphans(). In both cases, the current code is safe, but we can make improvements to make it more robust. About vhost_vsock_set_cid(), when building the kernel with CONFIG_PROVE_RCU_LIST enabled, we get the following RCU warning when the user space issues `ioctl(dev, VHOST_VSOCK_SET_GUEST_CID, ...)` : WARNING: suspicious RCU usage 6.18.0-rc7 #62 Not tainted ----------------------------- drivers/vhost/vsock.c:74 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by rpc-libvirtd/3443: #0: ffffffffc05032a8 (vhost_vsock_mutex){+.+.}-{4:4}, at: vhost_vsock_dev_ioctl+0x2ff/0x530 [vhost_vsock] stack backtrace: CPU: 2 UID: 0 PID: 3443 Comm: rpc-libvirtd Not tainted 6.18.0-rc7 #62 PREEMPT(none) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-7.fc42 06/10/2025 Call Trace: <TASK> dump_stack_lvl+0x75/0xb0 dump_stack+0x14/0x1a lockdep_rcu_suspicious.cold+0x4e/0x97 vhost_vsock_get+0x8f/0xa0 [vhost_vsock] vhost_vsock_dev_ioctl+0x307/0x530 [vhost_vsock] __x64_sys_ioctl+0x4f2/0xa00 x64_sys_call+0xed0/0x1da0 do_syscall_64+0x73/0xfa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e ... </TASK> This is not a real problem, because the vhost_vsock_get() caller, i.e. vhost_vsock_set_cid(), holds the `vhost_vsock_mutex` used by the hash table writers. Anyway, to prevent that warning, add lockdep_is_held() condition to hash_for_each_possible_rcu() to verify that either the caller is in an RCU read section or `vhost_vsock_mutex` is held when CONFIG_PROVE_RCU_LIST is enabled; and also clarify the comment for vhost_vsock_get() to better describe the locking requirements and the scope of the returned pointer validity. About vhost_vsock_reset_orphans(), currently this function is only called via vsock_for_each_connected_socket(), which holds the `vsock_table_lock` spinlock (which is also an RCU read-side critical section). However, add an explicit RCU read lock there to make the code more robust and explicit about the RCU requirements, and to prevent issues if the calling context changes in the future or if vhost_vsock_reset_orphans() is called from other contexts. Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers") Cc: stefanha@redhat.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20251126133826.142496-1-sgarzare@redhat.com> Message-ID: <20251126210313.GA499503@fedora> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-19vdpa: Sync calls set/get config/status with cf_mutexEli Cohen1-4/+3
[ Upstream commit 73bc0dbb591baea322a7319c735e5f6c7dba9cfb ] Add wrappers to get/set status and protect these operations with cf_mutex to serialize these operations with respect to get/set config operations. Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-4-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Stable-dep-of: e40b6abe0b12 ("virtio_vdpa: fix misleading return in void function") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-19vdpa: Introduce and use vdpa device get, set config helpersParav Pandit1-2/+1
[ Upstream commit 6dbb1f1687a2ccdfc5b84b0a35bbc6dfefc4de3b ] Subsequent patches enable get and set configuration either via management device or via vdpa device' config ops. This requires synchronization between multiple callers to get and set config callbacks. Features setting also influence the layout of the configuration fields endianness. To avoid exposing synchronization primitives to callers, introduce helper for setting the configuration and use it. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211026175519.87795-2-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Stable-dep-of: e40b6abe0b12 ("virtio_vdpa: fix misleading return in void function") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-04vhost/net: Protect ubufs with rcu read lock in vhost_net_ubuf_put()Nikolay Kuratov1-2/+7
commit dd54bcf86c91a4455b1f95cbc8e9ac91205f3193 upstream. When operating on struct vhost_net_ubuf_ref, the following execution sequence is theoretically possible: CPU0 is finalizing DMA operation CPU1 is doing VHOST_NET_SET_BACKEND // ubufs->refcount == 2 vhost_net_ubuf_put() vhost_net_ubuf_put_wait_and_free(oldubufs) vhost_net_ubuf_put_and_wait() vhost_net_ubuf_put() int r = atomic_sub_return(1, &ubufs->refcount); // r = 1 int r = atomic_sub_return(1, &ubufs->refcount); // r = 0 wait_event(ubufs->wait, !atomic_read(&ubufs->refcount)); // no wait occurs here because condition is already true kfree(ubufs); if (unlikely(!r)) wake_up(&ubufs->wait); // use-after-free This leads to use-after-free on ubufs access. This happens because CPU1 skips waiting for wake_up() when refcount is already zero. To prevent that use a read-side RCU critical section in vhost_net_ubuf_put(), as suggested by Hillf Danton. For this lock to take effect, free ubufs with kfree_rcu(). Cc: stable@vger.kernel.org Fixes: 0ad8b480d6ee9 ("vhost: fix ref cnt checking deadlock") Reported-by: Andrey Ryabinin <arbn@yandex-team.com> Suggested-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Message-Id: <20250805130917.727332-1-kniv@yandex-team.ru> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-28vhost: fail early when __vhost_add_used() failsJason Wang1-0/+3
[ Upstream commit b4ba1207d45adaafa2982c035898b36af2d3e518 ] This patch fails vhost_add_used_n() early when __vhost_add_used() fails to make sure used idx is not updated with stale used ring information. Reported-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250714084755.11921-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-08-28vhost-scsi: Fix log flooding with target does not exist errorsMike Christie1-3/+1
[ Upstream commit 69cd720a8a5e9ef0f05ce5dd8c9ea6e018245c82 ] As part of the normal initiator side scanning the guest's scsi layer will loop over all possible targets and send an inquiry. Since the max number of targets for virtio-scsi is 256, this can result in 255 error messages about targets not existing if you only have a single target. When there's more than 1 vhost-scsi device each with a single target, then you get N * 255 log messages. It looks like the log message was added by accident in: commit 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler") when we added common helpers. Then in: commit 09d7583294aa ("vhost/scsi: Use common handling code in request queue handler") we converted the scsi command processing path to use the new helpers so we started to see the extra log messages during scanning. The patches were just making some code common but added the vq_err call and I'm guessing the patch author forgot to enable the vq_err call (vq_err is implemented by pr_debug which defaults to off). So this patch removes the call since it's expected to hit this path during device discovery. Fixes: 09d7583294aa ("vhost/scsi: Use common handling code in request queue handler") Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20250611210113.10912-1-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-17vhost-scsi: protect vq->log_used with vq->mutexDongli Zhang1-1/+6
commit f591cf9fce724e5075cc67488c43c6e39e8cbe27 upstream. The vhost-scsi completion path may access vq->log_base when vq->log_used is already set to false. vhost-thread QEMU-thread vhost_scsi_complete_cmd_work() -> vhost_add_used() -> vhost_add_used_n() if (unlikely(vq->log_used)) QEMU disables vq->log_used via VHOST_SET_VRING_ADDR. mutex_lock(&vq->mutex); vq->log_used = false now! mutex_unlock(&vq->mutex); QEMU gfree(vq->log_base) log_used() -> log_write(vq->log_base) Assuming the VMM is QEMU. The vq->log_base is from QEMU userpace and can be reclaimed via gfree(). As a result, this causes invalid memory writes to QEMU userspace. The control queue path has the same issue. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20250403063028.16045-2-dongli.zhang@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> [ Resolved conflicts in drivers/vhost/scsi.c bacause vhost_scsi_complete_cmd_work() has been refactored. ] Signed-off-by: Xinyu Zheng <zhengxinyu6@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17vhost/scsi: null-ptr-dereference in vhost_scsi_get_req()Haoran Zhang1-12/+15
commit 221af82f606d928ccef19a16d35633c63026f1be upstream. Since commit 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler") a null pointer dereference bug can be triggered when guest sends an SCSI AN request. In vhost_scsi_ctl_handle_vq(), `vc.target` is assigned with `&v_req.tmf.lun[1]` within a switch-case block and is then passed to vhost_scsi_get_req() which extracts `vc->req` and `tpg`. However, for a `VIRTIO_SCSI_T_AN_*` request, tpg is not required, so `vc.target` is set to NULL in this branch. Later, in vhost_scsi_get_req(), `vc->target` is dereferenced without being checked, leading to a null pointer dereference bug. This bug can be triggered from guest. When this bug occurs, the vhost_worker process is killed while holding `vq->mutex` and the corresponding tpg will remain occupied indefinitely. Below is the KASAN report: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 PID: 840 Comm: poc Not tainted 6.10.0+ #1 Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:vhost_scsi_get_req+0x165/0x3a0 Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 2b 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 65 30 4c 89 e2 48 c1 ea 03 <0f> b6 04 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 be 01 00 00 RSP: 0018:ffff888017affb50 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88801b000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888017affcb8 RBP: ffff888017affb80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff888017affc88 R14: ffff888017affd1c R15: ffff888017993000 FS: 000055556e076500(0000) GS:ffff88806b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200027c0 CR3: 0000000010ed0004 CR4: 0000000000370ef0 Call Trace: <TASK> ? show_regs+0x86/0xa0 ? die_addr+0x4b/0xd0 ? exc_general_protection+0x163/0x260 ? asm_exc_general_protection+0x27/0x30 ? vhost_scsi_get_req+0x165/0x3a0 vhost_scsi_ctl_handle_vq+0x2a4/0xca0 ? __pfx_vhost_scsi_ctl_handle_vq+0x10/0x10 ? __switch_to+0x721/0xeb0 ? __schedule+0xda5/0x5710 ? __kasan_check_write+0x14/0x30 ? _raw_spin_lock+0x82/0xf0 vhost_scsi_ctl_handle_kick+0x52/0x90 vhost_run_work_list+0x134/0x1b0 vhost_task_fn+0x121/0x350 ... </TASK> ---[ end trace 0000000000000000 ]--- Let's add a check in vhost_scsi_get_req. Fixes: 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler") Signed-off-by: Haoran Zhang <wh1sper@zju.edu.cn> [whitespace fixes] Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <b26d7ddd-b098-4361-88f8-17ca7f90adf7@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17vhost_vdpa: assign irq bypass producer token correctlyJason Wang1-3/+13
[ Upstream commit 02e9e9366fefe461719da5d173385b6685f70319 ] We used to call irq_bypass_unregister_producer() in vhost_vdpa_setup_vq_irq() which is problematic as we don't know if the token pointer is still valid or not. Actually, we use the eventfd_ctx as the token so the life cycle of the token should be bound to the VHOST_SET_VRING_CALL instead of vhost_vdpa_setup_vq_irq() which could be called by set_status(). Fixing this by setting up irq bypass producer's token when handling VHOST_SET_VRING_CALL and un-registering the producer before calling vhost_vring_ioctl() to prevent a possible use after free as eventfd could have been released in vhost_vring_ioctl(). And such registering and unregistering will only be done if DRIVER_OK is set. Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Fixes: 2cf1ba9a4d15 ("vhost_vdpa: implement IRQ offloading in vhost_vdpa") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240816031900.18013-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17vdpa: Add eventfd for the vdpa callbackXie Yongji1-0/+2
[ Upstream commit 5e68470f4e80a4120e9ecec408f6ab4ad386bd4a ] Add eventfd for the vdpa callback so that user can signal it directly instead of triggering the callback. It will be used for vhost-vdpa case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-9-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Stable-dep-of: 02e9e9366fef ("vhost_vdpa: assign irq bypass producer token correctly") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-19vhost-vdpa: switch to use vmf_insert_pfn() in the fault handlerJason Wang1-7/+1
commit 0823dc64586ba5ea13a7d200a5d33e4c5fa45950 upstream. remap_pfn_page() should not be called in the fault handler as it may change the vma->flags which may trigger lockdep warning since the vma write lock is not held. Actually there's no need to modify the vma->flags as it has been set in the mmap(). So this patch switches to use vmf_insert_pfn() instead. Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Fixes: ddd89d0a059d ("vhost_vdpa: support doorbell mapping via mmap") Cc: stable@vger.kernel.org Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240701033159.18133-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-19vhost/vsock: always initialize seqpacket_allowMichael S. Tsirkin1-2/+2
[ Upstream commit 1e1fdcbdde3b7663e5d8faeb2245b9b151417d22 ] There are two issues around seqpacket_allow: 1. seqpacket_allow is not initialized when socket is created. Thus if features are never set, it will be read uninitialized. 2. if VIRTIO_VSOCK_F_SEQPACKET is set and then cleared, then seqpacket_allow will not be cleared appropriately (existing apps I know about don't usually do this but it's legal and there's no way to be sure no one relies on this). To fix: - initialize seqpacket_allow after allocation - set it unconditionally in set_features Reported-by: syzbot+6c21aeb59d0e82eb2782@syzkaller.appspotmail.com Reported-by: Jeongjun Park <aha310510@gmail.com> Fixes: ced7b713711f ("vhost/vsock: support SEQPACKET for transport"). Tested-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Cc: David S. Miller <davem@davemloft.net> Cc: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240422100010-mutt-send-email-mst@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-17vhost: Add smp_rmb() in vhost_vq_avail_empty()Gavin Shan1-1/+11
commit 22e1992cf7b034db5325660e98c41ca5afa5f519 upstream. A smp_rmb() has been missed in vhost_vq_avail_empty(), spotted by Will. Otherwise, it's not ensured the available ring entries pushed by guest can be observed by vhost in time, leading to stale available ring entries fetched by vhost in vhost_get_vq_desc(), as reported by Yihuang Yu on NVidia's grace-hopper (ARM64) platform. /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host -cpu host \ -smp maxcpus=1,cpus=1,sockets=1,clusters=1,cores=1,threads=1 \ -m 4096M,slots=16,maxmem=64G \ -object memory-backend-ram,id=mem0,size=4096M \ : \ -netdev tap,id=vnet0,vhost=true \ -device virtio-net-pci,bus=pcie.8,netdev=vnet0,mac=52:54:00:f1:26:b0 : guest# netperf -H 10.26.1.81 -l 60 -C -c -t UDP_STREAM virtio_net virtio0: output.0:id 100 is not a head! Add the missed smp_rmb() in vhost_vq_avail_empty(). When tx_can_batch() returns true, it means there's still pending tx buffers. Since it might read indices, so it still can bypass the smp_rmb() in vhost_get_vq_desc(). Note that it should be safe until vq->avail_idx is changed by commit 275bf960ac697 ("vhost: better detection of available buffers"). Fixes: 275bf960ac69 ("vhost: better detection of available buffers") Cc: <stable@kernel.org> # v4.11+ Reported-by: Yihuang Yu <yihyu@redhat.com> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240328002149.1141302-2-gshan@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23vhost: use kzalloc() instead of kmalloc() followed by memset()Prathu Baronia1-3/+2
commit 4d8df0f5f79f747d75a7d356d9b9ea40a4e4c8a9 upstream. Use kzalloc() to allocate new zeroed out msg node instead of memsetting a node allocated with kmalloc(). Signed-off-by: Prathu Baronia <prathubaronia2011@gmail.com> Message-Id: <20230522085019.42914-1-prathubaronia2011@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Ajay Kaher <ajay.kaher@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATEEric Auger1-3/+1
commit ca50ec377c2e94b0a9f8735de2856cd0f13beab4 upstream. Commit e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Forbade vhost iotlb msg with null size to prevent entries with size = start = 0 and last = ULONG_MAX to end up in the iotlb. Then commit 95932ab2ea07 ("vhost: allow batching hint without size") only applied the check for VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE message types to fix a regression observed with batching hit. Still, the introduction of that check introduced a regression for some users attempting to invalidate the whole ULONG_MAX range by setting the size to 0. This is the case with qemu/smmuv3/vhost integration which does not work anymore. It Looks safe to partially revert the original commit and allow VHOST_IOTLB_INVALIDATE messages with null size. vhost_iotlb_del_range() will compute a correct end iova. Same for vhost_vdpa_iotlb_unmap(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Cc: stable@vger.kernel.org # v5.17+ Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230927140544.205088-1-eric.auger@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10vringh: don't use vringh_kiov_advance() in vringh_iov_xfer()Stefano Garzarella1-1/+11
commit 7aed44babc7f97e82b38e9a68515e699692cc100 upstream. In the while loop of vringh_iov_xfer(), `partlen` could be 0 if one of the `iov` has 0 lenght. In this case, we should skip the iov and go to the next one. But calling vringh_kiov_advance() with 0 lenght does not cause the advancement, since it returns immediately if asked to advance by 0 bytes. Let's restore the code that was there before commit b8c06ad4d67d ("vringh: implement vringh_kiov_advance()"), avoiding using vringh_kiov_advance(). Fixes: b8c06ad4d67d ("vringh: implement vringh_kiov_advance()") Cc: stable@vger.kernel.org Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-28vhost_net: revert upend_idx only on retriable errorAndrey Smetanin1-3/+8
[ Upstream commit 1f5d2e3bab16369d5d4b4020a25db4ab1f4f082c ] Fix possible virtqueue used buffers leak and corresponding stuck in case of temporary -EIO from sendmsg() which is produced by tun driver while backend device is not up. In case of no-retriable error and zcopy do not revert upend_idx to pass packet data (that is update used_idx in corresponding vhost_zerocopy_signal_used()) as if packet data has been transferred successfully. v2: set vq->heads[ubuf->desc].len equal to VHOST_DMA_DONE_LEN in case of fake successful transmit. Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru> Message-Id: <20230424204411.24888-1-asmetanin@yandex-team.ru> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-14vhost_vdpa: support PACKED when setting-getting vring_baseShannon Nelson1-4/+17
[ Upstream commit beee7fdb5b56a46415a4992d28dd4c2d06eb52df ] Use the right structs for PACKED or split vqs when setting and getting the vring base. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230424225031.18947-4-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-14vhost: support PACKED when setting-getting vring_baseShannon Nelson2-7/+19
[ Upstream commit 55d8122f5cd62d5aaa225d7167dcd14a44c850b9 ] Use the right structs for PACKED or split vqs when setting and getting the vring base. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230424225031.18947-3-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-09vhost/net: Clear the pending messages when the backend is removedEric Auger3-1/+6
[ Upstream commit 9526f9a2b762af16be94a72aca5d65c677d28f50 ] When the vhost iotlb is used along with a guest virtual iommu and the guest gets rebooted, some MISS messages may have been recorded just before the reboot and spuriously executed by the virtual iommu after the reboot. As vhost does not have any explicit reset user API, VHOST_NET_SET_BACKEND looks a reasonable point where to clear the pending messages, in case the backend is removed. Export vhost_clear_msg() and call it in vhost_net_set_backend() when fd == -1. Signed-off-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Jason Wang <jasowang@redhat.com> Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API") Message-Id: <20230117151518.44725-3-eric.auger@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12vhost: fix range used in translate_desc()Stefano Garzarella1-2/+2
[ Upstream commit 98047313cdb46828093894d0ac8b1183b8b317f9 ] vhost_iotlb_itree_first() requires `start` and `last` parameters to search for a mapping that overlaps the range. In translate_desc() we cyclically call vhost_iotlb_itree_first(), incrementing `addr` by the amount already translated, so rightly we move the `start` parameter passed to vhost_iotlb_itree_first(), but we should hold the `last` parameter constant. Let's fix it by saving the `last` parameter value before incrementing `addr` in the loop. Fixes: a9709d6874d5 ("vhost: convert pre sorted vhost memory array to interval tree") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221109102503.18816-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12vringh: fix range used in iotlb_translate()Stefano Garzarella1-3/+2
[ Upstream commit f85efa9b0f5381874f727bd98f56787840313f0b ] vhost_iotlb_itree_first() requires `start` and `last` parameters to search for a mapping that overlaps the range. In iotlb_translate() we cyclically call vhost_iotlb_itree_first(), incrementing `addr` by the amount already translated, so rightly we move the `start` parameter passed to vhost_iotlb_itree_first(), but we should hold the `last` parameter constant. Let's fix it by saving the `last` parameter value before incrementing `addr` in the loop. Fixes: 9ad9c49cfe97 ("vringh: IOTLB support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221109102503.18816-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12vhost/vsock: Fix error handling in vhost_vsock_init()Yuan Can1-1/+8
[ Upstream commit 7a4efe182ca61fb3e5307e69b261c57cbf434cd4 ] A problem about modprobe vhost_vsock failed is triggered with the following log given: modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy The reason is that vhost_vsock_init() returns misc_register() directly without checking its return value, if misc_register() failed, it returns without calling vsock_core_unregister() on vhost_transport, resulting the vhost_vsock can never be installed later. A simple call graph is shown as below: vhost_vsock_init() vsock_core_register() # register vhost_transport misc_register() device_create_with_groups() device_create_groups_vargs() dev = kzalloc(...) # OOM happened # return without unregister vhost_transport Fix by calling vsock_core_unregister() when misc_register() returns error. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Yuan Can <yuancan@huawei.com> Message-Id: <20221108101705.45981-1-yuancan@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26vhost/vsock: Use kvmalloc/kvfree for larger packets.Junichi Uekawa1-1/+1
[ Upstream commit 0e3f72931fc47bb81686020cc643cde5d9cd0bb8 ] When copying a large file over sftp over vsock, data size is usually 32kB, and kmalloc seems to fail to try to allocate 32 32kB regions. vhost-5837: page allocation failure: order:4, mode:0x24040c0 Call Trace: [<ffffffffb6a0df64>] dump_stack+0x97/0xdb [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138 [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8 [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19 [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7 [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock] [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost] [<ffffffffb683ddce>] kthread+0xfd/0x105 [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost] [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80 [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 Work around by doing kvmalloc instead. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Junichi Uekawa <uekawa@chromium.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220928064538.667678-1-uekawa@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14vringh: Fix loop descriptors check in the indirect casesXie Yongji1-2/+8
[ Upstream commit dbd29e0752286af74243cf891accf472b2f3edd8 ] We should use size of descriptor chain to test loop condition in the indirect case. And another statistical count is also introduced for indirect descriptors to avoid conflict with the statistical count of direct descriptors. Fixes: f87d0fbb5798 ("vringh: host-side implementation of virtio rings.") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Fam Zheng <fam.zheng@bytedance.com> Message-Id: <20220505100910.137-1-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25Fix double fget() in vhost_net_set_backend()Al Viro1-8/+7
commit fb4554c2232e44d595920f4d5c66cf8f7d13f9bc upstream. Descriptor table is a shared resource; two fget() on the same descriptor may return different struct file references. get_tap_ptr_ring() is called after we'd found (and pinned) the socket we'll be using and it tries to find the private tun/tap data structures associated with it. Redoing the lookup by the same file descriptor we'd used to get the socket is racy - we need to same struct file. Thanks to Jason for spotting a braino in the original variant of patch - I'd missed the use of fd == -1 for disabling backend, and in that case we can end up with sock == NULL and sock != oldsock. Cc: stable@kernel.org Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-25vhost_vdpa: don't setup irq offloading when irq_num < 0Zhu Lingshan1-1/+4
[ Upstream commit cce0ab2b2a39072d81f98017f7b076f3410ef740 ] When irq number is negative(e.g., -EINVAL), the virtqueue may be disabled or the virtqueues are sharing a device irq. In such case, we should not setup irq offloading for a virtqueue. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Link: https://lore.kernel.org/r/20220222115428.998334-3-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13tuntap: add sanity checks about msg_controllen in sendmsgHarold Huang1-0/+1
[ Upstream commit 74a335a07a17d131b9263bfdbdcb5e40673ca9ca ] In patch [1], tun_msg_ctl was added to allow pass batched xdp buffers to tun_sendmsg. Although we donot use msg_controllen in this path, we should check msg_controllen to make sure the caller pass a valid msg_ctl. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe8dd45bb7556246c6b76277b1ba4296c91c2505 Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Harold Huang <baymaxhuang@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220303022441.383865-1-baymaxhuang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08vhost: handle error while adding split ranges to iotlbAnirudh Rayabharam1-1/+5
commit 03a91c9af2c42ae14afafb829a4b7e6589ab5892 upstream. vhost_iotlb_add_range_ctx() handles the range [0, ULONG_MAX] by splitting it into two ranges and adding them separately. The return value of adding the first range to the iotlb is currently ignored. Check the return value and bail out in case of an error. Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Link: https://lore.kernel.org/r/20220312141121.4981-1-mail@anirudhrb.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-23vsock: each transport cycles only on its own socketsJiyong Park1-1/+2
[ Upstream commit 8e6ed963763fe21429eabfc76c69ce2b0163a3dd ] When iterating over sockets using vsock_for_each_connected_socket, make sure that a transport filters out sockets that don't belong to the transport. There actually was an issue caused by this; in a nested VM configuration, destroying the nested VM (which often involves the closing of /dev/vhost-vsock if there was h2g connections to the nested VM) kills not only the h2g connections, but also all existing g2h connections to the (outmost) host which are totally unrelated. Tested: Executed the following steps on Cuttlefish (Android running on a VM) [1]: (1) Enter into an `adb shell` session - to have a g2h connection inside the VM, (2) open and then close /dev/vhost-vsock by `exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb session is not reset. [1] https://android.googlesource.com/device/google/cuttlefish/ Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiyong Park <jiyong@google.com> Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16vhost: allow batching hint without sizeJason Wang1-1/+3
commit 95932ab2ea07b79cdb33121e2f40ccda9e6a73b5 upstream. Commit e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") tries to reject the IOTLB message whose size is zero. But the size is not necessarily meaningful, one example is the batching hint, so the commit breaks that. Fixing this be reject zero size message only if the message is used to update/invalidate the IOTLB. Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Reported-by: Eli Cohen <elic@nvidia.com> Cc: Anirudh Rayabharam <mail@anirudhrb.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220310075211.4801-1-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16vhost: fix hung thread due to erroneous iotlb entriesAnirudh Rayabharam2-0/+16
[ Upstream commit e2ae38cf3d91837a493cb2093c87700ff3cbe667 ] In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when start is 0 and last is ULONG_MAX. One instance where it can happen is when userspace sends an IOTLB message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an entry with size = 0, start = 0, last = ULONG_MAX ends up in the iotlb. Next time a packet is sent, iotlb_access_ok() loops indefinitely due to that erroneous entry. Call Trace: <TASK> iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340 vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366 vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104 vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Reported by syzbot at: https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87 To fix this, do two things: 1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map a range with size 0. 2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX] by splitting it into two entries. Fixes: 0bbe30668d89e ("vhost: factor out IOTLB") Reported-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com Tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Link: https://lore.kernel.org/r/20220305095525.5145-1-mail@anirudhrb.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-02vhost/vsock: don't check owner in vhost_vsock_stop() while releasingStefano Garzarella1-7/+14
commit a58da53ffd70294ebea8ecd0eb45fd0d74add9f9 upstream. vhost_vsock_stop() calls vhost_dev_check_owner() to check the device ownership. It expects current->mm to be valid. vhost_vsock_stop() is also called by vhost_vsock_dev_release() when the user has not done close(), so when we are in do_exit(). In this case current->mm is invalid and we're releasing the device, so we should clean it anyway. Let's check the owner only when vhost_vsock_stop() is called by an ioctl. When invoked from release we can not fail so we don't check return code of vhost_vsock_stop(). We need to stop vsock even if it's not the owner. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Cc: stable@vger.kernel.org Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22vdpa: check that offsets are within boundsDan Carpenter1-1/+1
commit 3ed21c1451a14d139e1ceb18f2fa70865ce3195a upstream. In this function "c->off" is a u32 and "size" is a long. On 64bit systems if "c->off" is greater than "size" then "size - c->off" is a negative and we always return -E2BIG. But on 32bit systems the subtraction is type promoted to a high positive u32 value and basically any "c->len" is accepted. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Reported-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211208103337.GA4047@kili Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01vhost/vsock: fix incorrect used length reported to the guestStefano Garzarella1-1/+1
commit 49d8c5ffad07ca014cfae72a1b9b8c52b6ad9cb8 upstream. The "used length" reported by calling vhost_add_used() must be the number of bytes written by the device (using "in" buffers). In vhost_vsock_handle_tx_kick() the device only reads the guest buffers (they are all "out" buffers), without writing anything, so we must pass 0 as "used length" to comply virtio spec. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Cc: stable@vger.kernel.org Reported-by: Halil Pasic <pasic@linux.ibm.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211122163525.294024-2-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-5/+5
Pull virtio fixes from Michael Tsirkin: "Fixes up some issues in rc5" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-vdpa: Fix the wrong input in config_cb VDUSE: fix documentation underline warning Revert "virtio-blk: Add validation for block size in config space" vhost_vdpa: unset vq irq before freeing irq virtio: write back F_VERSION_1 before validate
2021-10-13vhost-vdpa: Fix the wrong input in config_cbCindy Lu1-1/+1
Fix the wrong input in for config_cb. In function vhost_vdpa_config_cb, the input cb.private was used as struct vhost_vdpa, so the input was wrong here, fix this issue Fixes: 776f395004d8 ("vhost_vdpa: Support config interrupt in vdpa") Signed-off-by: Cindy Lu <lulu@redhat.com> Link: https://lore.kernel.org/r/20210929090933.20465-1-lulu@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-10-13vhost_vdpa: unset vq irq before freeing irqWu Zongyong1-4/+4
Currently we unset vq irq after freeing irq and that will result in error messages: pi_update_irte: failed to update PI IRTE irq bypass consumer (token 000000005a07a12b) unregistration fails: -22 This patch solves this. Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com> Link: https://lore.kernel.org/r/02637d38dcf4e4b836c5b3a65055fe92bf812b3b.1631687872.git.wuzongyong@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-09-28Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-1/+1
Pull virtio/vdpa fixes from Michael Tsirkin: "Fixes up some issues in rc1" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa: potential uninitialized return in vhost_vdpa_va_map() vdpa/mlx5: Avoid executing set_vq_ready() if device is reset vdpa/mlx5: Clear ready indication for control VQ vduse: Cleanup the old kernel states after reset failure vduse: missing error code in vduse_init() virtio: don't fail on !of_device_is_compatible
2021-09-16Merge tag 'net-5.15-rc2' of ↵Linus Torvalds1-1/+10
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf. Current release - regressions: - vhost_net: fix OoB on sendmsg() failure - mlx5: bridge, fix uninitialized variable usage - bnxt_en: fix error recovery regression Current release - new code bugs: - bpf, mm: fix lockdep warning triggered by stack_map_get_build_id_offset() Previous releases - regressions: - r6040: restore MDIO clock frequency after MAC reset - tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() - dsa: flush switchdev workqueue before tearing down CPU/DSA ports Previous releases - always broken: - ptp: dp83640: don't define PAGE0, avoid compiler warning - igc: fix tunnel segmentation offloads - phylink: update SFP selected interface on advertising changes - stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume - mlx5e: fix mutual exclusion between CQE compression and HW TS Misc: - bpf, cgroups: fix cgroup v2 fallback on v1/v2 mixed mode - sfc: fallback for lack of xdp tx queues - hns3: add option to turn off page pool feature" * tag 'net-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits) mlxbf_gige: clear valid_polarity upon open igc: fix tunnel offloading net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert net: wan: wanxl: define CROSS_COMPILE_M68K selftests: nci: replace unsigned int with int net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports Revert "net: phy: Uniform PHY driver access" net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup ptp: dp83640: don't define PAGE0 bnx2x: Fix enabling network interfaces without VFs Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"" tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() net-caif: avoid user-triggerable WARN_ON(1) bpf, selftests: Add test case for mixed cgroup v1/v2 bpf, selftests: Add cgroup v1 net_cls classid helpers bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode bpf: Add oversize check before call kvcalloc() net: hns3: fix the timing issue of VF clearing interrupt sources net: hns3: fix the exception when query imp info net: hns3: disable mac in flr process ...
2021-09-15vdpa: potential uninitialized return in vhost_vdpa_va_map()Dan Carpenter1-1/+1
The concern here is that "ret" can be uninitialized if we hit the "goto next" condition on every iteration through the loop. Fixes: 41ba1b5f9d4b ("vdpa: Support transferring virtual addressing during DMA mapping") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20210907073253.GB18254@kili Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-09-12Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds4-73/+177
Pull virtio updates from Michael Tsirkin: - vduse driver ("vDPA Device in Userspace") supporting emulated virtio block devices - virtio-vsock support for end of record with SEQPACKET - vdpa: mac and mq support for ifcvf and mlx5 - vdpa: management netlink for ifcvf - virtio-i2c, gpio dt bindings - misc fixes and cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits) Documentation: Add documentation for VDUSE vduse: Introduce VDUSE - vDPA Device in Userspace vduse: Implement an MMU-based software IOTLB vdpa: Support transferring virtual addressing during DMA mapping vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() vhost-iotlb: Add an opaque pointer for vhost IOTLB vhost-vdpa: Handle the failure of vdpa_reset() vdpa: Add reset callback in vdpa_config_ops vdpa: Fix some coding style issues file: Export receive_fd() to modules eventfd: Export eventfd_wake_count to modules iova: Export alloc_iova_fast() and free_iova_fast() virtio-blk: remove unneeded "likely" statements virtio-balloon: Use virtio_find_vqs() helper vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro vsock_test: update message bounds test for MSG_EOR af_vsock: rename variables in receive loop virtio/vsock: support MSG_EOR bit processing vhost/vsock: support MSG_EOR bit processing ...
2021-09-09vhost_net: fix OoB on sendmsg() failure.Paolo Abeni1-1/+10
If the sendmsg() call in vhost_tx_batch() fails, both the 'batched_xdp' and 'done_idx' indexes are left unchanged. If such failure happens when batched_xdp == VHOST_NET_BATCH, the next call to vhost_net_build_xdp() will access and write memory outside the xdp buffers area. Since sendmsg() can only error with EBADFD, this change addresses the issue explicitly freeing the XDP buffers batch on error. Fixes: 0a0be13b8fe2 ("vhost_net: batch submitting XDP buffers to underlayer sockets") Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06vdpa: Support transferring virtual addressing during DMA mappingXie Yongji1-11/+88
This patch introduces an attribute for vDPA device to indicate whether virtual address can be used. If vDPA device driver set it, vhost-vdpa bus driver will not pin user page and transfer userspace virtual address instead of physical address during DMA mapping. And corresponding vma->vm_file and offset will be also passed as an opaque pointer. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-11-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()Xie Yongji1-20/+35
The upcoming patch is going to support VA mapping/unmapping. So let's factor out the logic of PA mapping/unmapping firstly to make the code more readable. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-10-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()Xie Yongji1-1/+1
Add an opaque pointer for DMA mapping. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-9-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vhost-iotlb: Add an opaque pointer for vhost IOTLBXie Yongji1-4/+16
Add an opaque pointer for vhost IOTLB. And introduce vhost_iotlb_add_range_ctx() to accept it. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-8-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vhost-vdpa: Handle the failure of vdpa_reset()Xie Yongji1-3/+6
The vdpa_reset() may fail now. This adds check to its return value and fail the vhost_vdpa_open(). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-7-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vdpa: Add reset callback in vdpa_config_opsXie Yongji1-2/+7
This adds a new callback to support device specific reset behavior. The vdpa bus driver will call the reset function instead of setting status to zero during resetting. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210831103634.33-6-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macroCai Huoqing1-12/+12
it's a nice refactor to make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210802013717.851-1-caihuoqing@baidu.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>