kernel/linux.git/drivers/vhost/scsi.c, branch v6.6.133

vhost-scsi: Fix log flooding with target does not exist errors

2025-08-15T10:08:57+00:00

[ Upstream commit 69cd720a8a5e9ef0f05ce5dd8c9ea6e018245c82 ] As part of the normal initiator side scanning the guest's scsi layer will loop over all possible targets and send an inquiry. Since the max number of targets for virtio-scsi is 256, this can result in 255 error messages about targets not existing if you only have a single target. When there's more than 1 vhost-scsi device each with a single target, then you get N * 255 log messages. It looks like the log message was added by accident in: commit 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler") when we added common helpers. Then in: commit 09d7583294aa ("vhost/scsi: Use common handling code in request queue handler") we converted the scsi command processing path to use the new helpers so we started to see the extra log messages during scanning. The patches were just making some code common but added the vq_err call and I'm guessing the patch author forgot to enable the vq_err call (vq_err is implemented by pr_debug which defaults to off). So this patch removes the call since it's expected to hit this path during device discovery. Fixes: 09d7583294aa ("vhost/scsi: Use common handling code in request queue handler") Signed-off-by: Mike Christie Reviewed-by: Stefan Hajnoczi Reviewed-by: Stefano Garzarella Message-Id: <20250611210113.10912-1-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin

vhost-scsi: Return queue full for page alloc failures during copy

2025-06-04T12:42:05+00:00

[ Upstream commit 891b99eab0f89dbe08d216f4ab71acbeaf7a3102 ] This has us return queue full if we can't allocate a page during the copy operation so the initiator can retry. Signed-off-by: Mike Christie Message-Id: <20241203191705.19431-5-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin Acked-by: Stefan Hajnoczi Signed-off-by: Sasha Levin

vhost-scsi: protect vq->log_used with vq->mutex

2025-06-04T12:41:53+00:00

[ Upstream commit f591cf9fce724e5075cc67488c43c6e39e8cbe27 ] The vhost-scsi completion path may access vq->log_base when vq->log_used is already set to false. vhost-thread QEMU-thread vhost_scsi_complete_cmd_work() -> vhost_add_used() -> vhost_add_used_n() if (unlikely(vq->log_used)) QEMU disables vq->log_used via VHOST_SET_VRING_ADDR. mutex_lock(&vq->mutex); vq->log_used = false now! mutex_unlock(&vq->mutex); QEMU gfree(vq->log_base) log_used() -> log_write(vq->log_base) Assuming the VMM is QEMU. The vq->log_base is from QEMU userpace and can be reclaimed via gfree(). As a result, this causes invalid memory writes to QEMU userspace. The control queue path has the same issue. Signed-off-by: Dongli Zhang Acked-by: Jason Wang Reviewed-by: Mike Christie Message-Id: <20250403063028.16045-2-dongli.zhang@oracle.com> Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin

vhost-scsi: Fix handling of multiple calls to vhost_scsi_set_endpoint

2025-04-10T12:37:32+00:00

[ Upstream commit 5dd639a1646ef5fe8f4bf270fad47c5c3755b9b6 ] If vhost_scsi_set_endpoint is called multiple times without a vhost_scsi_clear_endpoint between them, we can hit multiple bugs found by Haoran Zhang: 1. Use-after-free when no tpgs are found: This fixes a use after free that occurs when vhost_scsi_set_endpoint is called more than once and calls after the first call do not find any tpgs to add to the vs_tpg. When vhost_scsi_set_endpoint first finds tpgs to add to the vs_tpg array match=true, so we will do: vhost_vq_set_backend(vq, vs_tpg); ... kfree(vs->vs_tpg); vs->vs_tpg = vs_tpg; If vhost_scsi_set_endpoint is called again and no tpgs are found match=false so we skip the vhost_vq_set_backend call leaving the pointer to the vs_tpg we then free via: kfree(vs->vs_tpg); vs->vs_tpg = vs_tpg; If a scsi request is then sent we do: vhost_scsi_handle_vq -> vhost_scsi_get_req -> vhost_vq_get_backend which sees the vs_tpg we just did a kfree on. 2. Tpg dir removal hang: This patch fixes an issue where we cannot remove a LIO/target layer tpg (and structs above it like the target) dir due to the refcount dropping to -1. The problem is that if vhost_scsi_set_endpoint detects a tpg is already in the vs->vs_tpg array or if the tpg has been removed so target_depend_item fails, the undepend goto handler will do target_undepend_item on all tpgs in the vs_tpg array dropping their refcount to 0. At this time vs_tpg contains both the tpgs we have added in the current vhost_scsi_set_endpoint call as well as tpgs we added in previous calls which are also in vs->vs_tpg. Later, when vhost_scsi_clear_endpoint runs it will do target_undepend_item on all the tpgs in the vs->vs_tpg which will drop their refcount to -1. Userspace will then not be able to remove the tpg and will hang when it tries to do rmdir on the tpg dir. 3. Tpg leak: This fixes a bug where we can leak tpgs and cause them to be un-removable because the target name is overwritten when vhost_scsi_set_endpoint is called multiple times but with different target names. The bug occurs if a user has called VHOST_SCSI_SET_ENDPOINT and setup a vhost-scsi device to target/tpg mapping, then calls VHOST_SCSI_SET_ENDPOINT again with a new target name that has tpgs we haven't seen before (target1 has tpg1 but target2 has tpg2). When this happens we don't teardown the old target tpg mapping and just overwrite the target name and the vs->vs_tpg array. Later when we do vhost_scsi_clear_endpoint, we are passed in either target1 or target2's name and we will only match that target's tpgs when we loop over the vs->vs_tpg. We will then return from the function without doing target_undepend_item on the tpgs. Because of all these bugs, it looks like being able to call vhost_scsi_set_endpoint multiple times was never supported. The major user, QEMU, already has checks to prevent this use case. So to fix the issues, this patch prevents vhost_scsi_set_endpoint from being called if it's already successfully added tpgs. To add, remove or change the tpg config or target name, you must do a vhost_scsi_clear_endpoint first. Fixes: 25b98b64e284 ("vhost scsi: alloc cmds per vq instead of session") Fixes: 4f7f46d32c98 ("tcm_vhost: Use vq->private_data to indicate if the endpoint is setup") Reported-by: Haoran Zhang Closes: https://lore.kernel.org/virtualization/e418a5ee-45ca-4d18-9b5d-6f8b6b1add8e@oracle.com/T/#me6c0041ce376677419b9b2563494172a01487ecb Signed-off-by: Mike Christie Reviewed-by: Stefan Hajnoczi Message-Id: <20250129210922.121533-1-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin Acked-by: Stefano Garzarella Signed-off-by: Sasha Levin

vhost/scsi: null-ptr-dereference in vhost_scsi_get_req()

2024-10-10T09:58:08+00:00

commit 221af82f606d928ccef19a16d35633c63026f1be upstream. Since commit 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler") a null pointer dereference bug can be triggered when guest sends an SCSI AN request. In vhost_scsi_ctl_handle_vq(), `vc.target` is assigned with `&v_req.tmf.lun[1]` within a switch-case block and is then passed to vhost_scsi_get_req() which extracts `vc->req` and `tpg`. However, for a `VIRTIO_SCSI_T_AN_*` request, tpg is not required, so `vc.target` is set to NULL in this branch. Later, in vhost_scsi_get_req(), `vc->target` is dereferenced without being checked, leading to a null pointer dereference bug. This bug can be triggered from guest. When this bug occurs, the vhost_worker process is killed while holding `vq->mutex` and the corresponding tpg will remain occupied indefinitely. Below is the KASAN report: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 PID: 840 Comm: poc Not tainted 6.10.0+ #1 Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:vhost_scsi_get_req+0x165/0x3a0 Code: 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 2b 02 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 65 30 4c 89 e2 48 c1 ea 03 <0f> b6 04 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 be 01 00 00 RSP: 0018:ffff888017affb50 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88801b000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888017affcb8 RBP: ffff888017affb80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff888017affc88 R14: ffff888017affd1c R15: ffff888017993000 FS: 000055556e076500(0000) GS:ffff88806b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200027c0 CR3: 0000000010ed0004 CR4: 0000000000370ef0 Call Trace: ? show_regs+0x86/0xa0 ? die_addr+0x4b/0xd0 ? exc_general_protection+0x163/0x260 ? asm_exc_general_protection+0x27/0x30 ? vhost_scsi_get_req+0x165/0x3a0 vhost_scsi_ctl_handle_vq+0x2a4/0xca0 ? __pfx_vhost_scsi_ctl_handle_vq+0x10/0x10 ? __switch_to+0x721/0xeb0 ? __schedule+0xda5/0x5710 ? __kasan_check_write+0x14/0x30 ? _raw_spin_lock+0x82/0xf0 vhost_scsi_ctl_handle_kick+0x52/0x90 vhost_run_work_list+0x134/0x1b0 vhost_task_fn+0x121/0x350 ... ---[ end trace 0000000000000000 ]--- Let's add a check in vhost_scsi_get_req. Fixes: 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler") Signed-off-by: Haoran Zhang [whitespace fixes] Signed-off-by: Mike Christie Message-Id: Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman

vhost-scsi: Handle vhost_vq_work_queue failures for events

2024-07-11T10:49:20+00:00

[ Upstream commit b1b2ce58ed23c5d56e0ab299a5271ac01f95b75c ] Currently, we can try to queue an event's work before the vhost_task is created. When this happens we just drop it in vhost_scsi_do_plug before even calling vhost_vq_work_queue. During a device shutdown we do the same thing after vhost_scsi_clear_endpoint has cleared the backends. In the next patches we will be able to kill the vhost_task before we have cleared the endpoint. In that case, vhost_vq_work_queue can fail and we will leak the event's memory. This has handle the failure by just freeing the event. This is safe to do, because vhost_vq_work_queue will only return failure for us when the vhost_task is killed and so userspace will not be able to handle events if we sent them. Signed-off-by: Mike Christie Message-Id: <20240316004707.45557-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin Signed-off-by: Sasha Levin

vhost-scsi: Rename vhost_scsi_iov_to_sgl

2023-08-10T19:24:28+00:00

Rename vhost_scsi_iov_to_sgl to vhost_scsi_map_iov_to_sgl so it matches matches the naming style used for vhost_scsi_copy_iov_to_sgl. Signed-off-by: Mike Christie Message-Id: <20230709202859.138387-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin Acked-by: Stefan Hajnoczi

vhost-scsi: Fix alignment handling with windows

2023-08-10T19:24:27+00:00

The linux block layer requires bios/requests to have lengths with a 512 byte alignment. Some drivers/layers like dm-crypt and the directi IO code will test for it and just fail. Other drivers like SCSI just assume the requirement is met and will end up in infinte retry loops. The problem for drivers like SCSI is that it uses functions like blk_rq_cur_sectors and blk_rq_sectors which divide the request's length by 512. If there's lefovers then it just gets dropped. But other code in the block/scsi layer may use blk_rq_bytes/blk_rq_cur_bytes and end up thinking there is still data left and try to retry the cmd. We can then end up getting stuck in retry loops where part of the block/scsi thinks there is data left, but other parts think we want to do IOs of zero length. Linux will always check for alignment, but windows will not. When vhost-scsi then translates the iovec it gets from a windows guest to a scatterlist, we can end up with sg items where the sg->length is not divisible by 512 due to the misaligned offset: sg[0].offset = 255; sg[0].length = 3841; sg... sg[N].offset = 0; sg[N].length = 255; When the lio backends then convert the SG to bios or other iovecs, we end up sending them with the same misaligned values and can hit the issues above. This just has us drop down to allocating a temp page and copying the data when we detect a misaligned buffer and the IO is large enough that it will get split into multiple bad IOs. Signed-off-by: Mike Christie Message-Id: <20230709202859.138387-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin Acked-by: Stefan Hajnoczi

vhost_scsi: add support for worker ioctls

2023-07-03T16:15:14+00:00

This has vhost-scsi support the worker ioctls by calling the vhost_worker_ioctl helper. With a single worker, the single thread becomes a bottlneck when trying to use 3 or more virtqueues like: fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \ --ioengine=libaio --iodepth=128 --numjobs=3 With the patches and doing a worker per vq, we can scale to at least 16 vCPUs/vqs (that's my system limit) with the same command fio command above with numjobs=16: fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \ --ioengine=libaio --iodepth=64 --numjobs=16 which gives around 2002K IOPs. Note that for testing I dropped depth to 64 above because the vhost/virt layer supports only 1024 total commands per device. And the only tuning I did was set LIO's emulate_pr to 0 to avoid LIO's PR lock in the main IO path which becomes an issue at around 12 jobs/virtqueues. Signed-off-by: Mike Christie Message-Id: <20230626232307.97930-17-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin

vhost_scsi: flush IO vqs then send TMF rsp

2023-07-03T16:15:14+00:00

With one worker we will always send the scsi cmd responses then send the TMF rsp, because LIO will always complete the scsi cmds first then call into us to send the TMF response. With multiple workers, the IO vq workers could be running while the TMF/ctl vq worker is running so this has us do a flush before completing the TMF to make sure cmds are completed when it's work is later queued and run. Signed-off-by: Mike Christie Message-Id: <20230626232307.97930-12-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin