kernel/linux.git/drivers/vfio, branch v6.1.168

hisi_acc_vfio_pci: update status after RAS error

2026-03-04T12:20:31+00:00

[ Upstream commit 8be14dd48dfee0df91e511acceb4beeb2461a083 ] After a RAS error occurs on the accelerator device, the accelerator device will be reset. The live migration state will be abnormal after reset, and the original state needs to be restored during the reset process. Therefore, reset processing needs to be performed in a live migration scenario. Signed-off-by: Longfang Liu Link: https://lore.kernel.org/r/20260122020205.2884497-3-liulongfang@huawei.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

iommufd: Don't overflow during division for dirty tracking

2025-12-06T21:12:36+00:00

[ Upstream commit cb30dfa75d55eced379a42fd67bd5fb7ec38555e ] If pgshift is 63 then BITS_PER_TYPE(*bitmap->bitmap) * pgsize will overflow to 0 and this triggers divide by 0. In this case the index should just be 0, so reorganize things to divide by shift and avoid hitting any overflows. Link: https://patch.msgid.link/r/0-v1-663679b57226+172-iommufd_dirty_div0_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: 58ccf0190d19 ("vfio: Add an IOVA bitmap support") Reviewed-by: Joao Martins Reviewed-by: Nicolin Chen Reviewed-by: Kevin Tian Reported-by: syzbot+093a8a8b859472e6c257@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=093a8a8b859472e6c257 Signed-off-by: Jason Gunthorpe [ drivers/iommu/iommufd/iova_bitmap.c => drivers/vfio/iova_bitmap.c ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

vfio: return -ENOTTY for unsupported device feature

2025-12-06T21:12:25+00:00

[ Upstream commit 16df67f2189a71a8310bcebddb87ed569e8352be ] The two implementers of vfio_device_ops.device_feature, vfio_cdx_ioctl_feature and vfio_pci_core_ioctl_feature, return -ENOTTY in the fallthrough case when the feature is unsupported. For consistency, the base case, vfio_ioctl_device_feature, should do the same when device_feature == NULL, indicating an implementation has no feature extensions. Signed-off-by: Alex Mastro Link: https://lore.kernel.org/r/20250908-vfio-enotty-v1-1-4428e1539e2e@fb.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/mlx5: fix possible overflow in tracking max message size

2025-08-28T14:26:02+00:00

[ Upstream commit b3060198483bac43ec113c62ae3837076f61f5de ] MLX cap pg_track_log_max_msg_size consists of 5 bits, value of which is used as power of 2 for max_msg_size. This can lead to multiplication overflow between max_msg_size (u32) and integer constant, and afterwards incorrect value is being written to rq_size. Fix this issue by extending integer constant to u64 type. Found by Linux Verification Center (linuxtesting.org) with SVACE. Suggested-by: Alex Williamson Signed-off-by: Artem Sadovnikov Reviewed-by: Yishai Hadas Link: https://lore.kernel.org/r/20250701144017.2410-2-a.sadovnikov@ispras.ru Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/type1: conditional rescheduling while pinning

2025-08-28T14:26:02+00:00

[ Upstream commit b1779e4f209c7ff7e32f3c79d69bca4e3a3a68b6 ] A large DMA mapping request can loop through dma address pinning for many pages. In cases where THP can not be used, the repeated vmf_insert_pfn can be costly, so let the task reschedule as need to prevent CPU stalls. Failure to do so has potential harmful side effects, like increased memory pressure as unrelated rcu tasks are unable to make their reclaim callbacks and result in OOM conditions. rcu: INFO: rcu_sched self-detected stall on CPU rcu: 36-....: (20999 ticks this GP) idle=b01c/1/0x4000000000000000 softirq=35839/35839 fqs=3538 rcu: hardirqs softirqs csw/system rcu: number: 0 107 0 rcu: cputime: 50 0 10446 ==> 10556(ms) rcu: (t=21075 jiffies g=377761 q=204059 ncpus=384) ... ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? walk_system_ram_range+0x63/0x120 ? walk_system_ram_range+0x46/0x120 ? pgprot_writethrough+0x20/0x20 lookup_memtype+0x67/0xf0 track_pfn_insert+0x20/0x40 vmf_insert_pfn_prot+0x88/0x140 vfio_pci_mmap_huge_fault+0xf9/0x1b0 [vfio_pci_core] __do_fault+0x28/0x1b0 handle_mm_fault+0xef1/0x2560 fixup_user_fault+0xf5/0x270 vaddr_get_pfns+0x169/0x2f0 [vfio_iommu_type1] vfio_pin_pages_remote+0x162/0x8e0 [vfio_iommu_type1] vfio_iommu_type1_ioctl+0x1121/0x1810 [vfio_iommu_type1] ? futex_wake+0x1c1/0x260 x64_sys_call+0x234/0x17a0 do_syscall_64+0x63/0x130 ? exc_page_fault+0x63/0x130 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Signed-off-by: Keith Busch Reviewed-by: Paul E. McKenney Link: https://lore.kernel.org/r/20250715184622.3561598-1-kbusch@meta.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/pci: Separate SR-IOV VF dev_set

2025-08-15T10:05:07+00:00

[ Upstream commit e908f58b6beb337cbe4481d52c3f5c78167b1aab ] In the below noted Fixes commit we introduced a reflck mutex to allow better scaling between devices for open and close. The reflck was based on the hot reset granularity, device level for root bus devices which cannot support hot reset or bus/slot reset otherwise. Overlooked in this were SR-IOV VFs, where there's also no bus reset option, but the default for a non-root-bus, non-slot-based device is bus level reflck granularity. The reflck mutex has since become the dev_set mutex (via commit 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure")) and is our defacto serialization for various operations and ioctls. It still seems to be the case though that sets of vfio-pci devices really only need serialization relative to hot resets affecting the entire set, which is not relevant to SR-IOV VFs. As described in the Closes link below, this serialization contributes to startup latency when multiple VFs sharing the same "bus" are opened concurrently. Mark the device itself as the basis of the dev_set for SR-IOV VFs. Reported-by: Aaron Lewis Closes: https://lore.kernel.org/all/20250626180424.632628-1-aaronlewis@google.com Tested-by: Aaron Lewis Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release") Reviewed-by: Yi Liu Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20250626225623.1180952-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/type1: Fix error unwind in migration dirty bitmap allocation

2025-06-27T10:07:11+00:00

[ Upstream commit 4518e5a60c7fbf0cdff393c2681db39d77b4f87e ] When setting up dirty page tracking at the vfio IOMMU backend for device migration, if an error is encountered allocating a tracking bitmap, the unwind loop fails to free previously allocated tracking bitmaps. This occurs because the wrong loop index is used to generate the tracking object. This results in unintended memory usage for the life of the current DMA mappings where bitmaps were successfully allocated. Use the correct loop index to derive the tracking object for freeing during unwind. Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking") Signed-off-by: Li RongQing Link: https://lore.kernel.org/r/20250521034647.2877-1-lirongqing@baidu.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

hisi_acc_vfio_pci: add eq and aeq interruption restore

2025-06-27T10:07:11+00:00

[ Upstream commit 3495cec0787721ba7a9d5c19d0bbb66d182de584 ] In order to ensure that the task packets of the accelerator device are not lost during the migration process, it is necessary to send an EQ and AEQ command to the device after the live migration is completed and to update the completion position of the task queue. Let the device recheck the completed tasks data and if there are uncollected packets, device resend a task completion interrupt to the software. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu Reviewed-by: Shameer Kolothum Link: https://lore.kernel.org/r/20250510081155.55840-3-liulongfang@huawei.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

hisi_acc_vfio_pci: fix XQE dma address error

2025-06-27T10:07:11+00:00

[ Upstream commit 8bb7170c5a055ea17c6857c256ee73c10ff872eb ] The dma addresses of EQE and AEQE are wrong after migration and results in guest kernel-mode encryption services failure. Comparing the definition of hardware registers, we found that there was an error when the data read from the register was combined into an address. Therefore, the address combination sequence needs to be corrected. Even after fixing the above problem, we still have an issue where the Guest from an old kernel can get migrated to new kernel and may result in wrong data. In order to ensure that the address is correct after migration, if an old magic number is detected, the dma address needs to be updated. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu Reviewed-by: Shameer Kolothum Link: https://lore.kernel.org/r/20250510081155.55840-2-liulongfang@huawei.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/pci: Handle INTx IRQ_NOTCONNECTED

2025-06-04T12:40:05+00:00

[ Upstream commit 860be250fc32de9cb24154bf21b4e36f40925707 ] Some systems report INTx as not routed by setting pdev->irq to IRQ_NOTCONNECTED, resulting in a -ENOTCONN error when trying to setup eventfd signaling. Include this in the set of conditions for which the PIN register is virtualized to zero. Additionally consolidate vfio_pci_get_irq_count() to use this virtualized value in reporting INTx support via ioctl and sanity checking ioctl paths since pdev->irq is re-used when the device is in MSI mode. The combination of these results in both the config space of the device and the ioctl interface behaving as if the device does not support INTx. Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20250311230623.1264283-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin