kernel/linux.git/drivers/vfio/pci, branch v6.6.141

vfio/pci: Insert full vma on mmap'd MMIO fault

2026-04-11T12:19:30+00:00

commit d71a989cf5d961989c273093cdff2550acdde314 upstream. In order to improve performance of typical scenarios we can try to insert the entire vma on fault. This accelerates typical cases, such as when the MMIO region is DMA mapped by QEMU. The vfio_iommu_type1 driver will fault in the entire DMA mapped range through fixup_user_fault(). In synthetic testing, this improves the time required to walk a PCI BAR mapping from userspace by roughly 1/3rd. This is likely an interim solution until vmf_insert_pfn_{pmd,pud}() gain support for pfnmaps. Suggested-by: Yan Zhao Link: https://lore.kernel.org/all/Zl6XdUkt%2FzMMGOLF@yzhao56-desk.sh.intel.com/ Reviewed-by: Yan Zhao Link: https://lore.kernel.org/r/20240607035213.2054226-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/pci: Use unmap_mapping_range()

2026-04-11T12:19:30+00:00

commit aac6db75a9fc2c7a6f73e152df8f15101dda38e6 upstream. With the vfio device fd tied to the address space of the pseudo fs inode, we can use the mm to track all vmas that might be mmap'ing device BARs, which removes our vma_list and all the complicated lock ordering necessary to manually zap each related vma. Note that we can no longer store the pfn in vm_pgoff if we want to use unmap_mapping_range() to zap a selective portion of the device fd corresponding to BAR mappings. This also converts our mmap fault handler to use vmf_insert_pfn() because we no longer have a vma_list to avoid the concurrency problem with io_remap_pfn_range(). The goal is to eventually use the vm_ops huge_fault handler to avoid the additional faulting overhead, but vmf_insert_pfn_{pmd,pud}() need to learn about pfnmaps first. Also, Jason notes that a race exists between unmap_mapping_range() and the fops mmap callback if we were to call io_remap_pfn_range() to populate the vma on mmap. Specifically, mmap_region() does call_mmap() before it does vma_link_file() which gives a window where the vma is populated but invisible to unmap_mapping_range(). Suggested-by: Jason Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20240530045236.1005864-3-alex.williamson@redhat.com Signed-off-by: Alex Williamson Signed-off-by: Axel Rasmussen Signed-off-by: Tugrul Kukul Acked-by: Alex Williamson Signed-off-by: Sasha Levin

hisi_acc_vfio_pci: update status after RAS error

2026-03-04T12:20:51+00:00

[ Upstream commit 8be14dd48dfee0df91e511acceb4beeb2461a083 ] After a RAS error occurs on the accelerator device, the accelerator device will be reset. The live migration state will be abnormal after reset, and the original state needs to be restored during the reset process. Therefore, reset processing needs to be performed in a live migration scenario. Signed-off-by: Longfang Liu Link: https://lore.kernel.org/r/20260122020205.2884497-3-liulongfang@huawei.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/pds: replace bitmap_free with vfree

2025-10-15T09:58:03+00:00

[ Upstream commit acb59a4bb8ed34e738a4c3463127bf3f6b5e11a9 ] host_seq_bmp is allocated with vzalloc but is currently freed with bitmap_free, which uses kfree internally. This mismach prevents the resource from being released properly and may result in memory leaks or other issues. Fix this by freeing host_seq_bmp with vfree to match the vzalloc allocation. Fixes: f232836a9152 ("vfio/pds: Add support for dirty page tracking") Signed-off-by: Zilin Guan Reviewed-by: Brett Creeley Link: https://lore.kernel.org/r/20250913153154.1028835-1-zilin@seu.edu.cn Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/mlx5: fix possible overflow in tracking max message size

2025-08-28T14:28:29+00:00

[ Upstream commit b3060198483bac43ec113c62ae3837076f61f5de ] MLX cap pg_track_log_max_msg_size consists of 5 bits, value of which is used as power of 2 for max_msg_size. This can lead to multiplication overflow between max_msg_size (u32) and integer constant, and afterwards incorrect value is being written to rq_size. Fix this issue by extending integer constant to u64 type. Found by Linux Verification Center (linuxtesting.org) with SVACE. Suggested-by: Alex Williamson Signed-off-by: Artem Sadovnikov Reviewed-by: Yishai Hadas Link: https://lore.kernel.org/r/20250701144017.2410-2-a.sadovnikov@ispras.ru Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/pci: Separate SR-IOV VF dev_set

2025-08-15T10:08:59+00:00

[ Upstream commit e908f58b6beb337cbe4481d52c3f5c78167b1aab ] In the below noted Fixes commit we introduced a reflck mutex to allow better scaling between devices for open and close. The reflck was based on the hot reset granularity, device level for root bus devices which cannot support hot reset or bus/slot reset otherwise. Overlooked in this were SR-IOV VFs, where there's also no bus reset option, but the default for a non-root-bus, non-slot-based device is bus level reflck granularity. The reflck mutex has since become the dev_set mutex (via commit 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure")) and is our defacto serialization for various operations and ioctls. It still seems to be the case though that sets of vfio-pci devices really only need serialization relative to hot resets affecting the entire set, which is not relevant to SR-IOV VFs. As described in the Closes link below, this serialization contributes to startup latency when multiple VFs sharing the same "bus" are opened concurrently. Mark the device itself as the basis of the dev_set for SR-IOV VFs. Reported-by: Aaron Lewis Closes: https://lore.kernel.org/all/20250626180424.632628-1-aaronlewis@google.com Tested-by: Aaron Lewis Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release") Reviewed-by: Yi Liu Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20250626225623.1180952-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

vfio/pds: Fix missing detach_ioas op

2025-08-15T10:08:59+00:00

[ Upstream commit fe24d5bc635e103a517ec201c3cb571eeab8be2f ] When CONFIG_IOMMUFD is enabled and a device is bound to the pds_vfio_pci driver, the following WARN_ON() trace is seen and probe fails: WARNING: CPU: 0 PID: 5040 at drivers/vfio/vfio_main.c:317 __vfio_register_dev+0x130/0x140 [vfio] <...> pds_vfio_pci 0000:08:00.1: probe with driver pds_vfio_pci failed with error -22 This is because the driver's vfio_device_ops.detach_ioas isn't set. Fix this by using the generic vfio_iommufd_physical_detach_ioas function. Fixes: 38fe3975b4c2 ("vfio/pds: Initial support for pds VFIO driver") Signed-off-by: Brett Creeley Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20250702163744.69767-1-brett.creeley@amd.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

hisi_acc_vfio_pci: bugfix live migration function without VF device driver

2025-06-19T13:28:16+00:00

[ Upstream commit 2777a40998deb36f96b6afc48bd397cf58a4edf0 ] If the VF device driver is not loaded in the Guest OS and we attempt to perform device data migration, the address of the migrated data will be NULL. The live migration recovery operation on the destination side will access a null address value, which will cause access errors. Therefore, live migration of VMs without added VF device drivers does not require device data migration. In addition, when the queue address data obtained by the destination is empty, device queue recovery processing will not be performed. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu Reviewed-by: Shameer Kolothum Link: https://lore.kernel.org/r/20250510081155.55840-6-liulongfang@huawei.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

hisi_acc_vfio_pci: add eq and aeq interruption restore

2025-06-19T13:28:16+00:00

[ Upstream commit 3495cec0787721ba7a9d5c19d0bbb66d182de584 ] In order to ensure that the task packets of the accelerator device are not lost during the migration process, it is necessary to send an EQ and AEQ command to the device after the live migration is completed and to update the completion position of the task queue. Let the device recheck the completed tasks data and if there are uncollected packets, device resend a task completion interrupt to the software. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu Reviewed-by: Shameer Kolothum Link: https://lore.kernel.org/r/20250510081155.55840-3-liulongfang@huawei.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin

hisi_acc_vfio_pci: fix XQE dma address error

2025-06-19T13:28:16+00:00

[ Upstream commit 8bb7170c5a055ea17c6857c256ee73c10ff872eb ] The dma addresses of EQE and AEQE are wrong after migration and results in guest kernel-mode encryption services failure. Comparing the definition of hardware registers, we found that there was an error when the data read from the register was combined into an address. Therefore, the address combination sequence needs to be corrected. Even after fixing the above problem, we still have an issue where the Guest from an old kernel can get migrated to new kernel and may result in wrong data. In order to ensure that the address is correct after migration, if an old magic number is detected, the dma address needs to be updated. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu Reviewed-by: Shameer Kolothum Link: https://lore.kernel.org/r/20250510081155.55840-2-liulongfang@huawei.com Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin