summaryrefslogtreecommitdiff
path: root/drivers/vfio
AgeCommit message (Collapse)AuthorFilesLines
2021-10-15vfio: Use cdev_device_add() instead of device_create()Jason Gunthorpe1-101/+92
Modernize how vfio is creating the group char dev and sysfs presence. These days drivers with state should use cdev_device_add() and cdev_device_del() to manage the cdev and sysfs lifetime. This API requires the driver to put the struct device and struct cdev inside its state struct (vfio_group), and then use the usual device_initialize()/cdev_device_add()/cdev_device_del() sequence. Split the code to make this possible: - vfio_group_alloc()/vfio_group_release() are pair'd functions to alloc/free the vfio_group. release is done under the struct device kref. - vfio_create_group()/vfio_group_put() are pairs that manage the sysfs/cdev lifetime. Once the uses count is zero the vfio group's userspace presence is destroyed. - The IDR is replaced with an IDA. container_of(inode->i_cdev) is used to get back to the vfio_group during fops open. The IDA assigns unique minor numbers. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15vfio: Use a refcount_t instead of a kref in the vfio_groupJason Gunthorpe1-12/+9
The next patch adds a struct device to the struct vfio_group, and it is confusing/bad practice to have two krefs in the same struct. This kref is controlling the period when the vfio_group is registered in sysfs, and visible in the internal lookup. Switch it to a refcount_t instead. The refcount_dec_and_mutex_lock() is still required because we need atomicity of the list searches and sysfs presence. Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15vfio: Don't leak a group reference if the group already existsJason Gunthorpe1-14/+8
If vfio_create_group() searches the group list and returns an already existing group it does not put back the iommu_group reference that the caller passed in. Change the semantic of vfio_create_group() to not move the reference in from the caller, but instead obtain a new reference inside and leave the caller's reference alone. The two callers must now call iommu_group_put(). This is an unlikely race as the only caller that could hit it has already searched the group list before attempting to create the group. Fixes: cba3345cc494 ("vfio: VFIO core") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15vfio: Do not open code the group list search in vfio_create_group()Jason Gunthorpe1-25/+30
Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so that vfio_create_group() can call it to consolidate this duplicated code. Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()Jason Gunthorpe1-85/+15
iommu_group_register_notifier()/iommu_group_unregister_notifier() are built using a blocking_notifier_chain which integrates a rwsem. The notifier function cannot be running outside its registration. When considering how the notifier function interacts with create/destroy of the group there are two fringe cases, the notifier starts before list_add(&vfio.group_list) and the notifier runs after the kref becomes 0. Prior to vfio_create_group() unlocking and returning we have container_users == 0 device_list == empty And this cannot change until the mutex is unlocked. After the kref goes to zero we must also have container_users == 0 device_list == empty Both are required because they are balanced operations and a 0 kref means some caller became unbalanced. Add the missing assertion that container_users must be zero as well. These two facts are important because when checking each operation we see: - IOMMU_GROUP_NOTIFY_ADD_DEVICE Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev() 0 container_users ends the call - IOMMU_GROUP_NOTIFY_BOUND_DRIVER 0 container_users ends the call Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes items from the unbound list. During creation this list is empty, during kref == 0 nothing can read this list, and it will be freed soon. Since the vfio_group_release() doesn't hold the appropriate lock to manipulate the unbound_list and could race with the notifier, move the cleanup to directly before the kfree. This allows deleting all of the deferred group put code. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-12Merge branch 'v5.16/vfio/colin_xu_igd_opregion_2.0_v8' into v5.16/vfio/nextAlex Williamson1-59/+175
2021-10-12vfio/pci: Add OpRegion 2.0+ Extended VBT support.Colin Xu1-59/+175
Due to historical reason, some legacy shipped system doesn't follow OpRegion 2.1 spec but still stick to OpRegion 2.0, in which the extended VBT is not contiguous after OpRegion in physical address, but any location pointed by RVDA via absolute address. Also although current OpRegion 2.1+ systems appears that the extended VBT follows OpRegion, RVDA is the relative address to OpRegion head, the extended VBT location may change to non-contiguous to OpRegion. In both cases, it's impossible to map a contiguous range to hold both OpRegion and the extended VBT and expose via one vfio region. The only difference between OpRegion 2.0 and 2.1 is where extended VBT is stored: For 2.0, RVDA is the absolute address of extended VBT while for 2.1, RVDA is the relative address of extended VBT to OpRegion baes, and there is no other difference between OpRegion 2.0 and 2.1. To support the non-contiguous region case as described, the updated read op will patch OpRegion version and RVDA on-the-fly accordingly. So that from vfio igd OpRegion view, only 2.1+ with contiguous extended VBT after OpRegion is exposed, regardless the underneath host OpRegion is 2.0 or 2.1+. The mechanism makes it possible to support legacy OpRegion 2.0 extended VBT systems with on the market, and support OpRegion 2.1+ where the extended VBT isn't contiguous after OpRegion. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Hang Yuan <hang.yuan@linux.intel.com> Cc: Swee Yee Fonn <swee.yee.fonn@intel.com> Cc: Fred Gao <fred.gao@intel.com> Signed-off-by: Colin Xu <colin.xu@intel.com> Link: https://lore.kernel.org/r/20211012124855.52463-1-colin.xu@gmail.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-12Merge branch 'v5.16/vfio/diana-fsl-reset-v2' into v5.16/vfio/nextAlex Williamson1-15/+30
2021-09-30Merge branch 'v5.16/vfio/hch-cleanup-vfio-iommu_group-creation-v6' into ↵Alex Williamson9-432/+299
v5.16/vfio/next
2021-09-30vfio/iommu_type1: remove IS_IOMMU_CAP_DOMAIN_IN_CONTAINERChristoph Hellwig1-8/+5
IS_IOMMU_CAP_DOMAIN_IN_CONTAINER just obsfucated the checks being performed, so open code it in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-16-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio/iommu_type1: remove the "external" domainChristoph Hellwig1-66/+54
The external_domain concept rather misleading and not actually needed. Replace it with a list of emulated groups in struct vfio_iommu. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-15-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio/iommu_type1: initialize pgsize_bitmap in ->openChristoph Hellwig1-1/+1
Ensure pgsize_bitmap is always valid by initializing it to PAGE_MASK in vfio_iommu_type1_open and remove the now pointless update for the external domain case in vfio_iommu_type1_attach_group, which was just setting pgsize_bitmap to PAGE_MASK when only external domains were attached. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210924155705.4258-14-hch@lst.de [aw: s/ULONG_MAX/PAGE_MASK/ per discussion in link] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio/spapr_tce: reject mediated devicesChristoph Hellwig1-0/+3
Unlike the the type1 IOMMU backend, the SPAPR one does not contain any support for the magic non-IOMMU backed iommu_group used by mediated devices, so reject them in ->attach_group. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-13-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: clean up the check for mediated device in vfio_iommu_type1Christoph Hellwig4-46/+34
Pass the group flags to ->attach_group and remove the messy check for the bus type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-12-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: remove the unused mdev iommu hookChristoph Hellwig1-107/+26
The iommu_device field in struct mdev_device has never been used since it was added more than 2 years ago. This is a manual revert of commit 7bd50f0cd2 ("vfio/type1: Add domain at(de)taching group helpers"). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-11-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: move the vfio_iommu_driver_ops interface out of <linux/vfio.h>Christoph Hellwig4-0/+50
Create a new private drivers/vfio/vfio.h header for the interface between the VFIO core and the iommu drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-10-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: remove unused method from vfio_iommu_driver_opsChristoph Hellwig1-50/+0
The read, write and mmap methods are never implemented, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-9-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: simplify iommu group allocation for mediated devicesChristoph Hellwig3-68/+71
Reuse the logic in vfio_noiommu_group_alloc to allocate a fake single-device iommu group for mediated devices by factoring out a common function, and replacing the noiommu boolean field in struct vfio_group with an enum to distinguish the three different kinds of groups. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-8-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: remove the iommudata hack for noiommu groupsChristoph Hellwig1-13/+8
Just pass a noiommu argument to vfio_create_group and set up the ->noiommu flag directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-7-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: refactor noiommu group creationChristoph Hellwig1-52/+52
Split the actual noiommu group creation from vfio_iommu_group_get into a new helper, and open code the rest of vfio_iommu_group_get in its only caller. This creates an entirely separate and clear code path for the noiommu group creation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-6-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: factor out a vfio_group_find_or_alloc helperChristoph Hellwig1-25/+35
Factor out a helper to find or allocate the vfio_group to reduce the spagetthi code in vfio_register_group_dev a little. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-5-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: remove the iommudata check in vfio_noiommu_attach_groupChristoph Hellwig1-1/+1
vfio_noiommu_attach_group has two callers: 1) __vfio_container_attach_groups is called by vfio_ioctl_set_iommu, which just called vfio_iommu_driver_allowed 2) vfio_group_set_container requires already checks ->noiommu on the vfio_group, which is propagated from the iommudata in vfio_create_group so this check is entirely superflous and can be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-4-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: factor out a vfio_iommu_driver_allowed helperChristoph Hellwig1-14/+19
Factor out a little helper to make the checks for the noiommu driver less ugly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-3-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: Move vfio_iommu_group_get() to vfio_register_group_dev()Jason Gunthorpe4-60/+19
We don't need to hold a reference to the group in the driver as well as obtain a reference to the same group as the first thing vfio_register_group_dev() does. Since the drivers never use the group move this all into the core code. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-2-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-29vfio/fsl-mc: Add per device reset supportDiana Craciun1-15/+30
Currently when a fsl-mc device is reset, the entire DPRC container is reset which is very inefficient because the devices within a container will be reset multiple times. Add support for individually resetting a device. Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Link: https://lore.kernel.org/r/20210922110530.24736-2-diana.craciun@oss.nxp.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-23vfio/pci: add missing identifier name in argument of function prototypeColin Ian King1-1/+1
The function prototype is missing an identifier name. Add one. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20210902212631.54260-1-colin.king@canonical.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-02Merge tag 'vfio-v5.15-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds25-2812/+2547
Pull VFIO updates from Alex Williamson: - Fix dma-valid return WAITED implementation (Anthony Yznaga) - SPDX license cleanups (Cai Huoqing) - Split vfio-pci-core from vfio-pci and enhance PCI driver matching to support future vendor provided vfio-pci variants (Yishai Hadas, Max Gurtovoy, Jason Gunthorpe) - Replace duplicated reflck with core support for managing first open, last close, and device sets (Jason Gunthorpe, Max Gurtovoy, Yishai Hadas) - Fix non-modular mdev support and don't nag about request callback support (Christoph Hellwig) - Add semaphore to protect instruction intercept handler and replace open-coded locks in vfio-ap driver (Tony Krowiak) - Convert vfio-ap to vfio_register_group_dev() API (Jason Gunthorpe) * tag 'vfio-v5.15-rc1' of git://github.com/awilliam/linux-vfio: (37 commits) vfio/pci: Introduce vfio_pci_core.ko vfio: Use kconfig if XX/endif blocks instead of repeating 'depends on' vfio: Use select for eventfd PCI / VFIO: Add 'override_only' support for VFIO PCI sub system PCI: Add 'override_only' field to struct pci_device_id vfio/pci: Move module parameters to vfio_pci.c vfio/pci: Move igd initialization to vfio_pci.c vfio/pci: Split the pci_driver code out of vfio_pci_core.c vfio/pci: Include vfio header in vfio_pci_core.h vfio/pci: Rename ops functions to fit core namings vfio/pci: Rename vfio_pci_device to vfio_pci_core_device vfio/pci: Rename vfio_pci_private.h to vfio_pci_core.h vfio/pci: Rename vfio_pci.c to vfio_pci_core.c vfio/ap_ops: Convert to use vfio_register_group_dev() s390/vfio-ap: replace open coded locks for VFIO_GROUP_NOTIFY_SET_KVM notification s390/vfio-ap: r/w lock for PQAP interception handler function pointer vfio/type1: Fix vfio_find_dma_valid return vfio-pci/zdev: Remove repeated verbose license text vfio: platform: reset: Convert to SPDX identifier vfio: Remove struct vfio_device_ops open/release ...
2021-08-26Merge branches 'v5.15/vfio/spdx-license-cleanups', ↵Alex Williamson16-2425/+2335
'v5.15/vfio/dma-valid-waited-v3', 'v5.15/vfio/vfio-pci-core-v5' and 'v5.15/vfio/vfio-ap' into v5.15/vfio/next
2021-08-26vfio/pci: Introduce vfio_pci_core.koMax Gurtovoy10-281/+64
Now that vfio_pci has been split into two source modules, one focusing on the "struct pci_driver" (vfio_pci.c) and a toolbox library of code (vfio_pci_core.c), complete the split and move them into two different kernel modules. As before vfio_pci.ko continues to present the same interface under sysfs and this change will have no functional impact. Splitting into another module and adding exports allows creating new HW specific VFIO PCI drivers that can implement device specific functionality, such as VFIO migration interfaces or specialized device requirements. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-14-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio: Use kconfig if XX/endif blocks instead of repeating 'depends on'Jason Gunthorpe6-26/+26
This results in less kconfig wordage and a simpler understanding of the required "depends on" to create the menu structure. The next patch increases the nesting level a lot so this is a nice preparatory simplification. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-13-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio: Use select for eventfdJason Gunthorpe4-4/+6
If VFIO_VIRQFD is required then turn on eventfd automatically. The majority of kconfig users of the EVENTFD use select not depends on. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-12-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26PCI / VFIO: Add 'override_only' support for VFIO PCI sub systemMax Gurtovoy1-1/+8
Expose an 'override_only' helper macro (i.e. PCI_DRIVER_OVERRIDE_DEVICE_VFIO) for VFIO PCI sub system and add the required code to prefix its matching entries with "vfio_" in modules.alias file. It allows VFIO device drivers to include match entries in the modules.alias file produced by kbuild that are not used for normal driver autoprobing and module autoloading. Drivers using these match entries can be connected to the PCI device manually, by userspace, using the existing driver_override sysfs. For example the resulting modules.alias may have: alias pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_core alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci In this example mlx5_core and mlx5_vfio_pci match to the same PCI device. The kernel will autoload and autobind to mlx5_core but the kernel and udev mechanisms will ignore mlx5_vfio_pci. When userspace wants to change a device to the VFIO subsystem it can implement a generic algorithm: 1) Identify the sysfs path to the device: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 2) Get the modalias string from the kernel: $ cat /sys/bus/pci/devices/0000:01:00.0/modalias pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 3) Prefix it with vfio_: vfio_pci:v000015B3d00001021sv000015B3sd00000001bc02sc00i00 4) Search modules.alias for the above string and select the entry that has the fewest *'s: alias vfio_pci:v000015B3d00001021sv*sd*bc*sc*i* mlx5_vfio_pci 5) modprobe the matched module name: $ modprobe mlx5_vfio_pci 6) cat the matched module name to driver_override: echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:01:00.0/driver_override 7) unbind device from original module echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind 8) probe PCI drivers (or explicitly bind to mlx5_vfio_pci) echo 0000:01:00.0 > /sys/bus/pci/drivers_probe The algorithm is independent of bus type. In future the other buses with VFIO device drivers, like platform and ACPI, can use this algorithm as well. This patch is the infrastructure to provide the information in the modules.alias to userspace. Convert the only VFIO pci_driver which results in one new line in the modules.alias: alias vfio_pci:v*d*sv*sd*bc*sc*i* vfio_pci Later series introduce additional HW specific VFIO PCI drivers, such as mlx5_vfio_pci. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # for pci.h Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-11-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio/pci: Move module parameters to vfio_pci.cYishai Hadas3-12/+33
This is a preparation before splitting vfio_pci.ko to 2 modules. As module parameters are a kind of uAPI they need to stay on vfio_pci.ko to avoid a user visible impact. For now continue to keep the implementation of these options in vfio_pci_core.c. Arguably they are vfio_pci functionality, but further splitting of vfio_pci_core.c will be better done in another series Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-9-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio/pci: Move igd initialization to vfio_pci.cMax Gurtovoy3-36/+41
igd is related to the vfio_pci pci_driver implementation, move it out of vfio_pci_core.c. This is preparation for splitting vfio_pci.ko into 2 drivers. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-8-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio/pci: Split the pci_driver code out of vfio_pci_core.cMax Gurtovoy4-215/+304
Split the vfio_pci driver into two logical parts, the 'struct pci_driver' (vfio_pci.c) which implements "Generic VFIO support for any PCI device" and a library of code (vfio_pci_core.c) that helps implementing a struct vfio_device on top of a PCI device. vfio_pci.ko continues to present the same interface under sysfs and this change should have no functional impact. Following patches will turn vfio_pci and vfio_pci_core into a separate module. This is a preparation for allowing another module to provide the pci_driver and allow that module to customize how VFIO is setup, inject its own operations, and easily extend vendor specific functionality. At this point the vfio_pci_core still contains a lot of vfio_pci functionality mixed into it. Following patches will move more of the large scale items out, but another cleanup series will be needed to get everything. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio/pci: Include vfio header in vfio_pci_core.hMax Gurtovoy2-1/+1
The vfio_device structure is embedded into the vfio_pci_core_device structure, so there is no reason for not including the header file in the vfio_pci_core header as well. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio/pci: Rename ops functions to fit core namingsMax Gurtovoy1-16/+16
This is another preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio/pci: Rename vfio_pci_device to vfio_pci_core_deviceMax Gurtovoy7-155/+158
This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. The new vfio_pci_core_device structure will be the main structure of the core driver and later on vfio_pci_device structure will be the main structure of the generic vfio_pci driver. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio/pci: Rename vfio_pci_private.h to vfio_pci_core.hMax Gurtovoy7-9/+9
This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26vfio/pci: Rename vfio_pci.c to vfio_pci_core.cMax Gurtovoy2-1/+1
This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-24vfio/type1: Fix vfio_find_dma_valid returnAnthony Yznaga1-4/+4
vfio_find_dma_valid is defined to return WAITED on success if it was necessary to wait. However, the loop forgets the WAITED value returned by vfio_wait() and returns 0 in a later iteration. Fix it. Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1629736550-2388-1-git-send-email-anthony.yznaga@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-24vfio-pci/zdev: Remove repeated verbose license textCai Huoqing1-6/+1
The SPDX and verbose license text are redundant, however in this case the verbose license indicates a GPL v2 only while SPDX specifies v2+. Remove the verbose license and correct SPDX to the more restricted version. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20210824003749.1039-1-caihuoqing@baidu.com [aw: commit log] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-24vfio: platform: reset: Convert to SPDX identifierCai Huoqing1-9/+1
use SPDX-License-Identifier instead of a verbose license text Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210822043643.2040-1-caihuoqing@baidu.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio: Remove struct vfio_device_ops open/releaseJason Gunthorpe2-35/+1
Nothing uses this anymore, delete it. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/14-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio/pci: Reorganize VFIO_DEVICE_PCI_HOT_RESET to use the device setJason Gunthorpe1-124/+89
Like vfio_pci_dev_set_try_reset() this code wants to reset all of the devices in the "reset group" which is the same membership as the device set. Instead of trying to reconstruct the device set from the PCI list go directly from the device set's device list to execute the reset. The same basic structure as vfio_pci_dev_set_try_reset() is used. The 'vfio_devices' struct is replaced with the device set linked list and we simply sweep it multiple times under the lock. This eliminates a memory allocation and get/put traffic and another improperly locked test of pci_dev_driver(). Reviewed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/10-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio/pci: Change vfio_pci_try_bus_reset() to use the dev_setJason Gunthorpe1-92/+82
vfio_pci_try_bus_reset() is triggering a reset of the entire_dev set if any device within it has accumulated a needs_reset. This reset can only be done once all of the drivers operating the PCI devices to be reset are in a known safe state. Make this clearer by directly operating on the dev_set instead of the vfio_pci_device. Rename the function to vfio_pci_dev_set_try_reset(). Use the device list inside the dev_set to check that all drivers are in a safe state instead of working backwards from the pci_device. The dev_set->lock directly prevents devices from joining/leaving the set, or changing their state, which further implies the pci_device cannot change drivers or that the vfio_device be freed, eliminating the need for get/put's. If a pci_device to be reset is not in the dev_set then the reset cannot be used as we can't know what the state of that driver is. Directly measure this by checking that every pci_device is in the dev_set - which effectively proves that VFIO drivers are attached to everything. Remove the odd interaction around vfio_pci_set_power_state() - have the only caller avoid its redundant vfio_pci_set_power_state() instead of avoiding it inside vfio_pci_dev_set_try_reset(). This restructuring corrects a call to pci_dev_driver() without holding the device_lock() and removes a hard wiring to &vfio_pci_driver. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/9-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio/pci: Move to the device set infrastructureYishai Hadas2-132/+37
PCI wants to have the usual open/close_device() logic with the slight twist that the open/close_device() must be done under a singelton lock shared by all of the vfio_devices that are in the PCI "reset group". The reset group, and thus the device set, is determined by what devices pci_reset_bus() touches, which is either the entire bus or only the slot. Rely on the core code to do everything reflck was doing and delete reflck entirely. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/8-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio/platform: Use open_device() instead of open coding a refcnt schemeJason Gunthorpe2-48/+32
Platform simply wants to run some code when the device is first opened/last closed. Use the core framework and locking for this. Aside from removing a bit of code this narrows the locking scope from a global lock. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/7-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio/fsl: Move to the device set infrastructureJason Gunthorpe3-139/+28
FSL uses the internal reflck to implement the open_device() functionality, conversion to the core code is straightforward. The decision on which set to be part of is trivially based on the is_fsl_mc_bus_dprc() and we use a 'struct device *' pointer as the set_id. The dev_set lock is protecting the interrupts setup. The FSL MC devices are using MSIs and only the DPRC device is allocating the MSIs from the MSI domain. The other devices just take interrupts from a pool. The lock is protecting the access to this pool. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Tested-by: Diana Craciun OSS <diana.craciun@oss.nxp.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio: Provide better generic support for open/release vfio_device_opsJason Gunthorpe2-24/+151
Currently the driver ops have an open/release pair that is called once each time a device FD is opened or closed. Add an additional set of open/close_device() ops which are called when the device FD is opened for the first time and closed for the last time. An analysis shows that all of the drivers require this semantic. Some are open coding it as part of their reflck implementation, and some are just buggy and miss it completely. To retain the current semantics PCI and FSL depend on, introduce the idea of a "device set" which is a grouping of vfio_device's that share the same lock around opening. The device set is established by providing a 'set_id' pointer. All vfio_device's that provide the same pointer will be joined to the same singleton memory and lock across the whole set. This effectively replaces the oddly named reflck. After conversion the set_id will be sourced from: - A struct device from a fsl_mc_device (fsl) - A struct pci_slot (pci) - A struct pci_bus (pci) - The struct vfio_device (everything) The design ensures that the above pointers are live as long as the vfio_device is registered, so they form reliable unique keys to group vfio_devices into sets. This implementation uses xarray instead of searching through the driver core structures, which simplifies the somewhat tricky locking in this area. Following patches convert all the drivers. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>