summaryrefslogtreecommitdiff
path: root/include/uapi/linux/vfio.h
AgeCommit message (Collapse)AuthorFilesLines
2021-05-05vfio/pci: Revert nvlink removal uAPI breakageAlex Williamson1-4/+42
Revert the uAPI changes from the below commit with notice that these regions and capabilities are no longer provided. Fixes: b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2") Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <162014341432.3807030.11054087109120670135.stgit@omen>
2021-04-06vfio/pci: remove vfio_pci_nvlink2Christoph Hellwig1-34/+4
This driver never had any open userspace (which for VFIO would include VM kernel drivers) that use it, and thus should never have been added by our normal userspace ABI rules. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Message-Id: <20210326061311.1497642-2-hch@lst.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01vfio: interfaces to update vaddrSteve Sistare1-0/+19
Define interfaces that allow the underlying memory object of an iova range to be mapped to a new host virtual address in the host process: - VFIO_DMA_UNMAP_FLAG_VADDR for VFIO_IOMMU_UNMAP_DMA - VFIO_DMA_MAP_FLAG_VADDR flag for VFIO_IOMMU_MAP_DMA - VFIO_UPDATE_VADDR extension for VFIO_CHECK_EXTENSION Unmap vaddr invalidates the host virtual address in an iova range, and blocks vfio translation of host virtual addresses. DMA to already-mapped pages continues. Map vaddr updates the base VA and resumes translation. See comments in uapi/linux/vfio.h for more details. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01vfio: option to unmap allSteve Sistare1-0/+8
For the UNMAP_DMA ioctl, delete all mappings if VFIO_DMA_UNMAP_FLAG_ALL is set. Define the corresponding VFIO_UNMAP_ALL extension. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-04vfio-ccw: Wire in the request callbackEric Farman1-0/+1
The device is being unplugged, so pass the request to userspace to ask for a graceful cleanup. This should free up the thread that would otherwise loop waiting for the device to be fully released. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12Merge branches 'v5.10/vfio/fsl-mc-v6' and 'v5.10/vfio/zpci-info-v3' into ↵Alex Williamson1-0/+12
v5.10/vfio/next
2020-10-07vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFOMatthew Rosato1-0/+11
Allow the VFIO_DEVICE_GET_INFO ioctl to include a capability chain. Add a flag indicating capability chain support, and introduce the definitions for the first set of capabilities which are specified to s390 zPCI devices. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-07vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devicesBharat Bhushan1-0/+1
DPAA2 (Data Path Acceleration Architecture) consists in mechanisms for processing Ethernet packets, queue management, accelerators, etc. The Management Complex (mc) is a hardware entity that manages the DPAA2 hardware resources. It provides an object-based abstraction for software drivers to use the DPAA2 hardware. The MC mediates operations such as create, discover, destroy of DPAA2 objects. The MC provides memory-mapped I/O command interfaces (MC portals) which DPAA2 software drivers use to operate on DPAA2 objects. A DPRC is a container object that holds other types of DPAA2 objects. Each object in the DPRC is a Linux device and bound to a driver. The MC-bus driver is a platform driver (different from PCI or platform bus). The DPRC driver does runtime management of a bus instance. It performs the initial scan of the DPRC and handles changes in the DPRC configuration (adding/removing objects). All objects inside a container share the same hardware isolation context, meaning that only an entire DPRC can be assigned to a virtual machine. When a container is assigned to a virtual machine, all the objects within that container are assigned to that virtual machine. The DPRC container assigned to the virtual machine is not allowed to change contents (add/remove objects) by the guest. The restriction is set by the host and enforced by the mc hardware. The DPAA2 objects can be directly assigned to the guest. However the MC portals (the memory mapped command interface to the MC) need to be emulated because there are commands that configure the interrupts and the isolation IDs which are virtual in the guest. Example: echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind The dprc.2 is bound to the VFIO driver and all the objects within dprc.2 are going to be bound to the VFIO driver. This patch adds the infrastructure for VFIO support for fsl-mc devices. Subsequent patches will add support for binding and secure assigning these devices using VFIO. More details about the DPAA2 objects can be found here: Documentation/networking/device_drivers/freescale/dpaa2/overview.rst Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22Merge branches 'v5.10/vfio/bardirty', 'v5.10/vfio/dma_avail', ↵Alex Williamson1-1/+16
'v5.10/vfio/misc', 'v5.10/vfio/no-cmd-mem' and 'v5.10/vfio/yan_zhao_fixes' into v5.10/vfio/next
2020-09-21vfio iommu: Add dma available capabilityMatthew Rosato1-0/+15
Commit 492855939bdb ("vfio/type1: Limit DMA mappings per container") added the ability to limit the number of memory backed DMA mappings. However on s390x, when lazy mapping is in use, we use a very large number of concurrent mappings. Let's provide the current allowable number of DMA mappings to userspace via the IOMMU info chain so that userspace can take appropriate mitigation. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21vfio: Fix typo of the device_stateZenghui Yu1-1/+1
A typo fix ("_RUNNNG" => "_RUNNING") in comment block of the uapi header. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-18vfio/type1: Fix migration info capability IDAlex Williamson1-1/+1
ID 1 is already used by the IOVA range capability, use ID 2. Reported-by: Liu Yi L <yi.l.liu@intel.com> Fixes: ad721705d09c ("vfio iommu: Add migration capability to report supported features") Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-08Merge tag 's390-5.8-1' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for multi-function devices in pci code. - Enable PF-VF linking for architectures using the pdev->no_vf_scan flag (currently just s390). - Add reipl from NVMe support. - Get rid of critical section cleanup in entry.S. - Refactor PNSO CHSC (perform network subchannel operation) in cio and qeth. - QDIO interrupts and error handling fixes and improvements, more refactoring changes. - Align ioremap() with generic code. - Accept requests without the prefetch bit set in vfio-ccw. - Enable path handling via two new regions in vfio-ccw. - Other small fixes and improvements all over the code. * tag 's390-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits) vfio-ccw: make vfio_ccw_regops variables declarations static vfio-ccw: Add trace for CRW event vfio-ccw: Wire up the CRW irq and CRW region vfio-ccw: Introduce a new CRW region vfio-ccw: Refactor IRQ handlers vfio-ccw: Introduce a new schib region vfio-ccw: Refactor the unregister of the async regions vfio-ccw: Register a chp_event callback for vfio-ccw vfio-ccw: Introduce new helper functions to free/destroy regions vfio-ccw: document possible errors vfio-ccw: Enable transparent CCW IPL from DASD s390/pci: Log new handle in clp_disable_fh() s390/cio, s390/qeth: cleanup PNSO CHSC s390/qdio: remove q->first_to_kick s390/qdio: fix up qdio_start_irq() kerneldoc s390: remove critical section cleanup from entry.S s390: add machine check SIGP s390/pci: ioremap() align with generic code s390/ap: introduce new ap function ap_get_qdev() Documentation/s390: Update / remove developerWorks web links ...
2020-06-03vfio-ccw: Introduce a new CRW regionFarhan Ali1-0/+2
This region provides a mechanism to pass a Channel Report Word that affect vfio-ccw devices, and needs to be passed to the guest for its awareness and/or processing. The base driver (see crw_collect_info()) provides space for two CRWs, as a subchannel event may have two CRWs chained together (one for the ssid, one for the subchannel). As vfio-ccw will deal with everything at the subchannel level, provide space for a single CRW to be transferred in one shot. Signed-off-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20200505122745.53208-7-farman@linux.ibm.com> [CH: added padding to ccw_crw_region] Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-06-02vfio-ccw: Introduce a new schib regionFarhan Ali1-0/+1
The schib region can be used by userspace to get the subchannel- information block (SCHIB) for the passthrough subchannel. This can be useful to get information such as channel path information via the SCHIB.PMCW fields. Signed-off-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20200505122745.53208-5-farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2020-05-29vfio iommu: Add migration capability to report supported featuresKirti Wankhede1-0/+23
Added migration capability in IOMMU info chain. User application should check IOMMU info chain for migration capability to use dirty page tracking feature provided by kernel module. User application must check page sizes supported and maximum dirty bitmap size returned by this capability structure for ioctls used to get dirty bitmap. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-29vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmapKirti Wankhede1-0/+11
DMA mapped pages, including those pinned by mdev vendor drivers, might get unpinned and unmapped while migration is active and device is still running. For example, in pre-copy phase while guest driver could access those pages, host device or vendor driver can dirty these mapped pages. Such pages should be marked dirty so as to maintain memory consistency for a user making use of dirty page tracking. To get bitmap during unmap, user should allocate memory for bitmap, set it all zeros, set size of allocated memory, set page size to be considered for bitmap and set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-29vfio iommu: Add ioctl definition for dirty pages trackingKirti Wankhede1-0/+57
IOMMU container maintains a list of all pages pinned by vfio_pin_pages API. All pages pinned by vendor driver through this API should be considered as dirty during migration. When container consists of IOMMU capable device and all pages are pinned and mapped, then all pages are marked dirty. Added support to start/stop dirtied pages tracking and to get bitmap of all dirtied pages for requested IO virtual address range. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-29vfio: UAPI for migration interface for device stateKirti Wankhede1-0/+228
- Defined MIGRATION region type and sub-type. - Defined vfio_device_migration_info structure which will be placed at the 0th offset of migration region to get/set VFIO device related information. Defined members of structure and usage on read/write access. - Defined device states and state transition details. - Defined sequence to be followed while saving and resuming VFIO device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first userAlex Williamson1-0/+37
The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device agnostic ioctl for setting, retrieving, and probing device features. This implementation provides a 16-bit field for specifying a feature index, where the data porition of the ioctl is determined by the semantics for the given feature. Additional flag bits indicate the direction and nature of the operation; SET indicates user data is provided into the device feature, GET indicates the device feature is written out into user data. The PROBE flag augments determining whether the given feature is supported, and if provided, whether the given operation on the feature is supported. The first user of this ioctl is for setting the vfio-pci VF token, where the user provides a shared secret key (UUID) on a SR-IOV PF device, which users must provide when opening associated VF devices. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-08-23Merge branches 'v5.4/vfio/alexey-tce-memory-free-v1', ↵Alex Williamson1-20/+51
'v5.4/vfio/connie-re-arrange-v2', 'v5.4/vfio/hexin-pci-reset-v3', 'v5.4/vfio/parav-mtty-uuid-v2' and 'v5.4/vfio/shameer-iova-list-v8' into v5.4/vfio/next
2019-08-19vfio/type1: Add IOVA range capability supportShameer Kolothum1-1/+25
This allows the user-space to retrieve the supported IOVA range(s), excluding any non-relaxable reserved regions. The implementation is based on capability chains, added to VFIO_IOMMU_GET_INFO ioctl. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-08-19vfio: re-arrange vfio region definitionsCornelia Huck1-19/+26
It is easy to miss already defined region types. Let's re-arrange the definitions a bit and add more comments to make it hopefully a bit clearer. No functional change. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-04-24vfio-ccw: add handling for async channel instructionsCornelia Huck1-0/+2
Add a region to the vfio-ccw device that can be used to submit asynchronous I/O instructions. ssch continues to be handled by the existing I/O region; the new region handles hsch and csch. Interrupt status continues to be reported through the same channels as for ssch. Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-24vfio-ccw: add capabilities chainCornelia Huck1-0/+2
Allow to extend the regions used by vfio-ccw. The first user will be handling of halt and clear subchannel. Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-12-21vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriverAlexey Kardashevskiy1-0/+42
POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not pluggable PCIe devices but still have PCIe links which are used for config space and MMIO. In addition to that the GPUs have 6 NVLinks which are connected to other GPUs and the POWER9 CPU. POWER9 chips have a special unit on a die called an NPU which is an NVLink2 host bus adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each. These systems also support ATS (address translation services) which is a part of the NVLink2 protocol. Such GPUs also share on-board RAM (16GB or 32GB) to the system via the same NVLink2 so a CPU has cache-coherent access to a GPU RAM. This exports GPU RAM to the userspace as a new VFIO device region. This preregisters the new memory as device memory as it might be used for DMA. This inserts pfns from the fault handler as the GPU memory is not onlined until the vendor driver is loaded and trained the NVLinks so doing this earlier causes low level errors which we fence in the firmware so it does not hurt the host system but still better be avoided; for the same reason this does not map GPU RAM into the host kernel (usual thing for emulated access otherwise). This exports an ATSD (Address Translation Shootdown) register of NPU which allows TLB invalidations inside GPU for an operating system. The register conveniently occupies a single 64k page. It is also presented to the userspace as a new VFIO device region. One NPU has 8 ATSD registers, each of them can be used for TLB invalidation in a GPU linked to this NPU. This allocates one ATSD register per an NVLink bridge allowing passing up to 6 registers. Due to the host firmware bug (just recently fixed), only 1 ATSD register per NPU was actually advertised to the host system so this passes that alone register via the first NVLink bridge device in the group which is still enough as QEMU collects them all back and presents to the guest via vPHB to mimic the emulated NPU PHB on the host. In order to provide the userspace with the information about GPU-to-NVLink connections, this exports an additional capability called "tgt" (which is an abbreviated host system bus address). The "tgt" property tells the GPU its own system address and allows the guest driver to conglomerate the routing information so each GPU knows how to get directly to the other GPUs. For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to know LPID (a logical partition ID or a KVM guest hardware ID in other words) and PID (a memory context ID of a userspace process, not to be confused with a linux pid). This assigns a GPU to LPID in the NPU and this is why this adds a listener for KVM on an IOMMU group. A PID comes via NVLink from a GPU and NPU uses a PID wildcard to pass it through. This requires coherent memory and ATSD to be available on the host as the GPU vendor only supports configurations with both features enabled and other configurations are known not to work. Because of this and because of the ways the features are advertised to the host system (which is a device tree with very platform specific properties), this requires enabled POWERNV platform. The V100 GPUs do not advertise any of these capabilities via the config space and there are more than just one device ID so this relies on the platform to tell whether these GPUs have special abilities such as NVLinks. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-31Merge tag 'vfio-v4.20-rc1.v2' of git://github.com/awilliam/linux-vfioLinus Torvalds1-0/+50
Pull VFIO updates from Alex Williamson: - EDID interfaces for vfio devices supporting display extensions (Gerd Hoffmann) - Generically select Type-1 IOMMU model support on ARM/ARM64 (Geert Uytterhoeven) - Quirk for VFs reporting INTx pin (Alex Williamson) - Fix error path memory leak in MSI support (Li Qiang) * tag 'vfio-v4.20-rc1.v2' of git://github.com/awilliam/linux-vfio: vfio: add edid support to mbochs sample driver vfio: add edid api for display (vgpu) devices. drivers/vfio: Allow type-1 IOMMU instantiation with all ARM/ARM64 IOMMUs vfio/pci: Mask buggy SR-IOV VF INTx support vfio/pci: Fix potential memory leak in vfio_msi_cap_len
2018-10-11vfio: add edid api for display (vgpu) devices.Gerd Hoffmann1-0/+50
This allows to set EDID monitor information for the vgpu display, for a more flexible display configuration, using a special vfio region. Check the comment describing struct vfio_region_gfx_edid for more details. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-09-28s390: vfio-ap: implement VFIO_DEVICE_GET_INFO ioctlTony Krowiak1-0/+1
Adds support for the VFIO_DEVICE_GET_INFO ioctl to the VFIO AP Matrix device driver. This is a minimal implementation, as vfio-ap does not use I/O regions. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Michael Mueller <mimu@linux.ibm.com> Tested-by: Farhan Ali <alifm@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Message-Id: <20180925231641.4954-13-akrowiak@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-09-26s390: vfio-ap: register matrix device with VFIO mdev frameworkTony Krowiak1-0/+1
Registers the matrix device created by the VFIO AP device driver with the VFIO mediated device framework. Registering the matrix device will create the sysfs structures needed to create mediated matrix devices each of which will be used to configure the AP matrix for a guest and connect it to the VFIO AP device driver. Registering the matrix device with the VFIO mediated device framework will create the following sysfs structures: /sys/devices/vfio_ap/matrix/ ...... [mdev_supported_types] ......... [vfio_ap-passthrough] ............ create To create a mediated device for the AP matrix device, write a UUID to the create file: uuidgen > create A symbolic link to the mediated device's directory will be created in the devices subdirectory named after the generated $uuid: /sys/devices/vfio_ap/matrix/ ...... [mdev_supported_types] ......... [vfio_ap-passthrough] ............ [devices] ............... [$uuid] A symbolic link to the mediated device will also be created in the vfio_ap matrix's directory: /sys/devices/vfio_ap/matrix/[$uuid] Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Michael Mueller <mimu@linux.ibm.com> Tested-by: Farhan Ali <alifm@linux.ibm.com> Message-Id: <20180925231641.4954-6-akrowiak@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-03-26vfio/pci: Add ioeventfd supportAlex Williamson1-0/+27
The ioeventfd here is actually irqfd handling of an ioeventfd such as supported in KVM. A user is able to pre-program a device write to occur when the eventfd triggers. This is yet another instance of eventfd-irqfd triggering between KVM and vfio. The impetus for this is high frequency writes to pages which are virtualized in QEMU. Enabling this near-direct write path for selected registers within the virtualized page can improve performance and reduce overhead. Specifically this is initially targeted at NVIDIA graphics cards where the driver issues a write to an MMIO register within a virtualized region in order to allow the MSI interrupt to re-trigger. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-02-02Merge tag 'drm-for-v4.16' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-0/+62
Pull drm updates from Dave Airlie: "This seems to have been a comparatively quieter merge window, I assume due to holidays etc. The "biggest" change is AMD header cleanups, which merge/remove a bunch of them. The AMD gpu scheduler is now being made generic with the etnaviv driver wanting to reuse the code, hopefully other drivers can go in the same direction. Otherwise it's the usual lots of stuff in i915/amdgpu, not so much stuff elsewhere. Core: - Add .last_close and .output_poll_changed helpers to reduce driver footprints - Fix plane clipping - Improved debug printing support - Add panel orientation property - Update edid derived properties at edid setting - Reduction in fbdev driver footprint - Move amdgpu scheduler into core for other drivers to use. i915: - Selftest and IGT improvements - Fast boot prep work on IPS, pipe config - HW workarounds for Cannonlake, Geminilake - Cannonlake clock and HDMI2.0 fixes - GPU cache invalidation and context switch improvements - Display planes cleanup - New PMU interface for perf queries - New firmware support for KBL/SKL - Geminilake HW workaround for perforamce - Coffeelake stolen memory improvements - GPU reset robustness work - Cannonlake horizontal plane flipping - GVT work amdgpu/radeon: - RV and Vega header file cleanups (lots of lines gone!) - TTM operation context support - 48-bit GPUVM support for Vega/RV - ECC support for Vega - Resizeable BAR support - Multi-display sync support - Enable swapout for reserved BOs during allocation - S3 fixes on Raven - GPU reset cleanup and fixes - 2+1 level GPU page table amdkfd: - GFX7/8 SDMA user queues support - Hardware scheduling for multiple processes - dGPU prep work rcar: - Added R8A7743/5 support - System suspend/resume support sun4i: - Multi-plane support for YUV formats - A83T and LVDS support msm: - Devfreq support for GPU tegra: - Prep work for adding Tegra186 support - Tegra186 HDMI support - HDMI2.0 and zpos support by using generic helpers tilcdc: - Misc fixes omapdrm: - Support memory bandwidth limits - DSI command mode panel cleanups - DMM error handling exynos: - drop the old IPP subdriver. etnaviv: - Occlusion query fixes - Job handling fixes - Prep work for hooking in gpu scheduler armada: - Move closer to atomic modesetting - Allow disabling primary plane if overlay is full screen imx: - Format modifier support - Add tile prefetch to PRE - Runtime PM support for PRG ast: - fix LUT loading" * tag 'drm-for-v4.16' of git://people.freedesktop.org/~airlied/linux: (1471 commits) drm/ast: Load lut in crtc_commit drm: Check for lessee in DROP_MASTER ioctl drm: fix gpu scheduler link order drm/amd/display: Demote error print to debug print when ATOM impl missing dma-buf: fix reservation_object_wait_timeout_rcu once more v2 drm/amdgpu: Avoid leaking PM domain on driver unbind (v2) drm/amd/amdgpu: Add Polaris version check drm/amdgpu: Reenable manual GPU reset from sysfs drm/amdgpu: disable MMHUB power gating on raven drm/ttm: Don't unreserve swapped BOs that were previously reserved drm/ttm: Don't add swapped BOs to swap-LRU list drm/amdgpu: only check for ECC on Vega10 drm/amd/powerplay: Fix smu_table_entry.handle type drm/ttm: add VADDR_FLAG_UPDATED_COUNT to correctly update dma_page global count drm: Fix PANEL_ORIENTATION_QUIRKS breaking the Kconfig DRM menuconfig drm/radeon: fill in rb backend map on evergreen/ni. drm/amdgpu/gfx9: fix ngg enablement to clear gds reserved memory (v2) drm/ttm: only free pages rather than update global memory count together drm/amdgpu: fix CPU based VM updates drm/amdgpu: fix typo in amdgpu_vce_validate_bo ...
2017-12-20vfio-pci: Allow mapping MSIX BARAlexey Kardashevskiy1-0/+10
By default VFIO disables mapping of MSIX BAR to the userspace as the userspace may program it in a way allowing spurious interrupts; instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl. In order to eliminate guessing from the userspace about what is mmapable, VFIO also advertises a sparse list of regions allowed to mmap. This works fine as long as the system page size equals to the MSIX alignment requirement which is 4KB. However with a bigger page size the existing code prohibits mapping non-MSIX parts of a page with MSIX structures so these parts have to be emulated via slow reads/writes on a VFIO device fd. If these emulated bits are accessed often, this has serious impact on performance. This allows mmap of the entire BAR containing MSIX vector table. This removes the sparse capability for PCI devices as it becomes useless. As the userspace needs to know for sure whether mmapping of the MSIX vector containing data can succeed, this adds a new capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - which explicitly tells the userspace that the entire BAR can be mmapped. This does not touch the MSIX mangling in the BAR read/write handlers as we are doing this just to enable direct access to non MSIX registers. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw - fixup whitespace, trim function name] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-12-08Merge airlied/drm-next into drm-intel-next-queuedRodrigo Vivi1-0/+1
Chris requested this backmerge for a reconciliation on drm_print.h between drm-misc-next and drm-intel-next-queued Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2017-12-04vfio: ABI for mdev display dma-buf operationTina Zhang1-0/+62
Add VFIO_DEVICE_QUERY_GFX_PLANE ioctl command to let user query and get a plane and its information. So far, two types of buffers are supported: buffers based on dma-buf and buffers based on region. This ioctl can be invoked with: 1) Either DMABUF or REGION flag. Vendor driver returns a plane_info successfully only when the specific kind of buffer is supported. 2) Flag PROBE. And at the same time either DMABUF or REGION must be set, so that vendor driver returns success only when the specific kind of buffer is supported. Add VFIO_DEVICE_GET_GFX_DMABUF ioctl command to let user get a specific dma-buf fd of an exposed MDEV buffer provided by dmabuf_id which was returned in VFIO_DEVICE_QUERY_GFX_PLANE ioctl command. The life cycle of an exposed MDEV buffer is handled by userspace and tracked by kernel space. The returned dmabuf_id in struct vfio_device_ query_gfx_plane can be a new id of a new exposed buffer or an old id of a re-exported buffer. Host user can check the value of dmabuf_id to see if it needs to create new resources according to the new exposed buffer or just re-use the existing resource related to the old buffer. v18: - update comments for VFIO_DEVICE_GET_GFX_DMABUF. (Alex) v17: - modify VFIO_DEVICE_GET_GFX_DMABUF interface. (Alex) v16: - add x_hot and y_hot fields. (Gerd) - add comments for VFIO_DEVICE_GET_GFX_DMABUF. (Alex) - rebase to 4.14.0-rc6. v15: - add a ioctl to get a dmabuf for a given dmabuf id. (Gerd) v14: - add PROBE, DMABUF and REGION flags. (Alex) v12: - add drm_format_mod back. (Gerd and Zhenyu) - add region_index. (Gerd) v11: - rename plane_type to drm_plane_type. (Gerd) - move fields of vfio_device_query_gfx_plane to vfio_device_gfx_plane_info. (Gerd) - remove drm_format_mod, start fields. (Daniel) - remove plane_id. v10: - refine the ABI API VFIO_DEVICE_QUERY_GFX_PLANE. (Alex) (Gerd) v3: - add a field gvt_plane_info in the drm_i915_gem_obj structure to save the decoded plane information to avoid look up while need the plane info. (Gerd) Signed-off-by: Tina Zhang <tina.zhang@intel.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
2017-11-02License cleanup: add SPDX license identifier to uapi header files with a licenseGreg Kroah-Hartman1-0/+1
Many user space API headers have licensing information, which is either incomplete, badly formatted or just a shorthand for referring to the license under which the file is supposed to be. This makes it hard for compliance tools to determine the correct license. Update these files with an SPDX license identifier. The identifier was chosen based on the license information in the file. GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license identifier with the added 'WITH Linux-syscall-note' exception, which is the officially assigned exception identifier for the kernel syscall exception: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work". This exception makes it possible to include GPL headers into non GPL code, without confusing license compliance tools. Headers which have either explicit dual licensing or are just licensed under a non GPL license are updated with the corresponding SPDX identifier and the GPLv2 with syscall exception identifier. The format is: ((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE) SPDX license identifiers are a legally binding shorthand, which can be used instead of the full boiler plate text. The update does not remove existing license information as this has to be done on a case by case basis and the copyright holders might have to be consulted. This will happen in a separate step. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. See the previous patch in this series for the methodology of how this patch was researched. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-31vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctlsDong Jia Shi1-2/+8
Realize VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve VFIO_CCW_IO_IRQ information. Realize VFIO_DEVICE_SET_IRQS ioctl to set an eventfd fd for VFIO_CCW_IO_IRQ. Once a write operation to the ccw_io_region was performed, trigger a signal on this fd. Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20170317031743.40128-12-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2017-03-31vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctlDong Jia Shi1-0/+11
Introduce device information about vfio-ccw: VFIO_DEVICE_FLAGS_CCW. Realize VFIO_DEVICE_GET_REGION_INFO ioctl for vfio-ccw. Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20170317031743.40128-10-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2017-03-31vfio: ccw: define device_api stringsDong Jia Shi1-0/+1
Define vfio-ccw device API strings. CCW vendor driver using mediated device framework should use this string for device_api attribute. Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20170317031743.40128-4-bjsdjshi@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-11-17vfio: Define device_api stringsKirti Wankhede1-0/+10
Defined device API strings. Vendor driver using mediated device framework should use corresponding string for device_api attribute. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-23vfio/pci: Intel IGD host and LCP bridge config space accessAlex Williamson1-0/+3
Provide read-only access to PCI config space of the PCI host bridge and LPC bridge through device specific regions. This may be used to configure a VM with matching register contents to satisfy driver requirements. Providing this through the vfio file descriptor removes an additional userspace requirement for access through pci-sysfs and removes the CAP_SYS_ADMIN requirement that doesn't appear to apply to the specific devices we're accessing. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-23vfio/pci: Intel IGD OpRegion supportAlex Williamson1-0/+5
This is the first consumer of vfio device specific resource support, providing read-only access to the OpRegion for Intel graphics devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-23vfio: Define device specific region type capabilityAlex Williamson1-1/+30
To this point vfio has only provided an interface to the user that allows them to determine the number of regions and specifics about each region. What the region represents is left to the vfio bus driver. vfio-pci chooses to use fixed indexes for fixed resources, index 0 is BAR0, 1 is BAR1,... 7 is config space, etc. This works pretty well since all PCI devices have these regions, even if they don't necessarily populate all of them. Then we start to add things like VGA, which only certain device even support. We added this the same way, but now we've wasted a region index, and due to our offset implementation the corresponding address space, for all devices. Rather than continuing that process, let's try to make regions self describing by including a capability that defines their type. For vfio-pci we'll make the current VFIO_PCI_NUM_REGIONS fixed, defining the end of the static indexes and the beginning of self describing regions. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-23vfio: Define sparse mmap capability for regionsAlex Williamson1-1/+25
We can't always support mmap across an entire device region, for example we deny mmaps covering the MSI-X table of PCI devices, but we don't really have a way to report it. We expect the user to implicitly know this restriction. We also can't split the region because vfio-pci defines an API with fixed region index to BAR number mapping. We therefore define a new capability which lists areas within the region that may be mmap'd. In addition to the MSI-X case, this potentially enables in-kernel emulation and extensions to devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-23vfio: Define capability chainsAlex Williamson1-0/+27
We have a few cases where we need to extend the data returned from the INFO ioctls in VFIO. For instance we already have devices exposed through vfio-pci where VFIO_DEVICE_GET_REGION_INFO reports the region as mmap-capable, but really only supports sparse mmaps, avoiding the MSI-X table. If we wanted to provide in-kernel emulation or extended functionality for devices, we'd also want the ability to tell the user not to mmap various regions, rather than forcing them to figure it out on their own. Another example is VFIO_IOMMU_GET_INFO. We'd really like to expose the actual IOVA capabilities of the IOMMU rather than letting the user assume the address space they have available to them. We could add IOVA base and size fields to struct vfio_iommu_type1_info, but what if we have multiple IOVA ranges. For instance x86 uses a range of addresses at 0xfee00000 for MSI vectors. These typically are not available for standard DMA IOVA mappings and splits our available IOVA space into two ranges. POWER systems have both an IOVA window below 4G as well as dynamic data window which they can use to remap all of guest memory. Representing variable sized arrays within a fixed structure makes it very difficult to parse, we'd therefore like to put this data beyond fixed fields within the data structures. One way to do this is to emulate capabilities in PCI configuration space. A new flag indciates whether capabilties are supported and a new fixed field reports the offset of the first entry. Users can then walk the chain to find capabilities, adding capabilities does not require additional fields in the fixed structure, and parsing variable sized data becomes trivial. This patch outlines the theory and base header structure, which should be shared by all future users. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-12-22vfio: Include No-IOMMU modeAlex Williamson1-0/+7
There is really no way to safely give a user full access to a DMA capable device without an IOMMU to protect the host system. There is also no way to provide DMA translation, for use cases such as device assignment to virtual machines. However, there are still those users that want userspace drivers even under those conditions. The UIO driver exists for this use case, but does not provide the degree of device access and programming that VFIO has. In an effort to avoid code duplication, this introduces a No-IOMMU mode for VFIO. This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling the "enable_unsafe_noiommu_mode" option on the vfio driver. This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported. This patch includes no-iommu support for the vfio-pci bus driver only. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-22vfio: Add explicit alignments in vfio_iommu_spapr_tce_createAlexey Kardashevskiy1-0/+2
The vfio_iommu_spapr_tce_create struct has 4x32bit and 2x64bit fields which should have resulted in sizeof(fio_iommu_spapr_tce_create) equal to 32 bytes. However due to the gcc's default alignment, the actual size of this struct is 40 bytes. This fills gaps with __resv1/2 fields. This should not cause any change in behavior. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-12-04Revert: "vfio: Include No-IOMMU mode"Alex Williamson1-7/+0
Revert commit 033291eccbdb ("vfio: Include No-IOMMU mode") due to lack of a user. This was originally intended to fill a need for the DPDK driver, but uptake has been slow so rather than support an unproven kernel interface revert it and revisit when userspace catches up. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-11-04vfio: Include No-IOMMU modeAlex Williamson1-0/+7
There is really no way to safely give a user full access to a DMA capable device without an IOMMU to protect the host system. There is also no way to provide DMA translation, for use cases such as device assignment to virtual machines. However, there are still those users that want userspace drivers even under those conditions. The UIO driver exists for this use case, but does not provide the degree of device access and programming that VFIO has. In an effort to avoid code duplication, this introduces a No-IOMMU mode for VFIO. This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling the "enable_unsafe_noiommu_mode" option on the vfio driver. This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported. This patch includes no-iommu support for the vfio-pci bus driver only. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2015-06-11vfio: powerpc/spapr: Support Dynamic DMA windowsAlexey Kardashevskiy1-3/+58
This adds create/remove window ioctls to create and remove DMA windows. sPAPR defines a Dynamic DMA windows capability which allows para-virtualized guests to create additional DMA windows on a PCI bus. The existing linux kernels use this new window to map the entire guest memory and switch to the direct DMA operations saving time on map/unmap requests which would normally happen in a big amounts. This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows. Up to 2 windows are supported now by the hardware and by this driver. This changes VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional information such as a number of supported windows and maximum number levels of TCE tables. DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature as we still want to support v2 on platforms which cannot do DDW for the sake of TCE acceleration in KVM (coming soon). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>