diff options
author | Costa Shulyupin <costa.shul@redhat.com> | 2023-07-18 07:55:02 +0300 |
---|---|---|
committer | Heiko Carstens <hca@linux.ibm.com> | 2023-07-24 13:12:24 +0300 |
commit | 37002bc6b6039e1491140869c6801e0a2deee43e (patch) | |
tree | baeb304521b33d4f36bfecb1a03afec2c7af9d93 /Documentation/s390/vfio-ap.rst | |
parent | e3123dfb5373939d65ac2b874189a773d37ac7f5 (diff) | |
download | linux-37002bc6b6039e1491140869c6801e0a2deee43e.tar.xz |
docs: move s390 under arch
and fix all in-tree references.
Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation directory and making
the docs hierarchy more closely match the source hierarchy.
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230718045550.495428-1-costa.shul@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Diffstat (limited to 'Documentation/s390/vfio-ap.rst')
-rw-r--r-- | Documentation/s390/vfio-ap.rst | 1069 |
1 files changed, 0 insertions, 1069 deletions
diff --git a/Documentation/s390/vfio-ap.rst b/Documentation/s390/vfio-ap.rst deleted file mode 100644 index bb3f4c4e2885..000000000000 --- a/Documentation/s390/vfio-ap.rst +++ /dev/null @@ -1,1069 +0,0 @@ -=============================== -Adjunct Processor (AP) facility -=============================== - - -Introduction -============ -The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised -of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards. -The AP devices provide cryptographic functions to all CPUs assigned to a -linux system running in an IBM Z system LPAR. - -The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap -is to make AP cards available to KVM guests using the VFIO mediated device -framework. This implementation relies considerably on the s390 virtualization -facilities which do most of the hard work of providing direct access to AP -devices. - -AP Architectural Overview -========================= -To facilitate the comprehension of the design, let's start with some -definitions: - -* AP adapter - - An AP adapter is an IBM Z adapter card that can perform cryptographic - functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters - assigned to the LPAR in which a linux host is running will be available to - the linux host. Each adapter is identified by a number from 0 to 255; however, - the maximum adapter number is determined by machine model and/or adapter type. - When installed, an AP adapter is accessed by AP instructions executed by any - CPU. - - The AP adapter cards are assigned to a given LPAR via the system's Activation - Profile which can be edited via the HMC. When the linux host system is IPL'd - in the LPAR, the AP bus detects the AP adapter cards assigned to the LPAR and - creates a sysfs device for each assigned adapter. For example, if AP adapters - 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will create the following - sysfs device entries:: - - /sys/devices/ap/card04 - /sys/devices/ap/card0a - - Symbolic links to these devices will also be created in the AP bus devices - sub-directory:: - - /sys/bus/ap/devices/[card04] - /sys/bus/ap/devices/[card04] - -* AP domain - - An adapter is partitioned into domains. An adapter can hold up to 256 domains - depending upon the adapter type and hardware configuration. A domain is - identified by a number from 0 to 255; however, the maximum domain number is - determined by machine model and/or adapter type.. A domain can be thought of - as a set of hardware registers and memory used for processing AP commands. A - domain can be configured with a secure private key used for clear key - encryption. A domain is classified in one of two ways depending upon how it - may be accessed: - - * Usage domains are domains that are targeted by an AP instruction to - process an AP command. - - * Control domains are domains that are changed by an AP command sent to a - usage domain; for example, to set the secure private key for the control - domain. - - The AP usage and control domains are assigned to a given LPAR via the system's - Activation Profile which can be edited via the HMC. When a linux host system - is IPL'd in the LPAR, the AP bus module detects the AP usage and control - domains assigned to the LPAR. The domain number of each usage domain and - adapter number of each AP adapter are combined to create AP queue devices - (see AP Queue section below). The domain number of each control domain will be - represented in a bitmask and stored in a sysfs file - /sys/bus/ap/ap_control_domain_mask. The bits in the mask, from most to least - significant bit, correspond to domains 0-255. - -* AP Queue - - An AP queue is the means by which an AP command is sent to a usage domain - inside a specific adapter. An AP queue is identified by a tuple - comprised of an AP adapter ID (APID) and an AP queue index (APQI). The - APQI corresponds to a given usage domain number within the adapter. This tuple - forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP - instructions include a field containing the APQN to identify the AP queue to - which the AP command is to be sent for processing. - - The AP bus will create a sysfs device for each APQN that can be derived from - the cross product of the AP adapter and usage domain numbers detected when the - AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage - domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the - following sysfs entries:: - - /sys/devices/ap/card04/04.0006 - /sys/devices/ap/card04/04.0047 - /sys/devices/ap/card0a/0a.0006 - /sys/devices/ap/card0a/0a.0047 - - The following symbolic links to these devices will be created in the AP bus - devices subdirectory:: - - /sys/bus/ap/devices/[04.0006] - /sys/bus/ap/devices/[04.0047] - /sys/bus/ap/devices/[0a.0006] - /sys/bus/ap/devices/[0a.0047] - -* AP Instructions: - - There are three AP instructions: - - * NQAP: to enqueue an AP command-request message to a queue - * DQAP: to dequeue an AP command-reply message from a queue - * PQAP: to administer the queues - - AP instructions identify the domain that is targeted to process the AP - command; this must be one of the usage domains. An AP command may modify a - domain that is not one of the usage domains, but the modified domain - must be one of the control domains. - -AP and SIE -========== -Let's now take a look at how AP instructions executed on a guest are interpreted -by the hardware. - -A satellite control block called the Crypto Control Block (CRYCB) is attached to -our main hardware virtualization control block. The CRYCB contains an AP Control -Block (APCB) that has three fields to identify the adapters, usage domains and -control domains assigned to the KVM guest: - -* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned - to the KVM guest. Each bit in the mask, from left to right, corresponds to - an APID from 0-255. If a bit is set, the corresponding adapter is valid for - use by the KVM guest. - -* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains - assigned to the KVM guest. Each bit in the mask, from left to right, - corresponds to an AP queue index (APQI) from 0-255. If a bit is set, the - corresponding queue is valid for use by the KVM guest. - -* The AP Domain Mask field is a bit mask that identifies the AP control domains - assigned to the KVM guest. The ADM bit mask controls which domains can be - changed by an AP command-request message sent to a usage domain from the - guest. Each bit in the mask, from left to right, corresponds to a domain from - 0-255. If a bit is set, the corresponding domain can be modified by an AP - command-request message sent to a usage domain. - -If you recall from the description of an AP Queue, AP instructions include -an APQN to identify the AP queue to which an AP command-request message is to be -sent (NQAP and PQAP instructions), or from which a command-reply message is to -be received (DQAP instruction). The validity of an APQN is defined by the matrix -calculated from the APM and AQM; it is the Cartesian product of all assigned -adapter numbers (APM) with all assigned queue indexes (AQM). For example, if -adapters 1 and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs -(1,5), (1,6), (2,5) and (2,6) will be valid for the guest. - -The APQNs can provide secure key functionality - i.e., a private key is stored -on the adapter card for each of its domains - so each APQN must be assigned to -at most one guest or to the linux host:: - - Example 1: Valid configuration: - ------------------------------ - Guest1: adapters 1,2 domains 5,6 - Guest2: adapter 1,2 domain 7 - - This is valid because both guests have a unique set of APQNs: - Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); - Guest2 has APQNs (1,7), (2,7) - - Example 2: Valid configuration: - ------------------------------ - Guest1: adapters 1,2 domains 5,6 - Guest2: adapters 3,4 domains 5,6 - - This is also valid because both guests have a unique set of APQNs: - Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); - Guest2 has APQNs (3,5), (3,6), (4,5), (4,6) - - Example 3: Invalid configuration: - -------------------------------- - Guest1: adapters 1,2 domains 5,6 - Guest2: adapter 1 domains 6,7 - - This is an invalid configuration because both guests have access to - APQN (1,6). - -The Design -========== -The design introduces three new objects: - -1. AP matrix device -2. VFIO AP device driver (vfio_ap.ko) -3. VFIO AP mediated pass-through device - -The VFIO AP device driver -------------------------- -The VFIO AP (vfio_ap) device driver serves the following purposes: - -1. Provides the interfaces to secure APQNs for exclusive use of KVM guests. - -2. Sets up the VFIO mediated device interfaces to manage a vfio_ap mediated - device and creates the sysfs interfaces for assigning adapters, usage - domains, and control domains comprising the matrix for a KVM guest. - -3. Configures the APM, AQM and ADM in the APCB contained in the CRYCB referenced - by a KVM guest's SIE state description to grant the guest access to a matrix - of AP devices - -Reserve APQNs for exclusive use of KVM guests ---------------------------------------------- -The following block diagram illustrates the mechanism by which APQNs are -reserved:: - - +------------------+ - 7 remove | | - +--------------------> cex4queue driver | - | | | - | +------------------+ - | - | - | +------------------+ +----------------+ - | 5 register driver | | 3 create | | - | +----------------> Device core +----------> matrix device | - | | | | | | - | | +--------^---------+ +----------------+ - | | | - | | +-------------------+ - | | +-----------------------------------+ | - | | | 4 register AP driver | | 2 register device - | | | | | - +--------+---+-v---+ +--------+-------+-+ - | | | | - | ap_bus +--------------------- > vfio_ap driver | - | | 8 probe | | - +--------^---------+ +--^--^------------+ - 6 edit | | | - apmask | +-----------------------------+ | 11 mdev create - aqmask | | 1 modprobe | - +--------+-----+---+ +----------------+-+ +----------------+ - | | | |10 create| mediated | - | admin | | VFIO device core |---------> matrix | - | + | | | device | - +------+-+---------+ +--------^---------+ +--------^-------+ - | | | | - | | 9 create vfio_ap-passthrough | | - | +------------------------------+ | - +-------------------------------------------------------------+ - 12 assign adapter/domain/control domain - -The process for reserving an AP queue for use by a KVM guest is: - -1. The administrator loads the vfio_ap device driver -2. The vfio-ap driver during its initialization will register a single 'matrix' - device with the device core. This will serve as the parent device for - all vfio_ap mediated devices used to configure an AP matrix for a guest. -3. The /sys/devices/vfio_ap/matrix device is created by the device core -4. The vfio_ap device driver will register with the AP bus for AP queue devices - of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap - driver's probe and remove callback interfaces. Devices older than CEX4 queues - are not supported to simplify the implementation by not needlessly - complicating the design by supporting older devices that will go out of - service in the relatively near future, and for which there are few older - systems around on which to test. -5. The AP bus registers the vfio_ap device driver with the device core -6. The administrator edits the AP adapter and queue masks to reserve AP queues - for use by the vfio_ap device driver. -7. The AP bus removes the AP queues reserved for the vfio_ap driver from the - default zcrypt cex4queue driver. -8. The AP bus probes the vfio_ap device driver to bind the queues reserved for - it. -9. The administrator creates a passthrough type vfio_ap mediated device to be - used by a guest -10. The administrator assigns the adapters, usage domains and control domains - to be exclusively used by a guest. - -Set up the VFIO mediated device interfaces ------------------------------------------- -The VFIO AP device driver utilizes the common interfaces of the VFIO mediated -device core driver to: - -* Register an AP mediated bus driver to add a vfio_ap mediated device to and - remove it from a VFIO group. -* Create and destroy a vfio_ap mediated device -* Add a vfio_ap mediated device to and remove it from the AP mediated bus driver -* Add a vfio_ap mediated device to and remove it from an IOMMU group - -The following high-level block diagram shows the main components and interfaces -of the VFIO AP mediated device driver:: - - +-------------+ - | | - | +---------+ | mdev_register_driver() +--------------+ - | | Mdev | +<-----------------------+ | - | | bus | | | vfio_mdev.ko | - | | driver | +----------------------->+ |<-> VFIO user - | +---------+ | probe()/remove() +--------------+ APIs - | | - | MDEV CORE | - | MODULE | - | mdev.ko | - | +---------+ | mdev_register_parent() +--------------+ - | |Physical | +<-----------------------+ | - | | device | | | vfio_ap.ko |<-> matrix - | |interface| +----------------------->+ | device - | +---------+ | callback +--------------+ - +-------------+ - -During initialization of the vfio_ap module, the matrix device is registered -with an 'mdev_parent_ops' structure that provides the sysfs attribute -structures, mdev functions and callback interfaces for managing the mediated -matrix device. - -* sysfs attribute structures: - - supported_type_groups - The VFIO mediated device framework supports creation of user-defined - mediated device types. These mediated device types are specified - via the 'supported_type_groups' structure when a device is registered - with the mediated device framework. The registration process creates the - sysfs structures for each mediated device type specified in the - 'mdev_supported_types' sub-directory of the device being registered. Along - with the device type, the sysfs attributes of the mediated device type are - provided. - - The VFIO AP device driver will register one mediated device type for - passthrough devices: - - /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough - - Only the read-only attributes required by the VFIO mdev framework will - be provided:: - - ... name - ... device_api - ... available_instances - ... device_api - - Where: - - * name: - specifies the name of the mediated device type - * device_api: - the mediated device type's API - * available_instances: - the number of vfio_ap mediated passthrough devices - that can be created - * device_api: - specifies the VFIO API - mdev_attr_groups - This attribute group identifies the user-defined sysfs attributes of the - mediated device. When a device is registered with the VFIO mediated device - framework, the sysfs attribute files identified in the 'mdev_attr_groups' - structure will be created in the vfio_ap mediated device's directory. The - sysfs attributes for a vfio_ap mediated device are: - - assign_adapter / unassign_adapter: - Write-only attributes for assigning/unassigning an AP adapter to/from the - vfio_ap mediated device. To assign/unassign an adapter, the APID of the - adapter is echoed into the respective attribute file. - assign_domain / unassign_domain: - Write-only attributes for assigning/unassigning an AP usage domain to/from - the vfio_ap mediated device. To assign/unassign a domain, the domain - number of the usage domain is echoed into the respective attribute - file. - matrix: - A read-only file for displaying the APQNs derived from the Cartesian - product of the adapter and domain numbers assigned to the vfio_ap mediated - device. - guest_matrix: - A read-only file for displaying the APQNs derived from the Cartesian - product of the adapter and domain numbers assigned to the APM and AQM - fields respectively of the KVM guest's CRYCB. This may differ from the - the APQNs assigned to the vfio_ap mediated device if any APQN does not - reference a queue device bound to the vfio_ap device driver (i.e., the - queue is not in the host's AP configuration). - assign_control_domain / unassign_control_domain: - Write-only attributes for assigning/unassigning an AP control domain - to/from the vfio_ap mediated device. To assign/unassign a control domain, - the ID of the domain to be assigned/unassigned is echoed into the - respective attribute file. - control_domains: - A read-only file for displaying the control domain numbers assigned to the - vfio_ap mediated device. - -* functions: - - create: - allocates the ap_matrix_mdev structure used by the vfio_ap driver to: - - * Store the reference to the KVM structure for the guest using the mdev - * Store the AP matrix configuration for the adapters, domains, and control - domains assigned via the corresponding sysfs attributes files - * Store the AP matrix configuration for the adapters, domains and control - domains available to a guest. A guest may not be provided access to APQNs - referencing queue devices that do not exist, or are not bound to the - vfio_ap device driver. - - remove: - deallocates the vfio_ap mediated device's ap_matrix_mdev structure. - This will be allowed only if a running guest is not using the mdev. - -* callback interfaces - - open_device: - The vfio_ap driver uses this callback to register a - VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the matrix mdev - devices. The open_device callback is invoked by userspace to connect the - VFIO iommu group for the matrix mdev device to the MDEV bus. Access to the - KVM structure used to configure the KVM guest is provided via this callback. - The KVM structure, is used to configure the guest's access to the AP matrix - defined via the vfio_ap mediated device's sysfs attribute files. - - close_device: - unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the - matrix mdev device and deconfigures the guest's AP matrix. - - ioctl: - this callback handles the VFIO_DEVICE_GET_INFO and VFIO_DEVICE_RESET ioctls - defined by the vfio framework. - -Configure the guest's AP resources ----------------------------------- -Configuring the AP resources for a KVM guest will be performed when the -VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier -function is called when userspace connects to KVM. The guest's AP resources are -configured via it's APCB by: - -* Setting the bits in the APM corresponding to the APIDs assigned to the - vfio_ap mediated device via its 'assign_adapter' interface. -* Setting the bits in the AQM corresponding to the domains assigned to the - vfio_ap mediated device via its 'assign_domain' interface. -* Setting the bits in the ADM corresponding to the domain dIDs assigned to the - vfio_ap mediated device via its 'assign_control_domains' interface. - -The linux device model precludes passing a device through to a KVM guest that -is not bound to the device driver facilitating its pass-through. Consequently, -an APQN that does not reference a queue device bound to the vfio_ap device -driver will not be assigned to a KVM guest's matrix. The AP architecture, -however, does not provide a means to filter individual APQNs from the guest's -matrix, so the adapters, domains and control domains assigned to vfio_ap -mediated device via its sysfs 'assign_adapter', 'assign_domain' and -'assign_control_domain' interfaces will be filtered before providing the AP -configuration to a guest: - -* The APIDs of the adapters, the APQIs of the domains and the domain numbers of - the control domains assigned to the matrix mdev that are not also assigned to - the host's AP configuration will be filtered. - -* Each APQN derived from the Cartesian product of the APIDs and APQIs assigned - to the vfio_ap mdev is examined and if any one of them does not reference a - queue device bound to the vfio_ap device driver, the adapter will not be - plugged into the guest (i.e., the bit corresponding to its APID will not be - set in the APM of the guest's APCB). - -The CPU model features for AP ------------------------------ -The AP stack relies on the presence of the AP instructions as well as three -facilities: The AP Facilities Test (APFT) facility; the AP Query -Configuration Information (QCI) facility; and the AP Queue Interruption Control -facility. These features/facilities are made available to a KVM guest via the -following CPU model features: - -1. ap: Indicates whether the AP instructions are installed on the guest. This - feature will be enabled by KVM only if the AP instructions are installed - on the host. - -2. apft: Indicates the APFT facility is available on the guest. This facility - can be made available to the guest only if it is available on the host (i.e., - facility bit 15 is set). - -3. apqci: Indicates the AP QCI facility is available on the guest. This facility - can be made available to the guest only if it is available on the host (i.e., - facility bit 12 is set). - -4. apqi: Indicates AP Queue Interruption Control faclity is available on the - guest. This facility can be made available to the guest only if it is - available on the host (i.e., facility bit 65 is set). - -Note: If the user chooses to specify a CPU model different than the 'host' -model to QEMU, the CPU model features and facilities need to be turned on -explicitly; for example:: - - /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on,apqi=on - -A guest can be precluded from using AP features/facilities by turning them off -explicitly; for example:: - - /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off,apqi=off - -Note: If the APFT facility is turned off (apft=off) for the guest, the guest -will not see any AP devices. The zcrypt device drivers on the guest that -register for type 10 and newer AP devices - i.e., the cex4card and cex4queue -device drivers - need the APFT facility to ascertain the facilities installed on -a given AP device. If the APFT facility is not installed on the guest, then no -adapter or domain devices will get created by the AP bus running on the -guest because only type 10 and newer devices can be configured for guest use. - -Example -======= -Let's now provide an example to illustrate how KVM guests may be given -access to AP facilities. For this example, we will show how to configure -three guests such that executing the lszcrypt command on the guests would -look like this: - -Guest1 ------- -=========== ===== ============ -CARD.DOMAIN TYPE MODE -=========== ===== ============ -05 CEX5C CCA-Coproc -05.0004 CEX5C CCA-Coproc -05.00ab CEX5C CCA-Coproc -06 CEX5A Accelerator -06.0004 CEX5A Accelerator -06.00ab CEX5A Accelerator -=========== ===== ============ - -Guest2 ------- -=========== ===== ============ -CARD.DOMAIN TYPE MODE -=========== ===== ============ -05 CEX5C CCA-Coproc -05.0047 CEX5C CCA-Coproc -05.00ff CEX5C CCA-Coproc -=========== ===== ============ - -Guest3 ------- -=========== ===== ============ -CARD.DOMAIN TYPE MODE -=========== ===== ============ -06 CEX5A Accelerator -06.0047 CEX5A Accelerator -06.00ff CEX5A Accelerator -=========== ===== ============ - -These are the steps: - -1. Install the vfio_ap module on the linux host. The dependency chain for the - vfio_ap module is: - * iommu - * s390 - * zcrypt - * vfio - * vfio_mdev - * vfio_mdev_device - * KVM - - To build the vfio_ap module, the kernel build must be configured with the - following Kconfig elements selected: - * IOMMU_SUPPORT - * S390 - * ZCRYPT - * VFIO - * KVM - - If using make menuconfig select the following to build the vfio_ap module:: - - -> Device Drivers - -> IOMMU Hardware Support - select S390 AP IOMMU Support - -> VFIO Non-Privileged userspace driver framework - -> Mediated device driver frramework - -> VFIO driver for Mediated devices - -> I/O subsystem - -> VFIO support for AP devices - -2. Secure the AP queues to be used by the three guests so that the host can not - access them. To secure them, there are two sysfs files that specify - bitmasks marking a subset of the APQN range as usable only by the default AP - queue device drivers. All remaining APQNs are available for use by - any other device driver. The vfio_ap device driver is currently the only - non-default device driver. The location of the sysfs files containing the - masks are:: - - /sys/bus/ap/apmask - /sys/bus/ap/aqmask - - The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs - (APID). Each bit in the mask, from left to right, corresponds to an APID from - 0-255. If a bit is set, the APID belongs to the subset of APQNs marked as - available only to the default AP queue device drivers. - - The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes - (APQI). Each bit in the mask, from left to right, corresponds to an APQI from - 0-255. If a bit is set, the APQI belongs to the subset of APQNs marked as - available only to the default AP queue device drivers. - - The Cartesian product of the APIDs corresponding to the bits set in the - apmask and the APQIs corresponding to the bits set in the aqmask comprise - the subset of APQNs that can be used only by the host default device drivers. - All other APQNs are available to the non-default device drivers such as the - vfio_ap driver. - - Take, for example, the following masks:: - - apmask: - 0x7d00000000000000000000000000000000000000000000000000000000000000 - - aqmask: - 0x8000000000000000000000000000000000000000000000000000000000000000 - - The masks indicate: - - * Adapters 1, 2, 3, 4, 5, and 7 are available for use by the host default - device drivers. - - * Domain 0 is available for use by the host default device drivers - - * The subset of APQNs available for use only by the default host device - drivers are: - - (1,0), (2,0), (3,0), (4.0), (5,0) and (7,0) - - * All other APQNs are available for use by the non-default device drivers. - - The APQN of each AP queue device assigned to the linux host is checked by the - AP bus against the set of APQNs derived from the Cartesian product of APIDs - and APQIs marked as available to the default AP queue device drivers. If a - match is detected, only the default AP queue device drivers will be probed; - otherwise, the vfio_ap device driver will be probed. - - By default, the two masks are set to reserve all APQNs for use by the default - AP queue device drivers. There are two ways the default masks can be changed: - - 1. The sysfs mask files can be edited by echoing a string into the - respective sysfs mask file in one of two formats: - - * An absolute hex string starting with 0x - like "0x12345678" - sets - the mask. If the given string is shorter than the mask, it is padded - with 0s on the right; for example, specifying a mask value of 0x41 is - the same as specifying:: - - 0x4100000000000000000000000000000000000000000000000000000000000000 - - Keep in mind that the mask reads from left to right, so the mask - above identifies device numbers 1 and 7 (01000001). - - If the string is longer than the mask, the operation is terminated with - an error (EINVAL). - - * Individual bits in the mask can be switched on and off by specifying - each bit number to be switched in a comma separated list. Each bit - number string must be prepended with a ('+') or minus ('-') to indicate - the corresponding bit is to be switched on ('+') or off ('-'). Some - valid values are: - - - "+0" switches bit 0 on - - "-13" switches bit 13 off - - "+0x41" switches bit 65 on - - "-0xff" switches bit 255 off - - The following example: - - +0,-6,+0x47,-0xf0 - - Switches bits 0 and 71 (0x47) on - - Switches bits 6 and 240 (0xf0) off - - Note that the bits not specified in the list remain as they were before - the operation. - - 2. The masks can also be changed at boot time via parameters on the kernel - command line like this: - - ap.apmask=0xffff ap.aqmask=0x40 - - This would create the following masks:: - - apmask: - 0xffff000000000000000000000000000000000000000000000000000000000000 - - aqmask: - 0x4000000000000000000000000000000000000000000000000000000000000000 - - Resulting in these two pools:: - - default drivers pool: adapter 0-15, domain 1 - alternate drivers pool: adapter 16-255, domains 0, 2-255 - - **Note:** - Changing a mask such that one or more APQNs will be taken from a vfio_ap - mediated device (see below) will fail with an error (EBUSY). A message - is logged to the kernel ring buffer which can be viewed with the 'dmesg' - command. The output identifies each APQN flagged as 'in use' and identifies - the vfio_ap mediated device to which it is assigned; for example: - - Userspace may not re-assign queue 05.0054 already assigned to 62177883-f1bb-47f0-914d-32a22e3a8804 - Userspace may not re-assign queue 04.0054 already assigned to cef03c3c-903d-4ecc-9a83-40694cb8aee4 - -Securing the APQNs for our example ----------------------------------- - To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047, - 06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding - APQNs can be removed from the default masks using either of the following - commands:: - - echo -5,-6 > /sys/bus/ap/apmask - - echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask - - Or the masks can be set as follows:: - - echo 0xf9ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff \ - > apmask - - echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \ - > aqmask - - This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, - 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The - sysfs directory for the vfio_ap device driver will now contain symbolic links - to the AP queue devices bound to it:: - - /sys/bus/ap - ... [drivers] - ...... [vfio_ap] - ......... [05.0004] - ......... [05.0047] - ......... [05.00ab] - ......... [05.00ff] - ......... [06.0004] - ......... [06.0047] - ......... [06.00ab] - ......... [06.00ff] - - Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later) - can be bound to the vfio_ap device driver. The reason for this is to - simplify the implementation by not needlessly complicating the design by - supporting older devices that will go out of service in the relatively near - future and for which there are few older systems on which to test. - - The administrator, therefore, must take care to secure only AP queues that - can be bound to the vfio_ap device driver. The device type for a given AP - queue device can be read from the parent card's sysfs directory. For example, - to see the hardware type of the queue 05.0004: - - cat /sys/bus/ap/devices/card05/hwtype - - The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the - vfio_ap device driver. - -3. Create the mediated devices needed to configure the AP matrixes for the - three guests and to provide an interface to the vfio_ap driver for - use by the guests:: - - /sys/devices/vfio_ap/matrix/ - --- [mdev_supported_types] - ------ [vfio_ap-passthrough] (passthrough vfio_ap mediated device type) - --------- create - --------- [devices] - - To create the mediated devices for the three guests:: - - uuidgen > create - uuidgen > create - uuidgen > create - - or - - echo $uuid1 > create - echo $uuid2 > create - echo $uuid3 > create - - This will create three mediated devices in the [devices] subdirectory named - after the UUID written to the create attribute file. We call them $uuid1, - $uuid2 and $uuid3 and this is the sysfs directory structure after creation:: - - /sys/devices/vfio_ap/matrix/ - --- [mdev_supported_types] - ------ [vfio_ap-passthrough] - --------- [devices] - ------------ [$uuid1] - --------------- assign_adapter - --------------- assign_control_domain - --------------- assign_domain - --------------- matrix - --------------- unassign_adapter - --------------- unassign_control_domain - --------------- unassign_domain - - ------------ [$uuid2] - --------------- assign_adapter - --------------- assign_control_domain - --------------- assign_domain - --------------- matrix - --------------- unassign_adapter - ----------------unassign_control_domain - ----------------unassign_domain - - ------------ [$uuid3] - --------------- assign_adapter - --------------- assign_control_domain - --------------- assign_domain - --------------- matrix - --------------- unassign_adapter - ----------------unassign_control_domain - ----------------unassign_domain - - Note *****: The vfio_ap mdevs do not persist across reboots unless the - mdevctl tool is used to create and persist them. - -4. The administrator now needs to configure the matrixes for the mediated - devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3). - - This is how the matrix is configured for Guest1:: - - echo 5 > assign_adapter - echo 6 > assign_adapter - echo 4 > assign_domain - echo 0xab > assign_domain - - Control domains can similarly be assigned using the assign_control_domain - sysfs file. - - If a mistake is made configuring an adapter, domain or control domain, - you can use the unassign_xxx files to unassign the adapter, domain or - control domain. - - To display the matrix configuration for Guest1:: - - cat matrix - - To display the matrix that is or will be assigned to Guest1:: - - cat guest_matrix - - This is how the matrix is configured for Guest2:: - - echo 5 > assign_adapter - echo 0x47 > assign_domain - echo 0xff > assign_domain - - This is how the matrix is configured for Guest3:: - - echo 6 > assign_adapter - echo 0x47 > assign_domain - echo 0xff > assign_domain - - In order to successfully assign an adapter: - - * The adapter number specified must represent a value from 0 up to the - maximum adapter number configured for the system. If an adapter number - higher than the maximum is specified, the operation will terminate with - an error (ENODEV). - - Note: The maximum adapter number can be obtained via the sysfs - /sys/bus/ap/ap_max_adapter_id attribute file. - - * Each APQN derived from the Cartesian product of the APID of the adapter - being assigned and the APQIs of the domains previously assigned: - - - Must only be available to the vfio_ap device driver as specified in the - sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even - one APQN is reserved for use by the host device driver, the operation - will terminate with an error (EADDRNOTAVAIL). - - - Must NOT be assigned to another vfio_ap mediated device. If even one APQN - is assigned to another vfio_ap mediated device, the operation will - terminate with an error (EBUSY). - - - Must NOT be assigned while the sysfs /sys/bus/ap/apmask and - sys/bus/ap/aqmask attribute files are being edited or the operation may - terminate with an error (EBUSY). - - In order to successfully assign a domain: - - * The domain number specified must represent a value from 0 up to the - maximum domain number configured for the system. If a domain number - higher than the maximum is specified, the operation will terminate with - an error (ENODEV). - - Note: The maximum domain number can be obtained via the sysfs - /sys/bus/ap/ap_max_domain_id attribute file. - - * Each APQN derived from the Cartesian product of the APQI of the domain - being assigned and the APIDs of the adapters previously assigned: - - - Must only be available to the vfio_ap device driver as specified in the - sysfs /sys/bus/ap/apmask and /sys/bus/ap/aqmask attribute files. If even - one APQN is reserved for use by the host device driver, the operation - will terminate with an error (EADDRNOTAVAIL). - - - Must NOT be assigned to another vfio_ap mediated device. If even one APQN - is assigned to another vfio_ap mediated device, the operation will - terminate with an error (EBUSY). - - - Must NOT be assigned while the sysfs /sys/bus/ap/apmask and - sys/bus/ap/aqmask attribute files are being edited or the operation may - terminate with an error (EBUSY). - - In order to successfully assign a control domain: - - * The domain number specified must represent a value from 0 up to the maximum - domain number configured for the system. If a control domain number higher - than the maximum is specified, the operation will terminate with an - error (ENODEV). - -5. Start Guest1:: - - /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \ - -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ... - -7. Start Guest2:: - - /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \ - -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ... - -7. Start Guest3:: - - /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on,apqi=on \ - -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ... - -When the guest is shut down, the vfio_ap mediated devices may be removed. - -Using our example again, to remove the vfio_ap mediated device $uuid1:: - - /sys/devices/vfio_ap/matrix/ - --- [mdev_supported_types] - ------ [vfio_ap-passthrough] - --------- [devices] - ------------ [$uuid1] - --------------- remove - -:: - - echo 1 > remove - -This will remove all of the matrix mdev device's sysfs structures including -the mdev device itself. To recreate and reconfigure the matrix mdev device, -all of the steps starting with step 3 will have to be performed again. Note -that the remove will fail if a guest using the vfio_ap mdev is still running. - -It is not necessary to remove a vfio_ap mdev, but one may want to -remove it if no guest will use it during the remaining lifetime of the linux -host. If the vfio_ap mdev is removed, one may want to also reconfigure -the pool of adapters and queues reserved for use by the default drivers. - -Hot plug/unplug support: -======================== -An adapter, domain or control domain may be hot plugged into a running KVM -guest by assigning it to the vfio_ap mediated device being used by the guest if -the following conditions are met: - -* The adapter, domain or control domain must also be assigned to the host's - AP configuration. - -* Each APQN derived from the Cartesian product comprised of the APID of the - adapter being assigned and the APQIs of the domains assigned must reference a - queue device bound to the vfio_ap device driver. - -* To hot plug a domain, each APQN derived from the Cartesian product - comprised of the APQI of the domain being assigned and the APIDs of the - adapters assigned must reference a queue device bound to the vfio_ap device - driver. - -An adapter, domain or control domain may be hot unplugged from a running KVM -guest by unassigning it from the vfio_ap mediated device being used by the -guest. - -Over-provisioning of AP queues for a KVM guest: -=============================================== -Over-provisioning is defined herein as the assignment of adapters or domains to -a vfio_ap mediated device that do not reference AP devices in the host's AP -configuration. The idea here is that when the adapter or domain becomes -available, it will be automatically hot-plugged into the KVM guest using -the vfio_ap mediated device to which it is assigned as long as each new APQN -resulting from plugging it in references a queue device bound to the vfio_ap -device driver. - -Limitations -=========== -Live guest migration is not supported for guests using AP devices without -intervention by a system administrator. Before a KVM guest can be migrated, -the vfio_ap mediated device must be removed. Unfortunately, it can not be -removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while -the mdev is in use by a KVM guest. If the guest is being emulated by QEMU, -its mdev can be hot unplugged from the guest in one of two ways: - -1. If the KVM guest was started with libvirt, you can hot unplug the mdev via - the following commands: - - virsh detach-device <guestname> <path-to-device-xml> - - For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from - the guest named 'my-guest': - - virsh detach-device my-guest ~/config/my-guest-hostdev.xml - - The contents of my-guest-hostdev.xml: - -.. code-block:: xml - - <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'> - <source> - <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/> - </source> - </hostdev> - - - virsh qemu-monitor-command <guest-name> --hmp "device-del <device-id>" - - For example, to hot unplug the vfio_ap mediated device identified on the - qemu command line with 'id=hostdev0' from the guest named 'my-guest': - -.. code-block:: sh - - virsh qemu-monitor-command my-guest --hmp "device_del hostdev0" - -2. A vfio_ap mediated device can be hot unplugged by attaching the qemu monitor - to the guest and using the following qemu monitor command: - - (QEMU) device-del id=<device-id> - - For example, to hot unplug the vfio_ap mediated device that was specified - on the qemu command line with 'id=hostdev0' when the guest was started: - - (QEMU) device-del id=hostdev0 - -After live migration of the KVM guest completes, an AP configuration can be -restored to the KVM guest by hot plugging a vfio_ap mediated device on the target -system into the guest in one of two ways: - -1. If the KVM guest was started with libvirt, you can hot plug a matrix mediated - device into the guest via the following virsh commands: - - virsh attach-device <guestname> <path-to-device-xml> - - For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into - the guest named 'my-guest': - - virsh attach-device my-guest ~/config/my-guest-hostdev.xml - - The contents of my-guest-hostdev.xml: - -.. code-block:: xml - - <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'> - <source> - <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/> - </source> - </hostdev> - - - virsh qemu-monitor-command <guest-name> --hmp \ - "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>" - - For example, to hot plug the vfio_ap mediated device - 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with - device-id hostdev0: - - virsh qemu-monitor-command my-guest --hmp \ - "device_add vfio-ap,\ - sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\ - id=hostdev0" - -2. A vfio_ap mediated device can be hot plugged by attaching the qemu monitor - to the guest and using the following qemu monitor command: - - (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>" - - For example, to plug the vfio_ap mediated device - 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id - hostdev0: - - (QEMU) device-add "vfio-ap,\ - sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\ - id=hostdev0" |