diff options
Diffstat (limited to 'Documentation/virt/kvm/api.rst')
-rw-r--r-- | Documentation/virt/kvm/api.rst | 169 |
1 files changed, 138 insertions, 31 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 899480d4acaf..fe722c5dada9 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -891,12 +891,12 @@ like this:: The irq_type field has the following values: -- irq_type[0]: +- KVM_ARM_IRQ_TYPE_CPU: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ -- irq_type[1]: +- KVM_ARM_IRQ_TYPE_SPI: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.) (the vcpu_index field is ignored) -- irq_type[2]: +- KVM_ARM_IRQ_TYPE_PPI: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.) (The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs) @@ -1403,6 +1403,12 @@ Instead, an abort (data abort if the cause of the page-table update was a load or a store, instruction abort if it was an instruction fetch) is injected in the guest. +S390: +^^^^^ + +Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set. +Returns -EINVAL if called on a protected VM. + 4.36 KVM_SET_TSS_ADDR --------------------- @@ -1921,7 +1927,7 @@ flags: If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier for the device that wrote the MSI message. For PCI, this is usually a -BFD identifier in the lower 16 bits. +BDF identifier in the lower 16 bits. On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled, @@ -2989,7 +2995,7 @@ flags: If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier for the device that wrote the MSI message. For PCI, this is usually a -BFD identifier in the lower 16 bits. +BDF identifier in the lower 16 bits. On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled, @@ -6276,6 +6282,12 @@ state. At VM creation time, all memory is shared, i.e. the PRIVATE attribute is '0' for all gfns. Userspace can control whether memory is shared/private by toggling KVM_MEMORY_ATTRIBUTE_PRIVATE via KVM_SET_MEMORY_ATTRIBUTES as needed. +S390: +^^^^^ + +Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set. +Returns -EINVAL if called on a protected VM. + 4.141 KVM_SET_MEMORY_ATTRIBUTES ------------------------------- @@ -6355,6 +6367,61 @@ a single guest_memfd file, but the bound ranges must not overlap). See KVM_SET_USER_MEMORY_REGION2 for additional details. +4.143 KVM_PRE_FAULT_MEMORY +------------------------ + +:Capability: KVM_CAP_PRE_FAULT_MEMORY +:Architectures: none +:Type: vcpu ioctl +:Parameters: struct kvm_pre_fault_memory (in/out) +:Returns: 0 if at least one page is processed, < 0 on error + +Errors: + + ========== =============================================================== + EINVAL The specified `gpa` and `size` were invalid (e.g. not + page aligned, causes an overflow, or size is zero). + ENOENT The specified `gpa` is outside defined memslots. + EINTR An unmasked signal is pending and no page was processed. + EFAULT The parameter address was invalid. + EOPNOTSUPP Mapping memory for a GPA is unsupported by the + hypervisor, and/or for the current vCPU state/mode. + EIO unexpected error conditions (also causes a WARN) + ========== =============================================================== + +:: + + struct kvm_pre_fault_memory { + /* in/out */ + __u64 gpa; + __u64 size; + /* in */ + __u64 flags; + __u64 padding[5]; + }; + +KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory +for the current vCPU state. KVM maps memory as if the vCPU generated a +stage-2 read page fault, e.g. faults in memory as needed, but doesn't break +CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed. + +In some cases, multiple vCPUs might share the page tables. In this +case, the ioctl can be called in parallel. + +When the ioctl returns, the input values are updated to point to the +remaining range. If `size` > 0 on return, the caller can just issue +the ioctl again with the same `struct kvm_map_memory` argument. + +Shadow page tables cannot support this ioctl because they +are indexed by virtual address or nested guest physical address. +Calling this ioctl when the guest is using shadow page tables (for +example because it is running a nested guest with nested page tables) +will fail with `EOPNOTSUPP` even if `KVM_CHECK_EXTENSION` reports +the capability to be present. + +`flags` must currently be zero. + + 5. The kvm_run structure ======================== @@ -6419,9 +6486,12 @@ More architecture-specific flags detailing state of the VCPU that may affect the device's behavior. Current defined flags:: /* x86, set if the VCPU is in system management mode */ - #define KVM_RUN_X86_SMM (1 << 0) + #define KVM_RUN_X86_SMM (1 << 0) /* x86, set if bus lock detected in VM */ - #define KVM_RUN_BUS_LOCK (1 << 1) + #define KVM_RUN_X86_BUS_LOCK (1 << 1) + /* x86, set if the VCPU is executing a nested (L2) guest */ + #define KVM_RUN_X86_GUEST_MODE (1 << 2) + /* arm64, set for KVM_EXIT_DEBUG */ #define KVM_DEBUG_ARCH_HSR_HIGH_VALID (1 << 0) @@ -7767,29 +7837,31 @@ Valid bits in args[0] are:: #define KVM_BUS_LOCK_DETECTION_OFF (1 << 0) #define KVM_BUS_LOCK_DETECTION_EXIT (1 << 1) -Enabling this capability on a VM provides userspace with a way to select -a policy to handle the bus locks detected in guest. Userspace can obtain -the supported modes from the result of KVM_CHECK_EXTENSION and define it -through the KVM_ENABLE_CAP. +Enabling this capability on a VM provides userspace with a way to select a +policy to handle the bus locks detected in guest. Userspace can obtain the +supported modes from the result of KVM_CHECK_EXTENSION and define it through +the KVM_ENABLE_CAP. The supported modes are mutually-exclusive. -KVM_BUS_LOCK_DETECTION_OFF and KVM_BUS_LOCK_DETECTION_EXIT are supported -currently and mutually exclusive with each other. More bits can be added in -the future. +This capability allows userspace to force VM exits on bus locks detected in the +guest, irrespective whether or not the host has enabled split-lock detection +(which triggers an #AC exception that KVM intercepts). This capability is +intended to mitigate attacks where a malicious/buggy guest can exploit bus +locks to degrade the performance of the whole system. -With KVM_BUS_LOCK_DETECTION_OFF set, bus locks in guest will not cause vm exits -so that no additional actions are needed. This is the default mode. +If KVM_BUS_LOCK_DETECTION_OFF is set, KVM doesn't force guest bus locks to VM +exit, although the host kernel's split-lock #AC detection still applies, if +enabled. -With KVM_BUS_LOCK_DETECTION_EXIT set, vm exits happen when bus lock detected -in VM. KVM just exits to userspace when handling them. Userspace can enforce -its own throttling or other policy based mitigations. +If KVM_BUS_LOCK_DETECTION_EXIT is set, KVM enables a CPU feature that ensures +bus locks in the guest trigger a VM exit, and KVM exits to userspace for all +such VM exits, e.g. to allow userspace to throttle the offending guest and/or +apply some other policy-based mitigation. When exiting to userspace, KVM sets +KVM_RUN_X86_BUS_LOCK in vcpu-run->flags, and conditionally sets the exit_reason +to KVM_EXIT_X86_BUS_LOCK. -This capability is aimed to address the thread that VM can exploit bus locks to -degree the performance of the whole system. Once the userspace enable this -capability and select the KVM_BUS_LOCK_DETECTION_EXIT mode, KVM will set the -KVM_RUN_BUS_LOCK flag in vcpu-run->flags field and exit to userspace. Concerning -the bus lock vm exit can be preempted by a higher priority VM exit, the exit -notifications to userspace can be KVM_EXIT_BUS_LOCK or other reasons. -KVM_RUN_BUS_LOCK flag is used to distinguish between them. +Note! Detected bus locks may be coincident with other exits to userspace, i.e. +KVM_RUN_X86_BUS_LOCK should be checked regardless of the primary exit reason if +userspace wants to take action on all detected bus locks. 7.23 KVM_CAP_PPC_DAWR1 ---------------------- @@ -7905,10 +7977,10 @@ perform a bulk copy of tags to/from the guest. 7.29 KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM ------------------------------------- -Architectures: x86 SEV enabled -Type: vm -Parameters: args[0] is the fd of the source vm -Returns: 0 on success +:Architectures: x86 SEV enabled +:Type: vm +:Parameters: args[0] is the fd of the source vm +:Returns: 0 on success This capability enables userspace to migrate the encryption context from the VM indicated by the fd to the VM this is called on. @@ -7956,7 +8028,11 @@ The valid bits in cap.args[0] are: When this quirk is disabled, the reset value is 0x10000 (APIC_LVT_MASKED). - KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW. + KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW on + AMD CPUs to workaround buggy guest firmware + that runs in perpetuity with CR0.CD, i.e. + with caches in "no fill" mode. + When this quirk is disabled, KVM does not change the value of CR0.CD and CR0.NW. @@ -8073,6 +8149,37 @@ error/annotated fault. See KVM_EXIT_MEMORY_FAULT for more information. +7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS +----------------------------------- + +:Architectures: x86 +:Target: VM +:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds +:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the + frequency or if any vCPUs have been created, -ENXIO if a virtual + local APIC has not been created using KVM_CREATE_IRQCHIP. + +This capability sets the VM's APIC bus clock frequency, used by KVM's in-kernel +virtual APIC when emulating APIC timers. KVM's default value can be retrieved +by KVM_CHECK_EXTENSION. + +Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the +core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest. + +7.36 KVM_CAP_X86_GUEST_MODE +------------------------------ + +:Architectures: x86 +:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP. + +The presence of this capability indicates that KVM_RUN will update the +KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the +vCPU was executing nested guest code when it exited. + +KVM exits with the register state of either the L1 or L2 guest +depending on which executed at the time of an exit. Userspace must +take care to differentiate between these cases. + 8. Other capabilities. ====================== |