diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 24 | ||||
-rw-r--r-- | Documentation/virt/coco/sev-guest.rst | 19 | ||||
-rw-r--r-- | Documentation/virt/kvm/api.rst | 169 | ||||
-rw-r--r-- | Documentation/virt/kvm/devices/arm-vgic.rst | 2 | ||||
-rw-r--r-- | Documentation/virt/kvm/halt-polling.rst | 12 | ||||
-rw-r--r-- | Documentation/virt/kvm/x86/amd-memory-encryption.rst | 110 | ||||
-rw-r--r-- | Documentation/virt/kvm/x86/errata.rst | 18 |
7 files changed, 312 insertions, 42 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 7c19a5fd75e4..530ebe9b11cd 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2722,6 +2722,24 @@ [KVM,ARM,EARLY] Allow use of GICv4 for direct injection of LPIs. + kvm-arm.wfe_trap_policy= + [KVM,ARM] Control when to set WFE instruction trap for + KVM VMs. Traps are allowed but not guaranteed by the + CPU architecture. + + trap: set WFE instruction trap + + notrap: clear WFE instruction trap + + kvm-arm.wfi_trap_policy= + [KVM,ARM] Control when to set WFI instruction trap for + KVM VMs. Traps are allowed but not guaranteed by the + CPU architecture. + + trap: set WFI instruction trap + + notrap: clear WFI instruction trap + kvm_cma_resv_ratio=n [PPC,EARLY] Reserves given percentage from system memory area for contiguous memory allocation for KVM hash pagetable @@ -4036,9 +4054,9 @@ prediction) vulnerability. System may allow data leaks with this option. - no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV,EARLY] Disable - paravirtualized steal time accounting. steal time is - computed, but won't influence scheduler behaviour + no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV,LOONGARCH,EARLY] + Disable paravirtualized steal time accounting. steal time + is computed, but won't influence scheduler behaviour nosync [HW,M68K] Disables sync negotiation for all devices. diff --git a/Documentation/virt/coco/sev-guest.rst b/Documentation/virt/coco/sev-guest.rst index 9d00967a5b2b..93debceb6eb0 100644 --- a/Documentation/virt/coco/sev-guest.rst +++ b/Documentation/virt/coco/sev-guest.rst @@ -176,6 +176,25 @@ to SNP_CONFIG command defined in the SEV-SNP spec. The current values of the firmware parameters affected by this command can be queried via SNP_PLATFORM_STATUS. +2.7 SNP_VLEK_LOAD +----------------- +:Technology: sev-snp +:Type: hypervisor ioctl cmd +:Parameters (in): struct sev_user_data_snp_vlek_load +:Returns (out): 0 on success, -negative on error + +When requesting an attestation report a guest is able to specify whether +it wants SNP firmware to sign the report using either a Versioned Chip +Endorsement Key (VCEK), which is derived from chip-unique secrets, or a +Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD +Key Derivation Service (KDS) and derived from seeds allocated to +enrolled cloud service providers. + +In the case of VLEK keys, the SNP_VLEK_LOAD SNP command is used to load +them into the system after obtaining them from the KDS, and corresponds +closely to the SNP_VLEK_LOAD firmware command specified in the SEV-SNP +spec. + 3. SEV-SNP CPUID Enforcement ============================ diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 899480d4acaf..fe722c5dada9 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -891,12 +891,12 @@ like this:: The irq_type field has the following values: -- irq_type[0]: +- KVM_ARM_IRQ_TYPE_CPU: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ -- irq_type[1]: +- KVM_ARM_IRQ_TYPE_SPI: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.) (the vcpu_index field is ignored) -- irq_type[2]: +- KVM_ARM_IRQ_TYPE_PPI: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.) (The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs) @@ -1403,6 +1403,12 @@ Instead, an abort (data abort if the cause of the page-table update was a load or a store, instruction abort if it was an instruction fetch) is injected in the guest. +S390: +^^^^^ + +Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set. +Returns -EINVAL if called on a protected VM. + 4.36 KVM_SET_TSS_ADDR --------------------- @@ -1921,7 +1927,7 @@ flags: If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier for the device that wrote the MSI message. For PCI, this is usually a -BFD identifier in the lower 16 bits. +BDF identifier in the lower 16 bits. On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled, @@ -2989,7 +2995,7 @@ flags: If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier for the device that wrote the MSI message. For PCI, this is usually a -BFD identifier in the lower 16 bits. +BDF identifier in the lower 16 bits. On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled, @@ -6276,6 +6282,12 @@ state. At VM creation time, all memory is shared, i.e. the PRIVATE attribute is '0' for all gfns. Userspace can control whether memory is shared/private by toggling KVM_MEMORY_ATTRIBUTE_PRIVATE via KVM_SET_MEMORY_ATTRIBUTES as needed. +S390: +^^^^^ + +Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set. +Returns -EINVAL if called on a protected VM. + 4.141 KVM_SET_MEMORY_ATTRIBUTES ------------------------------- @@ -6355,6 +6367,61 @@ a single guest_memfd file, but the bound ranges must not overlap). See KVM_SET_USER_MEMORY_REGION2 for additional details. +4.143 KVM_PRE_FAULT_MEMORY +------------------------ + +:Capability: KVM_CAP_PRE_FAULT_MEMORY +:Architectures: none +:Type: vcpu ioctl +:Parameters: struct kvm_pre_fault_memory (in/out) +:Returns: 0 if at least one page is processed, < 0 on error + +Errors: + + ========== =============================================================== + EINVAL The specified `gpa` and `size` were invalid (e.g. not + page aligned, causes an overflow, or size is zero). + ENOENT The specified `gpa` is outside defined memslots. + EINTR An unmasked signal is pending and no page was processed. + EFAULT The parameter address was invalid. + EOPNOTSUPP Mapping memory for a GPA is unsupported by the + hypervisor, and/or for the current vCPU state/mode. + EIO unexpected error conditions (also causes a WARN) + ========== =============================================================== + +:: + + struct kvm_pre_fault_memory { + /* in/out */ + __u64 gpa; + __u64 size; + /* in */ + __u64 flags; + __u64 padding[5]; + }; + +KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory +for the current vCPU state. KVM maps memory as if the vCPU generated a +stage-2 read page fault, e.g. faults in memory as needed, but doesn't break +CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed. + +In some cases, multiple vCPUs might share the page tables. In this +case, the ioctl can be called in parallel. + +When the ioctl returns, the input values are updated to point to the +remaining range. If `size` > 0 on return, the caller can just issue +the ioctl again with the same `struct kvm_map_memory` argument. + +Shadow page tables cannot support this ioctl because they +are indexed by virtual address or nested guest physical address. +Calling this ioctl when the guest is using shadow page tables (for +example because it is running a nested guest with nested page tables) +will fail with `EOPNOTSUPP` even if `KVM_CHECK_EXTENSION` reports +the capability to be present. + +`flags` must currently be zero. + + 5. The kvm_run structure ======================== @@ -6419,9 +6486,12 @@ More architecture-specific flags detailing state of the VCPU that may affect the device's behavior. Current defined flags:: /* x86, set if the VCPU is in system management mode */ - #define KVM_RUN_X86_SMM (1 << 0) + #define KVM_RUN_X86_SMM (1 << 0) /* x86, set if bus lock detected in VM */ - #define KVM_RUN_BUS_LOCK (1 << 1) + #define KVM_RUN_X86_BUS_LOCK (1 << 1) + /* x86, set if the VCPU is executing a nested (L2) guest */ + #define KVM_RUN_X86_GUEST_MODE (1 << 2) + /* arm64, set for KVM_EXIT_DEBUG */ #define KVM_DEBUG_ARCH_HSR_HIGH_VALID (1 << 0) @@ -7767,29 +7837,31 @@ Valid bits in args[0] are:: #define KVM_BUS_LOCK_DETECTION_OFF (1 << 0) #define KVM_BUS_LOCK_DETECTION_EXIT (1 << 1) -Enabling this capability on a VM provides userspace with a way to select -a policy to handle the bus locks detected in guest. Userspace can obtain -the supported modes from the result of KVM_CHECK_EXTENSION and define it -through the KVM_ENABLE_CAP. +Enabling this capability on a VM provides userspace with a way to select a +policy to handle the bus locks detected in guest. Userspace can obtain the +supported modes from the result of KVM_CHECK_EXTENSION and define it through +the KVM_ENABLE_CAP. The supported modes are mutually-exclusive. -KVM_BUS_LOCK_DETECTION_OFF and KVM_BUS_LOCK_DETECTION_EXIT are supported -currently and mutually exclusive with each other. More bits can be added in -the future. +This capability allows userspace to force VM exits on bus locks detected in the +guest, irrespective whether or not the host has enabled split-lock detection +(which triggers an #AC exception that KVM intercepts). This capability is +intended to mitigate attacks where a malicious/buggy guest can exploit bus +locks to degrade the performance of the whole system. -With KVM_BUS_LOCK_DETECTION_OFF set, bus locks in guest will not cause vm exits -so that no additional actions are needed. This is the default mode. +If KVM_BUS_LOCK_DETECTION_OFF is set, KVM doesn't force guest bus locks to VM +exit, although the host kernel's split-lock #AC detection still applies, if +enabled. -With KVM_BUS_LOCK_DETECTION_EXIT set, vm exits happen when bus lock detected -in VM. KVM just exits to userspace when handling them. Userspace can enforce -its own throttling or other policy based mitigations. +If KVM_BUS_LOCK_DETECTION_EXIT is set, KVM enables a CPU feature that ensures +bus locks in the guest trigger a VM exit, and KVM exits to userspace for all +such VM exits, e.g. to allow userspace to throttle the offending guest and/or +apply some other policy-based mitigation. When exiting to userspace, KVM sets +KVM_RUN_X86_BUS_LOCK in vcpu-run->flags, and conditionally sets the exit_reason +to KVM_EXIT_X86_BUS_LOCK. -This capability is aimed to address the thread that VM can exploit bus locks to -degree the performance of the whole system. Once the userspace enable this -capability and select the KVM_BUS_LOCK_DETECTION_EXIT mode, KVM will set the -KVM_RUN_BUS_LOCK flag in vcpu-run->flags field and exit to userspace. Concerning -the bus lock vm exit can be preempted by a higher priority VM exit, the exit -notifications to userspace can be KVM_EXIT_BUS_LOCK or other reasons. -KVM_RUN_BUS_LOCK flag is used to distinguish between them. +Note! Detected bus locks may be coincident with other exits to userspace, i.e. +KVM_RUN_X86_BUS_LOCK should be checked regardless of the primary exit reason if +userspace wants to take action on all detected bus locks. 7.23 KVM_CAP_PPC_DAWR1 ---------------------- @@ -7905,10 +7977,10 @@ perform a bulk copy of tags to/from the guest. 7.29 KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM ------------------------------------- -Architectures: x86 SEV enabled -Type: vm -Parameters: args[0] is the fd of the source vm -Returns: 0 on success +:Architectures: x86 SEV enabled +:Type: vm +:Parameters: args[0] is the fd of the source vm +:Returns: 0 on success This capability enables userspace to migrate the encryption context from the VM indicated by the fd to the VM this is called on. @@ -7956,7 +8028,11 @@ The valid bits in cap.args[0] are: When this quirk is disabled, the reset value is 0x10000 (APIC_LVT_MASKED). - KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW. + KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW on + AMD CPUs to workaround buggy guest firmware + that runs in perpetuity with CR0.CD, i.e. + with caches in "no fill" mode. + When this quirk is disabled, KVM does not change the value of CR0.CD and CR0.NW. @@ -8073,6 +8149,37 @@ error/annotated fault. See KVM_EXIT_MEMORY_FAULT for more information. +7.35 KVM_CAP_X86_APIC_BUS_CYCLES_NS +----------------------------------- + +:Architectures: x86 +:Target: VM +:Parameters: args[0] is the desired APIC bus clock rate, in nanoseconds +:Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the + frequency or if any vCPUs have been created, -ENXIO if a virtual + local APIC has not been created using KVM_CREATE_IRQCHIP. + +This capability sets the VM's APIC bus clock frequency, used by KVM's in-kernel +virtual APIC when emulating APIC timers. KVM's default value can be retrieved +by KVM_CHECK_EXTENSION. + +Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the +core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest. + +7.36 KVM_CAP_X86_GUEST_MODE +------------------------------ + +:Architectures: x86 +:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP. + +The presence of this capability indicates that KVM_RUN will update the +KVM_RUN_X86_GUEST_MODE bit in kvm_run.flags to indicate whether the +vCPU was executing nested guest code when it exited. + +KVM exits with the register state of either the L1 or L2 guest +depending on which executed at the time of an exit. Userspace must +take care to differentiate between these cases. + 8. Other capabilities. ====================== diff --git a/Documentation/virt/kvm/devices/arm-vgic.rst b/Documentation/virt/kvm/devices/arm-vgic.rst index 40bdeea1d86e..19f0c6756891 100644 --- a/Documentation/virt/kvm/devices/arm-vgic.rst +++ b/Documentation/virt/kvm/devices/arm-vgic.rst @@ -31,7 +31,7 @@ Groups: KVM_VGIC_V2_ADDR_TYPE_CPU (rw, 64-bit) Base address in the guest physical address space of the GIC virtual cpu interface register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V2. - This address needs to be 4K aligned and the region covers 4 KByte. + This address needs to be 4K aligned and the region covers 8 KByte. Errors: diff --git a/Documentation/virt/kvm/halt-polling.rst b/Documentation/virt/kvm/halt-polling.rst index c82a04b709b4..a6790a67e205 100644 --- a/Documentation/virt/kvm/halt-polling.rst +++ b/Documentation/virt/kvm/halt-polling.rst @@ -79,11 +79,11 @@ adjustment of the polling interval. Module Parameters ================= -The kvm module has 3 tuneable module parameters to adjust the global max -polling interval as well as the rate at which the polling interval is grown and -shrunk. These variables are defined in include/linux/kvm_host.h and as module -parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the -powerpc kvm-hv case. +The kvm module has 4 tunable module parameters to adjust the global max polling +interval, the initial value (to grow from 0), and the rate at which the polling +interval is grown and shrunk. These variables are defined in +include/linux/kvm_host.h and as module parameters in virt/kvm/kvm_main.c, or +arch/powerpc/kvm/book3s_hv.c in the powerpc kvm-hv case. +-----------------------+---------------------------+-------------------------+ |Module Parameter | Description | Default Value | @@ -105,7 +105,7 @@ powerpc kvm-hv case. | | grow_halt_poll_ns() | | | | function. | | +-----------------------+---------------------------+-------------------------+ -|halt_poll_ns_shrink | The value by which the | 0 | +|halt_poll_ns_shrink | The value by which the | 2 | | | halt polling interval is | | | | divided in the | | | | shrink_halt_poll_ns() | | diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst index 9677a0714a39..1ddb6a86ce7f 100644 --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst @@ -466,6 +466,112 @@ issued by the hypervisor to make the guest ready for execution. Returns: 0 on success, -negative on error +18. KVM_SEV_SNP_LAUNCH_START +---------------------------- + +The KVM_SNP_LAUNCH_START command is used for creating the memory encryption +context for the SEV-SNP guest. It must be called prior to issuing +KVM_SEV_SNP_LAUNCH_UPDATE or KVM_SEV_SNP_LAUNCH_FINISH; + +Parameters (in): struct kvm_sev_snp_launch_start + +Returns: 0 on success, -negative on error + +:: + + struct kvm_sev_snp_launch_start { + __u64 policy; /* Guest policy to use. */ + __u8 gosvw[16]; /* Guest OS visible workarounds. */ + __u16 flags; /* Must be zero. */ + __u8 pad0[6]; + __u64 pad1[4]; + }; + +See SNP_LAUNCH_START in the SEV-SNP specification [snp-fw-abi]_ for further +details on the input parameters in ``struct kvm_sev_snp_launch_start``. + +19. KVM_SEV_SNP_LAUNCH_UPDATE +----------------------------- + +The KVM_SEV_SNP_LAUNCH_UPDATE command is used for loading userspace-provided +data into a guest GPA range, measuring the contents into the SNP guest context +created by KVM_SEV_SNP_LAUNCH_START, and then encrypting/validating that GPA +range so that it will be immediately readable using the encryption key +associated with the guest context once it is booted, after which point it can +attest the measurement associated with its context before unlocking any +secrets. + +It is required that the GPA ranges initialized by this command have had the +KVM_MEMORY_ATTRIBUTE_PRIVATE attribute set in advance. See the documentation +for KVM_SET_MEMORY_ATTRIBUTES for more details on this aspect. + +Upon success, this command is not guaranteed to have processed the entire +range requested. Instead, the ``gfn_start``, ``uaddr``, and ``len`` fields of +``struct kvm_sev_snp_launch_update`` will be updated to correspond to the +remaining range that has yet to be processed. The caller should continue +calling this command until those fields indicate the entire range has been +processed, e.g. ``len`` is 0, ``gfn_start`` is equal to the last GFN in the +range plus 1, and ``uaddr`` is the last byte of the userspace-provided source +buffer address plus 1. In the case where ``type`` is KVM_SEV_SNP_PAGE_TYPE_ZERO, +``uaddr`` will be ignored completely. + +Parameters (in): struct kvm_sev_snp_launch_update + +Returns: 0 on success, < 0 on error, -EAGAIN if caller should retry + +:: + + struct kvm_sev_snp_launch_update { + __u64 gfn_start; /* Guest page number to load/encrypt data into. */ + __u64 uaddr; /* Userspace address of data to be loaded/encrypted. */ + __u64 len; /* 4k-aligned length in bytes to copy into guest memory.*/ + __u8 type; /* The type of the guest pages being initialized. */ + __u8 pad0; + __u16 flags; /* Must be zero. */ + __u32 pad1; + __u64 pad2[4]; + + }; + +where the allowed values for page_type are #define'd as:: + + KVM_SEV_SNP_PAGE_TYPE_NORMAL + KVM_SEV_SNP_PAGE_TYPE_ZERO + KVM_SEV_SNP_PAGE_TYPE_UNMEASURED + KVM_SEV_SNP_PAGE_TYPE_SECRETS + KVM_SEV_SNP_PAGE_TYPE_CPUID + +See the SEV-SNP spec [snp-fw-abi]_ for further details on how each page type is +used/measured. + +20. KVM_SEV_SNP_LAUNCH_FINISH +----------------------------- + +After completion of the SNP guest launch flow, the KVM_SEV_SNP_LAUNCH_FINISH +command can be issued to make the guest ready for execution. + +Parameters (in): struct kvm_sev_snp_launch_finish + +Returns: 0 on success, -negative on error + +:: + + struct kvm_sev_snp_launch_finish { + __u64 id_block_uaddr; + __u64 id_auth_uaddr; + __u8 id_block_en; + __u8 auth_key_en; + __u8 vcek_disabled; + __u8 host_data[32]; + __u8 pad0[3]; + __u16 flags; /* Must be zero */ + __u64 pad1[4]; + }; + + +See SNP_LAUNCH_FINISH in the SEV-SNP specification [snp-fw-abi]_ for further +details on the input parameters in ``struct kvm_sev_snp_launch_finish``. + Device attribute API ==================== @@ -497,9 +603,11 @@ References ========== -See [white-paper]_, [api-spec]_, [amd-apm]_ and [kvm-forum]_ for more info. +See [white-paper]_, [api-spec]_, [amd-apm]_, [kvm-forum]_, and [snp-fw-abi]_ +for more info. .. [white-paper] https://developer.amd.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf .. [api-spec] https://support.amd.com/TechDocs/55766_SEV-KM_API_Specification.pdf .. [amd-apm] https://support.amd.com/TechDocs/24593.pdf (section 15.34) .. [kvm-forum] https://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf +.. [snp-fw-abi] https://www.amd.com/system/files/TechDocs/56860.pdf diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst index 49a05f24747b..4116045a8744 100644 --- a/Documentation/virt/kvm/x86/errata.rst +++ b/Documentation/virt/kvm/x86/errata.rst @@ -48,3 +48,21 @@ have the same physical APIC ID, KVM will deliver events targeting that APIC ID only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs matching the target APIC ID receive the interrupt). + +MTRRs +----- +KVM does not virtualize guest MTRR memory types. KVM emulates accesses to MTRR +MSRs, i.e. {RD,WR}MSR in the guest will behave as expected, but KVM does not +honor guest MTRRs when determining the effective memory type, and instead +treats all of guest memory as having Writeback (WB) MTRRs. + +CR0.CD +------ +KVM does not virtualize CR0.CD on Intel CPUs. Similar to MTRR MSRs, KVM +emulates CR0.CD accesses so that loads and stores from/to CR0 behave as +expected, but setting CR0.CD=1 has no impact on the cachaeability of guest +memory. + +Note, this erratum does not affect AMD CPUs, which fully virtualize CR0.CD in +hardware, i.e. put the CPU caches into "no fill" mode when CR0.CD=1, even when +running in the guest.
\ No newline at end of file |