diff options
Diffstat (limited to 'Documentation/virt/kvm/api.rst')
-rw-r--r-- | Documentation/virt/kvm/api.rst | 362 |
1 files changed, 356 insertions, 6 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 22d077562149..c7b165ca70b6 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -688,9 +688,14 @@ MSRs that have been set successfully. Defines the vcpu responses to the cpuid instruction. Applications should use the KVM_SET_CPUID2 ioctl if available. -Note, when this IOCTL fails, KVM gives no guarantees that previous valid CPUID -configuration (if there is) is not corrupted. Userspace can get a copy of the -resulting CPUID configuration through KVM_GET_CPUID2 in case. +Caveat emptor: + - If this IOCTL fails, KVM gives no guarantees that previous valid CPUID + configuration (if there is) is not corrupted. Userspace can get a copy + of the resulting CPUID configuration through KVM_GET_CPUID2 in case. + - Using KVM_SET_CPUID{,2} after KVM_RUN, i.e. changing the guest vCPU model + after running the guest, may cause guest instability. + - Using heterogeneous CPUID configurations, modulo APIC IDs, topology, etc... + may cause guest instability. :: @@ -4803,7 +4808,7 @@ KVM_PV_VM_VERIFY 4.126 KVM_X86_SET_MSR_FILTER ---------------------------- -:Capability: KVM_X86_SET_MSR_FILTER +:Capability: KVM_CAP_X86_MSR_FILTER :Architectures: x86 :Type: vm ioctl :Parameters: struct kvm_msr_filter @@ -5034,6 +5039,260 @@ see KVM_XEN_VCPU_SET_ATTR above. The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used with the KVM_XEN_VCPU_GET_ATTR ioctl. +4.130 KVM_ARM_MTE_COPY_TAGS +--------------------------- + +:Capability: KVM_CAP_ARM_MTE +:Architectures: arm64 +:Type: vm ioctl +:Parameters: struct kvm_arm_copy_mte_tags +:Returns: number of bytes copied, < 0 on error (-EINVAL for incorrect + arguments, -EFAULT if memory cannot be accessed). + +:: + + struct kvm_arm_copy_mte_tags { + __u64 guest_ipa; + __u64 length; + void __user *addr; + __u64 flags; + __u64 reserved[2]; + }; + +Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The +``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr`` +field must point to a buffer which the tags will be copied to or from. + +``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or +``KVM_ARM_TAGS_FROM_GUEST``. + +The size of the buffer to store the tags is ``(length / 16)`` bytes +(granules in MTE are 16 bytes long). Each byte contains a single tag +value. This matches the format of ``PTRACE_PEEKMTETAGS`` and +``PTRACE_POKEMTETAGS``. + +If an error occurs before any data is copied then a negative error code is +returned. If some tags have been copied before an error occurs then the number +of bytes successfully copied is returned. If the call completes successfully +then ``length`` is returned. + +4.131 KVM_GET_SREGS2 +------------------ + +:Capability: KVM_CAP_SREGS2 +:Architectures: x86 +:Type: vcpu ioctl +:Parameters: struct kvm_sregs2 (out) +:Returns: 0 on success, -1 on error + +Reads special registers from the vcpu. +This ioctl (when supported) replaces the KVM_GET_SREGS. + +:: + +struct kvm_sregs2 { + /* out (KVM_GET_SREGS2) / in (KVM_SET_SREGS2) */ + struct kvm_segment cs, ds, es, fs, gs, ss; + struct kvm_segment tr, ldt; + struct kvm_dtable gdt, idt; + __u64 cr0, cr2, cr3, cr4, cr8; + __u64 efer; + __u64 apic_base; + __u64 flags; + __u64 pdptrs[4]; +}; + +flags values for ``kvm_sregs2``: + +``KVM_SREGS2_FLAGS_PDPTRS_VALID`` + + Indicates thats the struct contain valid PDPTR values. + + +4.132 KVM_SET_SREGS2 +------------------ + +:Capability: KVM_CAP_SREGS2 +:Architectures: x86 +:Type: vcpu ioctl +:Parameters: struct kvm_sregs2 (in) +:Returns: 0 on success, -1 on error + +Writes special registers into the vcpu. +See KVM_GET_SREGS2 for the data structures. +This ioctl (when supported) replaces the KVM_SET_SREGS. + +4.133 KVM_GET_STATS_FD +---------------------- + +:Capability: KVM_CAP_STATS_BINARY_FD +:Architectures: all +:Type: vm ioctl, vcpu ioctl +:Parameters: none +:Returns: statistics file descriptor on success, < 0 on error + +Errors: + + ====== ====================================================== + ENOMEM if the fd could not be created due to lack of memory + EMFILE if the number of opened files exceeds the limit + ====== ====================================================== + +The returned file descriptor can be used to read VM/vCPU statistics data in +binary format. The data in the file descriptor consists of four blocks +organized as follows: + ++-------------+ +| Header | ++-------------+ +| id string | ++-------------+ +| Descriptors | ++-------------+ +| Stats Data | ++-------------+ + +Apart from the header starting at offset 0, please be aware that it is +not guaranteed that the four blocks are adjacent or in the above order; +the offsets of the id, descriptors and data blocks are found in the +header. However, all four blocks are aligned to 64 bit offsets in the +file and they do not overlap. + +All blocks except the data block are immutable. Userspace can read them +only one time after retrieving the file descriptor, and then use ``pread`` or +``lseek`` to read the statistics repeatedly. + +All data is in system endianness. + +The format of the header is as follows:: + + struct kvm_stats_header { + __u32 flags; + __u32 name_size; + __u32 num_desc; + __u32 id_offset; + __u32 desc_offset; + __u32 data_offset; + }; + +The ``flags`` field is not used at the moment. It is always read as 0. + +The ``name_size`` field is the size (in byte) of the statistics name string +(including trailing '\0') which is contained in the "id string" block and +appended at the end of every descriptor. + +The ``num_desc`` field is the number of descriptors that are included in the +descriptor block. (The actual number of values in the data block may be +larger, since each descriptor may comprise more than one value). + +The ``id_offset`` field is the offset of the id string from the start of the +file indicated by the file descriptor. It is a multiple of 8. + +The ``desc_offset`` field is the offset of the Descriptors block from the start +of the file indicated by the file descriptor. It is a multiple of 8. + +The ``data_offset`` field is the offset of the Stats Data block from the start +of the file indicated by the file descriptor. It is a multiple of 8. + +The id string block contains a string which identifies the file descriptor on +which KVM_GET_STATS_FD was invoked. The size of the block, including the +trailing ``'\0'``, is indicated by the ``name_size`` field in the header. + +The descriptors block is only needed to be read once for the lifetime of the +file descriptor contains a sequence of ``struct kvm_stats_desc``, each followed +by a string of size ``name_size``. + + #define KVM_STATS_TYPE_SHIFT 0 + #define KVM_STATS_TYPE_MASK (0xF << KVM_STATS_TYPE_SHIFT) + #define KVM_STATS_TYPE_CUMULATIVE (0x0 << KVM_STATS_TYPE_SHIFT) + #define KVM_STATS_TYPE_INSTANT (0x1 << KVM_STATS_TYPE_SHIFT) + #define KVM_STATS_TYPE_PEAK (0x2 << KVM_STATS_TYPE_SHIFT) + + #define KVM_STATS_UNIT_SHIFT 4 + #define KVM_STATS_UNIT_MASK (0xF << KVM_STATS_UNIT_SHIFT) + #define KVM_STATS_UNIT_NONE (0x0 << KVM_STATS_UNIT_SHIFT) + #define KVM_STATS_UNIT_BYTES (0x1 << KVM_STATS_UNIT_SHIFT) + #define KVM_STATS_UNIT_SECONDS (0x2 << KVM_STATS_UNIT_SHIFT) + #define KVM_STATS_UNIT_CYCLES (0x3 << KVM_STATS_UNIT_SHIFT) + + #define KVM_STATS_BASE_SHIFT 8 + #define KVM_STATS_BASE_MASK (0xF << KVM_STATS_BASE_SHIFT) + #define KVM_STATS_BASE_POW10 (0x0 << KVM_STATS_BASE_SHIFT) + #define KVM_STATS_BASE_POW2 (0x1 << KVM_STATS_BASE_SHIFT) + + struct kvm_stats_desc { + __u32 flags; + __s16 exponent; + __u16 size; + __u32 offset; + __u32 unused; + char name[]; + }; + +The ``flags`` field contains the type and unit of the statistics data described +by this descriptor. Its endianness is CPU native. +The following flags are supported: + +Bits 0-3 of ``flags`` encode the type: + * ``KVM_STATS_TYPE_CUMULATIVE`` + The statistics data is cumulative. The value of data can only be increased. + Most of the counters used in KVM are of this type. + The corresponding ``size`` field for this type is always 1. + All cumulative statistics data are read/write. + * ``KVM_STATS_TYPE_INSTANT`` + The statistics data is instantaneous. Its value can be increased or + decreased. This type is usually used as a measurement of some resources, + like the number of dirty pages, the number of large pages, etc. + All instant statistics are read only. + The corresponding ``size`` field for this type is always 1. + * ``KVM_STATS_TYPE_PEAK`` + The statistics data is peak. The value of data can only be increased, and + represents a peak value for a measurement, for example the maximum number + of items in a hash table bucket, the longest time waited and so on. + The corresponding ``size`` field for this type is always 1. + +Bits 4-7 of ``flags`` encode the unit: + * ``KVM_STATS_UNIT_NONE`` + There is no unit for the value of statistics data. This usually means that + the value is a simple counter of an event. + * ``KVM_STATS_UNIT_BYTES`` + It indicates that the statistics data is used to measure memory size, in the + unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is + determined by the ``exponent`` field in the descriptor. + * ``KVM_STATS_UNIT_SECONDS`` + It indicates that the statistics data is used to measure time or latency. + * ``KVM_STATS_UNIT_CYCLES`` + It indicates that the statistics data is used to measure CPU clock cycles. + +Bits 8-11 of ``flags``, together with ``exponent``, encode the scale of the +unit: + * ``KVM_STATS_BASE_POW10`` + The scale is based on power of 10. It is used for measurement of time and + CPU clock cycles. For example, an exponent of -9 can be used with + ``KVM_STATS_UNIT_SECONDS`` to express that the unit is nanoseconds. + * ``KVM_STATS_BASE_POW2`` + The scale is based on power of 2. It is used for measurement of memory size. + For example, an exponent of 20 can be used with ``KVM_STATS_UNIT_BYTES`` to + express that the unit is MiB. + +The ``size`` field is the number of values of this statistics data. Its +value is usually 1 for most of simple statistics. 1 means it contains an +unsigned 64bit data. + +The ``offset`` field is the offset from the start of Data Block to the start of +the corresponding statistics data. + +The ``unused`` field is reserved for future support for other types of +statistics data, like log/linear histogram. Its value is always 0 for the types +defined above. + +The ``name`` field is the name string of the statistics data. The name string +starts at the end of ``struct kvm_stats_desc``. The maximum length including +the trailing ``'\0'``, is indicated by ``name_size`` in the header. + +The Stats Data block contains an array of 64-bit values in the same order +as the descriptors in Descriptors block. + 5. The kvm_run structure ======================== @@ -6323,6 +6582,7 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them. This capability can be used to check / enable 2nd DAWR feature provided by POWER10 processor. + 7.24 KVM_CAP_VM_COPY_ENC_CONTEXT_FROM ------------------------------------- @@ -6360,7 +6620,67 @@ system fingerprint. To prevent userspace from circumventing such restrictions by running an enclave in a VM, KVM prevents access to privileged attributes by default. -See Documentation/x86/sgx/2.Kernel-internals.rst for more details. +See Documentation/x86/sgx.rst for more details. + +7.26 KVM_CAP_PPC_RPT_INVALIDATE +------------------------------- + +:Capability: KVM_CAP_PPC_RPT_INVALIDATE +:Architectures: ppc +:Type: vm + +This capability indicates that the kernel is capable of handling +H_RPT_INVALIDATE hcall. + +In order to enable the use of H_RPT_INVALIDATE in the guest, +user space might have to advertise it for the guest. For example, +IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is +present in the "ibm,hypertas-functions" device-tree property. + +This capability is enabled for hypervisors on platforms like POWER9 +that support radix MMU. + +7.27 KVM_CAP_EXIT_ON_EMULATION_FAILURE +-------------------------------------- + +:Architectures: x86 +:Parameters: args[0] whether the feature should be enabled or not + +When this capability is enabled, an emulation failure will result in an exit +to userspace with KVM_INTERNAL_ERROR (except when the emulator was invoked +to handle a VMware backdoor instruction). Furthermore, KVM will now provide up +to 15 instruction bytes for any exit to userspace resulting from an emulation +failure. When these exits to userspace occur use the emulation_failure struct +instead of the internal struct. They both have the same layout, but the +emulation_failure struct matches the content better. It also explicitly +defines the 'flags' field which is used to describe the fields in the struct +that are valid (ie: if KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is +set in the 'flags' field then both 'insn_size' and 'insn_bytes' have valid data +in them.) + +7.28 KVM_CAP_ARM_MTE +-------------------- + +:Architectures: arm64 +:Parameters: none + +This capability indicates that KVM (and the hardware) supports exposing the +Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the +VMM before creating any VCPUs to allow the guest access. Note that MTE is only +available to a guest running in AArch64 mode and enabling this capability will +cause attempts to create AArch32 VCPUs to fail. + +When enabled the guest is able to access tags associated with any memory given +to the guest. KVM will ensure that the tags are maintained during swap or +hibernation of the host; however the VMM needs to manually save/restore the +tags as appropriate if the VM is migrated. + +When this capability is enabled all memory in memslots must be mapped as +not-shareable (no MAP_SHARED), attempts to create a memslot with a +MAP_SHARED mmap will result in an -EINVAL return. + +When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to +perform a bulk copy of tags to/from the guest. 8. Other capabilities. ====================== @@ -6715,7 +7035,7 @@ accesses that would usually trigger a #GP by KVM into the guest will instead get bounced to user space through the KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications. -8.27 KVM_X86_SET_MSR_FILTER +8.27 KVM_CAP_X86_MSR_FILTER --------------------------- :Architectures: x86 @@ -6891,3 +7211,33 @@ This capability is always enabled. This capability indicates that the KVM virtual PTP service is supported in the host. A VMM can check whether the service is available to the guest on migration. + +8.33 KVM_CAP_HYPERV_ENFORCE_CPUID +----------------------------- + +Architectures: x86 + +When enabled, KVM will disable emulated Hyper-V features provided to the +guest according to the bits Hyper-V CPUID feature leaves. Otherwise, all +currently implmented Hyper-V features are provided unconditionally when +Hyper-V identification is set in the HYPERV_CPUID_INTERFACE (0x40000001) +leaf. + +8.34 KVM_CAP_EXIT_HYPERCALL +--------------------------- + +:Capability: KVM_CAP_EXIT_HYPERCALL +:Architectures: x86 +:Type: vm + +This capability, if enabled, will cause KVM to exit to userspace +with KVM_EXIT_HYPERCALL exit reason to process some hypercalls. + +Calling KVM_CHECK_EXTENSION for this capability will return a bitmask +of hypercalls that can be configured to exit to userspace. +Right now, the only such hypercall is KVM_HC_MAP_GPA_RANGE. + +The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset +of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace +the hypercalls whose corresponding bit is in the argument, and return +ENOSYS for the others. |