path: root/arch/x86/kvm/vmx
Age  Commit message  (Author, files changed, lines -/+)
2021-05-07  KVM: x86: Move RDPID emulation intercept to its own enum  (Sean Christopherson, 1 file, -1/+2)
Add a dedicated intercept enum for RDPID instead of piggybacking RDTSCP. Unlike VMX's ENABLE_RDTSCP, RDPID is not bound to SVM's RDTSCP intercept. Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-5-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-07  KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported  (Sean Christopherson, 1 file, -2/+4)
Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is unsupported. Despite being enumerated in a separate CPUID flag, RDPID is bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root if ENABLE_RDTSCP is not enabled. Fixes: 41cd02c6f7f6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210504171734.1434054-2-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
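A minimal standalone C sketch of the dependency described above (helper and variable names are invented for illustration, not the actual KVM code): the RDPID CPUID bit is advertised only when the VMCS control that both RDTSCP and RDPID hang off of is supported.

    #include <stdbool.h>

    #define SECONDARY_EXEC_ENABLE_RDTSCP  0x00000008u  /* assumed bit value for the sketch */

    static unsigned int host_secondary_exec_controls;  /* would come from the VMX capability MSRs */

    static bool cpu_has_vmx_rdtscp(void)
    {
        return host_secondary_exec_controls & SECONDARY_EXEC_ENABLE_RDTSCP;
    }

    /* RDPID and RDTSCP share the ENABLE_RDTSCP execution control, so both
     * guest CPUID bits must be cleared when the control is unsupported. */
    static void adjust_guest_cpuid(bool *rdtscp, bool *rdpid)
    {
        if (!cpu_has_vmx_rdtscp()) {
            *rdtscp = false;
            *rdpid  = false;   /* would #UD in VMX non-root otherwise */
        }
    }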
2021-05-07  KVM: nVMX: Always make an attempt to map eVMCS after migration  (Vitaly Kuznetsov, 1 file, -10/+19)
When enlightened VMCS is in use and nested state is migrated with vmx_get_nested_state()/vmx_set_nested_state() KVM can't map the evmcs page right away: the evmcs gpa is not part of 'struct kvm_vmx_nested_state_hdr' and we can't read it from the VP assist page because userspace may decide to restore HV_X64_MSR_VP_ASSIST_PAGE after restoring nested state (and QEMU, for example, does exactly that). To make sure eVMCS is mapped, vmx_set_nested_state() raises a KVM_REQ_GET_NESTED_STATE_PAGES request. Commit f2c7ef3ba955 ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") added KVM_REQ_GET_NESTED_STATE_PAGES clearing to nested_vmx_vmexit() to make sure the MSR permission bitmap is not switched when an immediate exit from L2 to L1 happens right after migration (caused by a pending event, for example). Unfortunately, in the exact same situation we still need to have eVMCS mapped so nested_sync_vmcs12_to_shadow() reflects changes in VMCS12 to eVMCS. As a band-aid, restore nested_get_evmcs_page() when clearing KVM_REQ_GET_NESTED_STATE_PAGES in nested_vmx_vmexit(). The 'fix' is far from ideal as we can't easily propagate possible failures and even if we could, this is most likely already too late to do so. The whole 'KVM_REQ_GET_NESTED_STATE_PAGES' idea for mapping eVMCS after migration seems to be fragile as we diverge too much from the 'native' path when vmptr loading happens on vmx_set_nested_state(). Fixes: f2c7ef3ba955 ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210503150854.1144255-2-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-05  KVM: x86: Consolidate guest enter/exit logic to common helpers  (Sean Christopherson, 1 file, -37/+2)
Move the enter/exit logic in {svm,vmx}_vcpu_enter_exit() to common helpers. Opportunistically update the somewhat stale comment about the updates needing to occur immediately after VM-Exit. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210505002735.1684165-9-seanjc@google.com
2021-05-05  KVM: x86: Defer vtime accounting 'til after IRQ handling  (Wanpeng Li, 1 file, -3/+3)
Defer the call to account guest time until after servicing any IRQ(s) that happened in the guest or immediately after VM-Exit. Tick-based accounting of vCPU time relies on PF_VCPU being set when the tick IRQ handler runs, and IRQs are blocked throughout the main sequence of vcpu_enter_guest(), including the call into vendor code to actually enter and exit the guest. This fixes a bug where reported guest time remains '0', even when running an infinite loop in the guest: https://bugzilla.kernel.org/show_bug.cgi?id=209831 Fixes: 87fa7f3e98a131 ("x86/kvm: Move context tracking where it belongs") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com
2021-05-05  KVM/VMX: Invoke NMI non-IST entry instead of IST entry  (Lai Jiangshan, 1 file, -7/+9)
In VMX, the host NMI handler needs to be invoked after NMI VM-Exit. Before commit 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn"), this was done by INTn ("int $2"). But INTn microcode is relatively expensive, so the commit reworked NMI VM-Exit handling to invoke the kernel handler by function call. But this missed a detail. The NMI entry point for direct invocation is fetched from the IDT table and called on the kernel stack. But on 64-bit the NMI entry installed in the IDT expects to be invoked on the IST stack. It relies on the "NMI executing" variable on the IST stack to work correctly, which is at a fixed position in the IST stack. When the entry point is unexpectedly called on the kernel stack, the RSP-addressed "NMI executing" variable is obviously also on the kernel stack and is "uninitialized" and can cause the NMI entry code to run in the wrong way. Provide a non-IST entry point for VMX which shares the C function with the regular NMI entry and invoke the new asm entry point instead. On 32-bit this just maps to the regular NMI entry point as 32-bit has no ISTs and is not affected. [ tglx: Made it independent for backporting, massaged changelog ] Fixes: 1a5488ef0dcf6 ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Lai Jiangshan <laijs@linux.alibaba.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87r1imi8i1.ffs@nanos.tec.linutronix.de
2021-05-01  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds, 9 files, -232/+872)
Pull kvm updates from Paolo Bonzini:
 "This is a large update by KVM standards, including AMD PSP (Platform Security Processor, aka "AMD Secure Technology") and ARM CoreSight (debug and trace) changes.

  ARM:
  - CoreSight: Add support for ETE and TRBE
  - Stage-2 isolation for the host kernel when running in protected mode
  - Guest SVE support when running in nVHE mode
  - Force W^X hypervisor mappings in nVHE mode
  - ITS save/restore for guests using direct injection with GICv4.1
  - nVHE panics now produce readable backtraces
  - Guest support for PTP using the ptp_kvm driver
  - Performance improvements in the S2 fault handler

  x86:
  - AMD PSP driver changes
  - Optimizations and cleanup of nested SVM code
  - AMD: Support for virtual SPEC_CTRL
  - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disable dirty page logging under read lock
  - /dev/kvm API for AMD SEV live migration (guest API coming soon)
  - support SEV virtual machines sharing the same encryption context
  - support SGX in virtual machines
  - add a few more statistics
  - improved directed yield heuristics
  - Lots and lots of cleanups

  Generic:
  - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code
  - a handful of "Get rid of oprofile leftovers" patches
  - Some selftests improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
  KVM: selftests: Speed up set_memory_region_test
  selftests: kvm: Fix the check of return value
  KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
  KVM: SVM: Skip SEV cache flush if no ASIDs have been used
  KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
  KVM: SVM: Drop redundant svm_sev_enabled() helper
  KVM: SVM: Move SEV VMCB tracking allocation to sev.c
  KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
  KVM: SVM: Unconditionally invoke sev_hardware_teardown()
  KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
  KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
  KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
  KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
  KVM: SVM: Move SEV module params/variables to sev.c
  KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
  KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
  KVM: SVM: Zero out the VMCB array used to track SEV ASID association
  x86/sev: Drop redundant and potentially misleading 'sev_enabled'
  KVM: x86: Move reverse CPUID helpers to separate header file
  KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
  ...
2021-04-26  Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 3 files, -5/+5)
Pull misc x86 cleanups from Borislav Petkov: "Trivial cleanups and fixes all over the place"
* tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Remove me from IDE/ATAPI section
  x86/pat: Do not compile stubbed functions when X86_PAT is off
  x86/asm: Ensure asm/proto.h can be included stand-alone
  x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files
  x86/msr: Make locally used functions static
  x86/cacheinfo: Remove unneeded dead-store initialization
  x86/process/64: Move cpu_current_top_of_stack out of TSS
  tools/turbostat: Unmark non-kernel-doc comment
  x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL()
  x86/fpu/math-emu: Fix function cast warning
  x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
  x86: Fix various typos in comments, take #2
  x86: Remove unusual Unicode characters from comments
  x86/kaslr: Return boolean values from a function returning bool
  x86: Fix various typos in comments
  x86/setup: Remove unused RESERVE_BRK_ARRAY()
  stacktrace: Move documentation for arch_stack_walk_reliable() to header
  x86: Remove duplicate TSC DEADLINE MSR definitions
2021-04-26  KVM: x86: Rename GPR accessors to make mode-aware variants the defaults  (Sean Christopherson, 2 files, -16/+16)
Append raw to the direct variants of kvm_register_read/write(), and drop the "l" from the mode-aware variants. I.e. make the mode-aware variants the default, and make the direct variants scary sounding so as to discourage use. Accessing the full 64-bit values irrespective of mode is rarely the desired behavior. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26  KVM: nVMX: Truncate base/index GPR value on address calc in !64-bit  (Sean Christopherson, 1 file, -2/+2)
Drop bits 63:32 of the base and/or index GPRs when calculating the effective address of a VMX instruction memory operand. Outside of 64-bit mode, memory encodings are strictly limited to E*X and below. Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
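The fix amounts to masking the register operands to 32 bits before the address arithmetic when the vCPU is not in 64-bit mode. A simplified standalone sketch (function and parameter names invented for illustration):

    #include <stdint.h>
    #include <stdbool.h>

    /* Effective address of a memory operand: base + index*scale + displacement. */
    static uint64_t operand_effective_addr(uint64_t base, uint64_t index,
                                           unsigned int scale_shift, int64_t disp,
                                           bool long_mode)
    {
        if (!long_mode) {
            /* Outside 64-bit mode only the low 32 bits of a GPR participate. */
            base  = (uint32_t)base;
            index = (uint32_t)index;
        }
        return base + (index << scale_shift) + disp;
    }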
2021-04-26  KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check in !64-bit  (Sean Christopherson, 1 file, -1/+1)
Drop bits 63:32 of the VMCS field encoding when checking for a nested VM-Exit on VMREAD/VMWRITE in !64-bit mode. VMREAD and VMWRITE always use 32-bit operands outside of 64-bit mode. The actual emulation of VMREAD/VMWRITE does the right thing, this bug is purely limited to incorrectly causing a nested VM-Exit if a GPR happens to have bits 63:32 set outside of 64-bit mode. Fixes: a7cde481b6e8 ("KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26  KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit mode  (Sean Christopherson, 1 file, -3/+3)
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in 64-bit mode. Per the SDM: The operand size for these instructions is always 32 bits in non-64-bit modes, regardless of the operand-size attribute. CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit mode, but fix it up for consistency and to allow for future cleanup. Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422022128.3464144-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26  KVM: VMX: Intercept FS/GS_BASE MSR accesses for 32-bit KVM  (Sean Christopherson, 2 files, -0/+6)
Disable pass-through of the FS and GS base MSRs for 32-bit KVM. Intel's SDM unequivocally states that the MSRs exist if and only if the CPU supports x86-64. FS_BASE and GS_BASE are mostly a non-issue; a clever guest could opportunistically use the MSRs without issue. KERNEL_GS_BASE is a bigger problem, as a clever guest would subtly be broken if it were migrated, as KVM disallows software access to the MSRs, and unlike the direct variants, KERNEL_GS_BASE needs to be explicitly migrated as it's not captured in the VMCS. Fixes: 25c5f225beda ("KVM: VMX: Enable MSR Bitmap feature") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210422023831.3473491-1-seanjc@google.com> [*NOT* for stable kernels. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-26  KVM: VMX: Invert the inlining of MSR interception helpers  (Sean Christopherson, 2 files, -17/+15)
Invert the inline declarations of the MSR interception helpers between the wrapper, vmx_set_intercept_for_msr(), and the core implementations, vmx_{dis,en}able_intercept_for_msr(). Letting the compiler _not_ inline the implementation reduces KVM's code footprint by ~3k bytes. Back when the helpers were added in commit 904e14fb7cb9 ("KVM: VMX: make MSR bitmaps per-VCPU"), both the wrapper and the implementations were __always_inline because the end code distilled down to a few conditionals and a bit operation. Today, the implementations involve a variety of checks and bit ops in order to support userspace MSR filtering. Furthermore, the vast majority of calls to manipulate MSR interception are not performance sensitive, e.g. vCPU creation and x2APIC toggling. On the other hand, the one path that is performance sensitive, dynamic LBR passthrough, uses the wrappers, i.e. is largely untouched by inverting the inlining. In short, forcing the low level MSR interception code to be inlined no longer makes sense. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210423221912.3857243-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
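The resulting shape is a small inline wrapper dispatching to out-of-line implementations; a schematic standalone sketch with made-up names (the real helpers take an MSR type and consult userspace filters):

    #include <stdbool.h>

    /* Out of line: the filtering/bitmap logic is emitted once instead of
     * being duplicated at every call site. */
    static void msr_intercept_enable(unsigned int msr)  { (void)msr; /* set the intercept bit, honor filters, ... */ }
    static void msr_intercept_disable(unsigned int msr) { (void)msr; /* clear the intercept bit for pass-through */ }

    /* The wrapper stays inline, so hot callers only pay for the branch. */
    static inline void msr_set_intercept(unsigned int msr, bool intercept)
    {
        if (intercept)
            msr_intercept_enable(msr);
        else
            msr_intercept_disable(msr);
    }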
2021-04-23  KVM: VMX: use EPT_VIOLATION_GVA_TRANSLATED instead of 0x100  (Isaku Yamahata, 1 file, -1/+1)
Use symbolic value, EPT_VIOLATION_GVA_TRANSLATED, instead of 0x100 in handle_ept_violation(). Signed-off-by: Yao Yuan <yuan.yao@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-Id: <724e8271ea301aece3eb2afe286a9e2e92a70b18.1619136576.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20  KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC  (Sean Christopherson, 7 files, -7/+149)
Enable SGX virtualization now that KVM has the VM-Exit handlers needed to trap-and-execute ENCLS to ensure correctness and/or enforce the CPU model exposed to the guest. Add a KVM module param, "sgx", to allow an admin to disable SGX virtualization independent of the kernel. When supported in hardware and the kernel, advertise SGX1, SGX2 and SGX LC to userspace via CPUID and wire up the ENCLS_EXITING bitmap based on the guest's SGX capabilities, i.e. to allow ENCLS to be executed in an SGX-enabled guest. With the exception of the provision key, all SGX attribute bits may be exposed to the guest. Guest access to the provision key, which is controlled via securityfs, will be added in a future patch. Note, KVM does not yet support exposing ENCLS_C leafs or ENCLV leafs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <a99e9c23310c79f2f4175c1af4c4cbcef913c3e5.1618196135.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20  KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)  (Sean Christopherson, 1 file, -0/+64)
Add a VM-Exit handler to trap-and-execute EINIT when SGX LC is enabled in the host. When SGX LC is enabled, the host kernel may rewrite the hardware values at will, e.g. to launch enclaves with different signers, thus KVM needs to intercept EINIT to ensure it is executed with the correct LE hash (even if the guest sees a hardwired hash). Switching the LE hash MSRs on VM-Enter/VM-Exit is not a viable option as writing the MSRs is prohibitively expensive, e.g. on SKL hardware each WRMSR is ~400 cycles. And because EINIT takes tens of thousands of cycles to execute, the ~1500 cycle overhead to trap-and-execute EINIT is unlikely to be noticed by the guest, let alone impact its overall SGX performance. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <57c92fa4d2083eb3be9e6355e3882fc90cffea87.1618196135.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20  KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs  (Sean Christopherson, 4 files, -0/+75)
Emulate the four Launch Enclave public key hash MSRs (LE hash MSRs) that exist on CPUs that support SGX Launch Control (LC). SGX LC modifies the behavior of ENCLS[EINIT] to use the LE hash MSRs when verifying the key used to sign an enclave. On CPUs without LC support, the LE hash is hardwired into the CPU to an Intel controlled key (the Intel key is also the reset value of the LE hash MSRs). Track the guest's desired hash so that a future patch can stuff the hash into the hardware MSRs when executing EINIT on behalf of the guest, when those MSRs are writable in the host. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <c58ef601ddf88f3a113add837969533099b1364a.1618196135.git.kai.huang@intel.com> [Add a comment regarding the MSRs being available until SGX is locked. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20  KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions  (Sean Christopherson, 1 file, -0/+275)
Add an ECREATE handler that will be used to intercept ECREATE for the purpose of enforcing an enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e. to allow userspace to restrict SGX features via CPUID. ECREATE will be intercepted when any of the aforementioned masks diverges from hardware in order to enforce the desired CPUID model, i.e. inject #GP if the guest attempts to set a bit that hasn't been enumerated as allowed-1 in CPUID. Note, access to the PROVISIONKEY is not yet supported. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <c3a97684f1b71b4f4626a1fc3879472a95651725.1618196135.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
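The heart of the enforcement is comparing the guest-chosen SECS fields against the allowed-1 masks implied by the CPUID model and failing with #GP on any disallowed bit. A simplified standalone sketch (struct layouts and names invented for illustration):

    #include <stdint.h>
    #include <stdbool.h>

    struct secs_fields {
        uint32_t miscselect;
        uint64_t attributes;
        uint64_t xfrm;
    };

    /* Allowed-1 masks derived from the CPUID model exposed to the guest. */
    struct sgx_cpuid_model {
        uint32_t miscselect_allowed;
        uint64_t attributes_allowed;
        uint64_t xfrm_allowed;
    };

    /* Return true if the intercepted ECREATE must be failed with #GP. */
    static bool ecreate_violates_cpuid(const struct secs_fields *secs,
                                       const struct sgx_cpuid_model *model)
    {
        return (secs->miscselect & ~model->miscselect_allowed) ||
               (secs->attributes & ~model->attributes_allowed) ||
               (secs->xfrm       & ~model->xfrm_allowed);
    }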
2021-04-20  KVM: VMX: Frame in ENCLS handler for SGX virtualization  (Sean Christopherson, 3 files, -3/+71)
Introduce sgx.c and sgx.h, along with the framework for handling ENCLS VM-Exits. Add a bool, enable_sgx, that will eventually be wired up to a module param to control whether or not SGX virtualization is enabled at runtime. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <1c782269608b2f5e1034be450f375a8432fb705d.1618196135.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20  KVM: VMX: Add basic handling of VM-Exit from SGX enclave  (Sean Christopherson, 2 files, -2/+45)
Add support for handling VM-Exits that originate from a guest SGX enclave. In SGX, an "enclave" is a new CPL3-only execution environment, wherein the CPU and memory state is protected by hardware to make the state inaccessible to code running outside of the enclave. When exiting an enclave due to an asynchronous event (from the perspective of the enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state is automatically saved and scrubbed (the CPU loads synthetic state), and then reloaded when re-entering the enclave. E.g. after an instruction-based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP of the enclave instruction that triggered the VM-Exit, but will instead point to a RIP in the enclave's untrusted runtime (the guest userspace code that coordinates entry/exit to/from the enclave). To help a VMM recognize and handle exits from enclaves, SGX adds bits to existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR. Define the new architectural bits, and add a boolean to struct vcpu_vmx to cache VMX_EXIT_REASON_FROM_ENCLAVE. Clear the bit in exit_reason so that checks against exit_reason do not need to account for SGX, e.g. "if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work. KVM is largely a passive observer of the new bits, e.g. KVM needs to account for the bits when propagating information to a nested VMM, but otherwise doesn't need to act differently for the majority of VM-Exits from enclaves. The one scenario that is directly impacted is emulation, which is for all intents and purposes impossible[1] since KVM does not have access to the RIP or instruction stream that triggered the VM-Exit. The inability to emulate is a non-issue for KVM, as most instructions that might trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit check). For the few instructions that conditionally #UD, KVM either never sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only if the feature is not exposed to the guest in order to inject a #UD, e.g. RDRAND_EXITING. But, because it is still possible for a guest to trigger emulation, e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit from an enclave. This is architecturally accurate for instruction VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable to killing the VM. In practice, only broken or particularly stupid guests should ever encounter this behavior. Add a WARN in skip_emulated_instruction to detect any attempt to modify the guest's RIP during an SGX enclave VM-Exit as all such flows should either be unreachable or must handle exits from enclaves before getting to skip_emulated_instruction. [1] Impossible for all practical purposes. Not truly impossible since KVM could implement some form of para-virtualization scheme. [2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at CPL3, so we also don't need to worry about that interaction. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Message-Id: <315f54a8507d09c292463ef29104e1d4c62e9090.1618196135.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
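The bookkeeping described above boils down to latching the new exit-reason modifier bit into the vCPU and stripping it before the rest of the exit handling runs, so existing equality checks on exit_reason keep working. A rough standalone sketch (bit value and names are placeholders, not the kernel definitions):

    #include <stdint.h>
    #include <stdbool.h>

    #define EXIT_REASON_FROM_ENCLAVE  (1u << 27)   /* placeholder for the architectural modifier bit */

    struct vcpu_state {
        uint32_t exit_reason;
        bool     exit_from_enclave;   /* cached so later code can still tell */
    };

    static void record_exit_reason(struct vcpu_state *vcpu, uint32_t raw_exit_reason)
    {
        vcpu->exit_from_enclave = raw_exit_reason & EXIT_REASON_FROM_ENCLAVE;
        /* Strip the modifier so "exit_reason == EXIT_REASON_XYZ" checks keep working. */
        vcpu->exit_reason = raw_exit_reason & ~EXIT_REASON_FROM_ENCLAVE;
    }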
2021-04-20  KVM: vmx: add mismatched size assertions in vmcs_check32()  (Haiwei Li, 1 file, -0/+4)
Add compile-time assertions in vmcs_check32() to disallow accesses to 64-bit and 64-bit high fields via vmcs_{read,write}32(). Upper level KVM code should never do partial accesses to VMCS fields. KVM handles the split accesses automatically in vmcs_{read,write}64() when running as a 32-bit kernel. Reviewed-and-tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <20210409022456.23528-1-lihaiwei.kernel@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
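The guard reduces to a compile-time assertion keyed off the width bits encoded in the VMCS field number (the kernel's version is built on BUILD_BUG_ON_MSG and also covers the other widths). A rough standalone approximation using C11 static_assert:

    #include <assert.h>

    /* Bits 14:13 of a VMCS field encoding give its width:
     * 00 = 16-bit, 01 = 64-bit, 10 = 32-bit, 11 = natural width. */
    #define VMCS_FIELD_WIDTH_MASK  0x6000u
    #define VMCS_FIELD_WIDTH_64    0x2000u

    /* Field encodings are compile-time constants at the call sites, so a
     * 32-bit access to any 64-bit (or 64-bit high) field can be rejected
     * before the code is ever built. */
    #define vmcs_check32(field)                                                  \
        static_assert(((field) & VMCS_FIELD_WIDTH_MASK) != VMCS_FIELD_WIDTH_64,  \
                      "32-bit accessor used on a 64-bit VMCS field")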
2021-04-17  KVM: x86: pending exceptions must not be blocked by an injected event  (Maxim Levitsky, 1 file, -2/+8)
Injected interrupts/NMIs should not block a pending exception, but rather be either lost if the nested hypervisor doesn't intercept the pending exception (as in stock x86), or be delivered in the exitintinfo/IDT_VECTORING_INFO field, as a part of a VMexit that corresponds to the pending exception. The only reason for an exception to be blocked is when nested run is pending (and that can't really happen currently but still worth checking for). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17  KVM: x86: dump_vmcs should include the autoload/autostore MSR lists  (David Edmondson, 1 file, -0/+16)
When dumping the current VMCS state, include the MSRs that are being automatically loaded/stored during VM entry/exit. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-6-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17  KVM: x86: dump_vmcs should show the effective EFER  (David Edmondson, 2 files, -5/+17)
If EFER is not being loaded from the VMCS, show the effective value by reference to the MSR autoload list or calculation. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-5-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
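Conceptually the effective EFER is resolved in order of precedence: the VMCS guest field if the "load EFER" entry control is in use, then the VM-entry MSR-load list, then a value derived from the host. A sketch of that fallback order with invented types and names:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MSR_EFER 0xc0000080u

    struct msr_autoload_entry { uint32_t index; uint64_t value; };

    static uint64_t effective_guest_efer(bool entry_load_efer_ctrl,
                                         uint64_t vmcs_guest_efer,
                                         const struct msr_autoload_entry *load_list,
                                         size_t nr, uint64_t host_derived_efer)
    {
        size_t i;

        if (entry_load_efer_ctrl)
            return vmcs_guest_efer;          /* GUEST_IA32_EFER is authoritative */

        for (i = 0; i < nr; i++)             /* else look in the VM-entry MSR-load area */
            if (load_list[i].index == MSR_EFER)
                return load_list[i].value;

        return host_derived_efer;            /* else the guest inherits a host-derived value */
    }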
2021-04-17  KVM: x86: dump_vmcs should consider only the load controls of EFER/PAT  (David Edmondson, 1 file, -4/+2)
When deciding whether to dump the GUEST_IA32_EFER and GUEST_IA32_PAT fields of the VMCS, examine only the VM entry load controls, as saving on VM exit has no effect on whether VM entry succeeds or fails. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-4-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17  KVM: x86: dump_vmcs should not conflate EFER and PAT presence in VMCS  (David Edmondson, 1 file, -9/+10)
Show EFER and PAT based on their individual entry/exit controls. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-3-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17  KVM: x86: dump_vmcs should not assume GUEST_IA32_EFER is valid  (David Edmondson, 1 file, -6/+3)
If the VM entry/exit controls for loading/saving MSR_EFER are either not available (an older processor or explicitly disabled) or not used (host and guest values are the same), reading GUEST_IA32_EFER from the VMCS returns an inaccurate value. Because of this, in dump_vmcs() don't use GUEST_IA32_EFER to decide whether to print the PDPTRs - always do so if the fields exist. Fixes: 4eb64dce8d0a ("KVM: x86: dump VMCS on invalid entry") Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-2-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17  KVM: x86: Account a variety of miscellaneous allocations  (Sean Christopherson, 1 file, -1/+1)
Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are clearly associated with a single task/VM. Note, there are several SEV allocations that aren't accounted, but those can (hopefully) be fixed by using the local stack for memory. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331023025.2485960-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-14  KVM: VMX: Don't use vcpu->run->internal.ndata as an array index  (Reiji Watanabe, 1 file, -5/+5)
__vmx_handle_exit() uses vcpu->run->internal.ndata as an index for an array access. Since vcpu->run is (can be) mapped to a user address space with write permission, the 'ndata' could be updated by the user process at any time (the user process can set it to outside the bounds of the array). So, it is not safe for __vmx_handle_exit() to use the 'ndata' that way. Fixes: 1aa561b1a4c0 ("kvm: x86: Add "last CPU" to some KVM_EXIT information") Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210413154739.490299-1-reijiw@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
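The defensive pattern behind the fix: never index with a count that lives in memory writable by userspace; build the count in a local and publish it once. A generic standalone sketch (struct layout invented, not the actual kvm_run definition):

    #include <stdint.h>

    #define NDATA_MAX 16

    struct shared_run {              /* mapped into the user process, writable there */
        uint32_t ndata;
        uint64_t data[NDATA_MAX];
    };

    static void report_exit_info(struct shared_run *run, uint64_t info1, uint64_t info2)
    {
        uint32_t ndata = 0;          /* local index; never re-read run->ndata */

        run->data[ndata++] = info1;
        run->data[ndata++] = info2;
        run->ndata = ndata;          /* publish once, after all writes used the local index */
    }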
2021-03-22  x86: Fix various typos in comments, take #2  (Ingo Molnar, 1 file, -1/+1)
Fix another ~42 single-word typos in arch/x86/ code comments, missed a few in the first pass, in particular in .S files. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-18  x86: Fix various typos in comments  (Ingo Molnar, 2 files, -4/+4)
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-15  KVM: VMX: Track root HPA instead of EPTP for paravirt Hyper-V TLB flush  (Sean Christopherson, 2 files, -47/+42)
Track the address of the top-level EPT struct, a.k.a. the root HPA, instead of the EPTP itself for Hyper-V's paravirt TLB flush. The paravirt API takes only the address, not the full EPTP, and in theory tracking the EPTP could lead to false negatives, e.g. if the HPA matched but the attributes in the EPTP do not. In practice, such a mismatch is extremely unlikely, if not flat out impossible, given how KVM generates the EPTP. Opportunistically rename the related fields to use the 'root' nomenclature, and to prefix them with 'hv_' to connect them to Hyper-V's paravirt TLB flushing. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Skip additional Hyper-V TLB EPTP flushes if one fails  (Sean Christopherson, 1 file, -1/+9)
Skip additional EPTP flushes if one fails when processing EPTPs for Hyper-V's paravirt TLB flushing. If _any_ flush fails, KVM falls back to a full global flush, i.e. additional flushes are unnecessary (and will likely fail anyways). Continue processing the loop unless a mismatch was already detected, e.g. to handle the case where the first flush fails and there is a yet-to-be-detected mismatch. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Define Hyper-V paravirt TLB flush fields iff Hyper-V is enabled  (Sean Christopherson, 2 files, -1/+8)
Ifdef away the Hyper-V specific fields in structs kvm_vmx and vcpu_vmx as each field has only a single reference outside of the struct itself that isn't already wrapped in ifdeffery (and both are initialization). vcpu_vmx.ept_pointer in particular should be wrapped as it is valid if and only if Hyper-V is active, i.e. non-Hyper-V code cannot rely on it to actually track the current EPTP (without additional code changes). Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Explicitly check for hv_remote_flush_tlb when loading pgd  (Sean Christopherson, 1 file, -7/+16)
Explicitly check that kvm_x86_ops.tlb_remote_flush() points at Hyper-V's implementation for PV flushing instead of assuming that a non-NULL implementation means running on Hyper-V. Wrap the related logic in ifdeffery as hv_remote_flush_tlb() is defined iff CONFIG_HYPERV!=n. Short term, the explicit check makes it more obvious why a non-NULL tlb_remote_flush() triggers EPTP shenanigans. Long term, this will allow TDX to define its own implementation of tlb_remote_flush() without running afoul of Hyper-V. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Don't invalidate hv_tlb_eptp if the new EPTP matches  (Sean Christopherson, 1 file, -1/+2)
Don't invalidate the common EPTP, and thus trigger rechecking of EPTPs across all vCPUs, if the new EPTP matches the old/common EPTP. In all likelihood this is a meaningless optimization, but there are (uncommon) scenarios where KVM can reload the same EPTP. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Invalidate hv_tlb_eptp to denote an EPTP mismatch  (Sean Christopherson, 2 files, -19/+23)
Drop the dedicated 'ept_pointers_match' field in favor of stuffing 'hv_tlb_eptp' with INVALID_PAGE to mark it as invalid, i.e. to denote that there is at least one EPTP mismatch. Use a local variable to track whether or not a mismatch is detected so that hv_tlb_eptp can be used to skip redundant flushes. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
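The sentinel approach replaces a separate "match" enum with one field that holds either the common root or an invalid marker. A compact standalone sketch with simplified types (names invented, roughly mirroring the description above):

    #include <stdint.h>

    #define INVALID_PAGE  (~0ull)    /* sentinel: no common EPTP / at least one mismatch */

    struct vm_state {
        uint64_t hv_tlb_eptp;        /* common EPTP, or INVALID_PAGE */
    };

    /* Recheck all vCPUs; keep hv_tlb_eptp only if every root agrees. */
    static void recheck_common_eptp(struct vm_state *vm,
                                    const uint64_t *vcpu_eptp, int nr_vcpus)
    {
        uint64_t common = nr_vcpus ? vcpu_eptp[0] : INVALID_PAGE;
        int i;

        for (i = 1; i < nr_vcpus; i++) {
            if (vcpu_eptp[i] != common) {
                common = INVALID_PAGE;   /* mismatch detected */
                break;
            }
        }
        vm->hv_tlb_eptp = common;
    }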
2021-03-15  KVM: VMX: Do Hyper-V TLB flush iff vCPU's EPTP hasn't been flushed  (Sean Christopherson, 1 file, -15/+8)
Combine the for-loops for Hyper-V TLB EPTP checking and flushing, and in doing so skip flushes for vCPUs whose EPTP matches the target EPTP. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Fold Hyper-V EPTP checking into its only caller  (Sean Christopherson, 1 file, -24/+20)
Fold check_ept_pointer_match() into hv_remote_flush_tlb_with_range() in preparation for combining the kvm_for_each_vcpu loops of the ==CHECK and !=MATCH statements. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Stash kvm_vmx in a local variable for Hyper-V paravirt TLB flush  (Sean Christopherson, 1 file, -6/+7)
Capture kvm_vmx in a local variable instead of polluting hv_remote_flush_tlb_with_range() with to_kvm_vmx(kvm). No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: VMX: Track common EPTP for Hyper-V's paravirt TLB flush  (Sean Christopherson, 2 files, -11/+10)
Explicitly track the EPTP that is common to all vCPUs instead of grabbing vCPU0's EPTP when invoking Hyper-V's paravirt TLB flush. Tracking the EPTP will allow optimizing the checks when loading a new EPTP and will also allow dropping ept_pointer_match, e.g. by marking the common EPTP as invalid. This also technically fixes a bug where KVM could theoretically flush an invalid GPA if all vCPUs have an invalid root. In practice, it's likely impossible to trigger a remote TLB flush in such a scenario. In any case, the superfluous flush is completely benign. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Get active PCID only when writing a CR3 value  (Sean Christopherson, 2 files, -9/+7)
Retrieve the active PCID only when writing a guest CR3 value, i.e. don't get the PCID when using EPT or NPT. The PCID is especially problematic for EPT as the bits have different meaning, and so the PCID must be manually stripped, which is annoying and unnecessary. And on VMX, getting the active PCID also involves reading the guest's CR3 and CR4.PCIDE, i.e. may add pointless VMREADs. Opportunistically rename the pgd/pgd_level params to root_hpa and root_level to better reflect their new roles. Keep the function names, as "load the guest PGD" is still accurate/correct. Last, and probably least, pass root_hpa as a hpa_t/u64 instead of an unsigned long. The EPTP holds a 64-bit value, even in 32-bit mode, so in theory EPT could support HIGHMEM for 32-bit KVM. Never mind that doing so would require changing the MMU page allocators and reworking the MMU to use kmap(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305183123.3978098-2-seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
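The gist: only a legacy CR3 load needs the active PCID folded back in; an EPT root is passed through untouched, so the extra guest-state reads are skipped entirely on that path. A simplified standalone sketch (helper names and the stub CR3 read are invented for illustration):

    #include <stdint.h>
    #include <stdbool.h>

    #define CR3_PCID_MASK 0xfffull

    /* Stand-in for the VMREAD of GUEST_CR3; deliberately only reached on the
     * legacy-paging path so the EPT path avoids the extra read. */
    static uint64_t read_guest_cr3(void) { return 0; }

    static uint64_t make_guest_cr3(uint64_t root_hpa, bool using_ept, bool pcid_enabled)
    {
        if (using_ept)
            return root_hpa;                 /* EPT root: no PCID bits to preserve */

        if (pcid_enabled)                     /* legacy paging: keep the active PCID */
            return root_hpa | (read_guest_cr3() & CR3_PCID_MASK);

        return root_hpa;
    }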
2021-03-15  KVM: x86/mmu: Move logic for setting SPTE masks for EPT into the MMU proper  (Sean Christopherson, 1 file, -18/+2)
Let the MMU deal with the SPTE masks to avoid splitting the logic and knowledge across the MMU and VMX. The SPTE masks that are used for EPT are very, very tightly coupled to the MMU implementation. The use of available bits, the existence of A/D types, the fact that shadow_x_mask even exists, and so on and so forth are all baked into the MMU implementation. Cross referencing the params to the masks is also a nightmare, as pretty much every param is a u64. A future patch will make the location of the MMU_WRITABLE and HOST_WRITABLE bits MMU specific, to free up bit 11 for a MMU_PRESENT bit. Doing that change with the current kvm_mmu_set_mask_ptes() would be an absolute mess. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86/mmu: Co-locate code for setting various SPTE masks  (Sean Christopherson, 1 file, -11/+6)
Squish all the code for (re)setting the various SPTE masks into one location. With the split code, it's not at all clear that the masks are set once during module initialization. This will allow a future patch to clean up initialization of the masks without shuffling code all over tarnation. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86/mmu: Stop using software available bits to denote MMIO SPTEs  (Sean Christopherson, 1 file, -1/+2)
Stop tagging MMIO SPTEs with specific available bits and instead detect MMIO SPTEs by checking for their unique SPTE value. The value is guaranteed to be unique on shadow paging and NPT as setting reserved physical address bits on any other type of SPTE would constitute a KVM bug. Ditto for EPT, as creating a WX non-MMIO SPTE would also be a bug. Note, this approach is also future-compatible with TDX, which will need to reflect MMIO EPT violations as #VEs into the guest. To create an EPT violation instead of a misconfig, TDX EPTs will need to have RWX=0. But MMIO SPTEs will also be the only case where KVM clears SUPPRESS_VE, so MMIO SPTEs will still be guaranteed to have a unique value within a given MMU context. The main motivation is to make it easier to reason about which types of SPTEs use which available bits. As a happy side effect, this frees up two more bits for storing the MMIO generation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
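The detection reduces to comparing the SPTE against a reserved pattern under a mask rather than testing software-available bits. A schematic standalone sketch of that shape (the concrete value/mask setup is omitted and assumed to be chosen at init time):

    #include <stdint.h>
    #include <stdbool.h>

    /* A pattern no legal translation can carry, e.g. a reserved physical
     * address bit on shadow paging/NPT, or an RWX=0/WX combination on EPT. */
    static uint64_t shadow_mmio_value;   /* set once during setup */
    static uint64_t shadow_mmio_mask;    /* bits that must match exactly */

    static bool is_mmio_spte(uint64_t spte)
    {
        return (spte & shadow_mmio_mask) == shadow_mmio_value;
    }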
2021-03-15  KVM: x86: Move RDPMC emulation to common code  (Sean Christopherson, 1 file, -9/+1)
Move the entirety of the accelerated RDPMC emulation to x86.c, and assign the common handler directly to the exit handler array for VMX. SVM has bizarre nrips behavior that prevents it from directly invoking the common handler. The nrips goofiness will be addressed in a future patch. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Move trivial instruction-based exit handlers to common code  (Sean Christopherson, 1 file, -46/+7)
Move the trivial exit handlers, e.g. for instructions that KVM "emulates" as nops, to common x86 code. Assign the common handlers directly to the exit handler arrays and drop the vendor trampolines. Opportunistically use pr_warn_once() where appropriate. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Move XSETBV emulation to common code  (Sean Christopherson, 1 file, -10/+1)
Move the entirety of XSETBV emulation to x86.c, and assign the function directly to both VMX's and SVM's exit handlers, i.e. drop the unnecessary trampolines. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Handle triple fault in L2 without killing L1  (Sean Christopherson, 1 file, -0/+9)
Synthesize a nested VM-Exit if L2 triggers an emulated triple fault instead of exiting to userspace, which likely will kill L1. Any flow that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that is _not_ intercepted by L1. E.g. if KVM is intercepting #GPs for the VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF will cause KVM to emulate triple fault. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>