path: root/arch/x86/kvm/vmx.c
Age, Commit message, Author, Files, Lines
2018-06-16KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_systemPaolo Bonzini1-13/+10
commit ce14e868a54edeb2e30cb7a7b104a2fc4b9d76ca upstream. In the next patch the emulator's .read_std and .write_std callbacks will grow another argument, which is not needed in kvm_read_guest_virt and kvm_write_guest_virt_system's callers. Since we have to make separate functions, let's give the currently existing names a nicer interface, too. Fixes: 129a72a0d3c8 ("KVM: x86: Introduce segmented_write_std", 2017-01-12) Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-16kvm: nVMX: Enforce cpl=0 for VMX instructionsFelix Wilhelm1-2/+13
commit 727ba748e110b4de50d142edca9d6a9b7e6111d8 upstream. VMX instructions executed inside an L1 VM will always trigger a VM exit even when executed with cpl 3. This means we must perform the privilege check in software. Fixes: 70f3aac964ae ("kvm: nVMX: Remove superfluous VMX instruction fault checks") Cc: stable@vger.kernel.org Signed-off-by: Felix Wilhelm <fwilhelm@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-21Merge branch 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-12/+19
Merge speculative store buffer bypass fixes from Thomas Gleixner: - rework of the SPEC_CTRL MSR management to accommodate the new fancy SSBD (Speculative Store Bypass Disable) bit handling. - the CPU bug and sysfs infrastructure for the exciting new Speculative Store Bypass 'feature'. - support for disabling SSB via LS_CFG MSR on AMD CPUs including Hyperthread synchronization on ZEN. - PRCTL support for dynamic runtime control of SSB - SECCOMP integration to automatically disable SSB for sandboxed processes with a filter flag for opt-out. - KVM integration to allow guests fiddling with SSBD including the new software MSR VIRT_SPEC_CTRL to handle the LS_CFG based oddities on AMD. - BPF protection against SSB .. this is just the core and x86 side, other architecture support will come separately. * 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) bpf: Prevent memory disambiguation attack x86/bugs: Rename SSBD_NO to SSB_NO KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG x86/bugs: Rework spec_ctrl base and mask logic x86/bugs: Remove x86_spec_ctrl_set() x86/bugs: Expose x86_spec_ctrl_base directly x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host} x86/speculation: Rework speculative_store_bypass_update() x86/speculation: Add virtualized speculative store bypass disable support x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL x86/speculation: Handle HT correctly on AMD x86/cpufeatures: Add FEATURE_ZEN x86/cpufeatures: Disentangle SSBD enumeration x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP KVM: SVM: Move spec control call after restore of GS x86/cpu: Make alternative_msr_write work for 32-bit code x86/bugs: Fix the parameters alignment and missing void x86/bugs: Make cpu_show_common() static ...
2018-05-17KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBDTom Lendacky1-3/+15
Expose the new virtualized architectural mechanism, VIRT_SSBD, for using speculative store bypass disable (SSBD) under SVM. This will allow guests to use SSBD on hardware that uses non-architectural mechanisms for enabling SSBD. [ tglx: Folded the migration fixup from Paolo Bonzini ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-17x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRLThomas Gleixner1-2/+2
AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care about the bit position of the SSBD bit and thus facilitate migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host. Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR. Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17x86/speculation: Use synthetic bits for IBRS/IBPB/STIBPBorislav Petkov1-7/+2
Intel and AMD have different CPUID bits hence for those use synthetic bits which get set on the respective vendor's in init_speculation_control(). So that debacles like what the commit message of c65732e4f721 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") talks about don't happen anymore. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Jörg Otte <jrg.otte@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic
2018-05-11KVM: vmx: update sec exec controls for UMIP iff emulating UMIPSean Christopherson1-13/+15
Update SECONDARY_EXEC_DESC for UMIP emulation if and only if UMIP is actually being emulated. Skipping the VMCS update eliminates unnecessary VMREAD/VMWRITE when UMIP is supported in hardware, and on platforms that don't have SECONDARY_VM_EXEC_CONTROL. The latter case resolves a bug where KVM would fill the kernel log with warnings due to failed VMWRITEs on older platforms. Fixes: 0367f205a3b7 ("KVM: vmx: add support for emulating UMIP") Cc: stable@vger.kernel.org #4.16 Reported-by: Paolo Zeppegno <pzeppegno@gmail.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-09x86/bugs: Rename _RDS to _SSBDKonrad Rzeszutek Wilk1-3/+3
Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTRL[2] as SSBD (Speculative Store Bypass Disable). Hence changing it. It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name is going to be. Following the rename it would be SSBD_NO but that rolls out to Speculative Store Bypass Disable No. Also fixed the missing space in X86_FEATURE_AMD_SSBD. [ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-03x86/speculation: Create spec-ctrl.h to avoid include hellThomas Gleixner1-1/+1
Having everything in nospec-branch.h creates a hell of dependencies when adding the prctl based switching mechanism. Move everything which is not required in nospec-branch.h to spec-ctrl.h and fix up the includes in the relevant files. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guestKonrad Rzeszutek Wilk1-3/+5
Expose the CPUID.7.EDX[31] bit to the guest, and also guard against various combinations of SPEC_CTRL MSR values. The handling of the MSR (to take into account the host value of SPEC_CTRL Bit(2)) is taken care of in patch: KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03x86/bugs, KVM: Support the combination of guest and host IBRSKonrad Rzeszutek Wilk1-4/+2
A guest may modify the SPEC_CTRL MSR from the value used by the kernel. Since the kernel doesn't use IBRS, this means a value of zero is what is needed in the host. But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to the other bits as reserved so the kernel should respect the boot time SPEC_CTRL value and use that. This makes it possible to deal with future extensions to the SPEC_CTRL interface if any at all. Note: This uses wrmsrl() instead of native_wrmsrl(). It does not make any difference as paravirt will over-write the callq *0xfff.. with the wrmsrl assembler code. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-04-27kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in useJunaid Shahid1-10/+4
Currently, KVM flushes the TLB after a change to the APIC access page address or the APIC mode when EPT mode is enabled. However, even in shadow paging mode, a TLB flush is needed if VPIDs are being used, as specified in the Intel SDM Section 29.4.5. So replace vmx_flush_tlb_ept_only() with vmx_flush_tlb(), which will flush if either EPT or VPIDs are in use. Signed-off-by: Junaid Shahid <junaids@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-04-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-40/+55
Pull kvm fixes from Paolo Bonzini: "Bug fixes, plus a new test case and the associated infrastructure for writing nested virtualization tests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: selftests: add vmx_tsc_adjust_test kvm: x86: move MSR_IA32_TSC handling to x86.c X86/KVM: Properly update 'tsc_offset' to represent the running guest kvm: selftests: add -std=gnu99 cflags x86: Add check for APIC access address for vmentry of L2 guests KVM: X86: fix incorrect reference of trace_kvm_pi_irte_update X86/KVM: Do not allow DISABLE_EXITS_MWAIT when LAPIC ARAT is not available kvm: selftests: fix spelling mistake: "divisable" and "divisible" X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
2018-04-16kvm: x86: move MSR_IA32_TSC handling to x86.cPaolo Bonzini1-20/+0
This is not specific to Intel/AMD anymore. The TSC offset is available in vcpu->arch.tsc_offset. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-16X86/KVM: Properly update 'tsc_offset' to represent the running guestKarimAllah Ahmed1-19/+35
Update 'tsc_offset' on vmentry/vmexit of L2 guests to ensure that it always captures the TSC_OFFSET of the running guest whether it is the L1 or L2 guest. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> [AMD changes, fix update_ia32_tsc_adjust_msr. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
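The bookkeeping described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct vcpu_tsc_model` and the `model_*` helpers are hypothetical names invented here, not the kernel's data structures or functions. It shows a single offset field that always reflects the guest currently running, with the extra L2 offset added on nested entry and removed on nested exit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-vCPU TSC state; not the kernel's struct. */
struct vcpu_tsc_model {
    uint64_t tsc_offset;         /* offset of the guest currently running */
    uint64_t l2_tsc_offset;      /* extra offset L1 requested for L2 */
    bool     l2_uses_offsetting; /* did L1 enable TSC offsetting for L2? */
    bool     in_l2;              /* is L2 the running guest? */
};

/* Nested entry: fold the L2 offset into the running offset. */
static void model_enter_l2(struct vcpu_tsc_model *v)
{
    if (!v->in_l2 && v->l2_uses_offsetting)
        v->tsc_offset += v->l2_tsc_offset;
    v->in_l2 = true;
}

/* Nested exit: restore the L1 offset. */
static void model_exit_l2(struct vcpu_tsc_model *v)
{
    if (v->in_l2 && v->l2_uses_offsetting)
        v->tsc_offset -= v->l2_tsc_offset;
    v->in_l2 = false;
}

/* The running guest's view of the TSC is host TSC plus the current offset. */
static uint64_t model_read_guest_tsc(const struct vcpu_tsc_model *v,
                                     uint64_t host_tsc)
{
    return host_tsc + v->tsc_offset;
}
```

Keeping one field that tracks the running guest means readers of the offset do not need to know whether L1 or L2 is active.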
2018-04-12x86: Add check for APIC access address for vmentry of L2 guestsKrish Sadhukhan1-0/+13
According to the sub-section titled 'VM-Execution Control Fields' in the section titled 'Basic VM-Entry Checks' in Intel SDM vol. 3C, the following vmentry check must be enforced: If the 'virtualize APIC-accesses' VM-execution control is 1, the APIC-access address must satisfy the following checks: - Bits 11:0 of the address must be 0. - The address should not set any bits beyond the processor's physical-address width. This patch adds the necessary check to conform to this rule. If the check fails, we cause the L2 VMENTRY to fail which is what the associated unit test (following patch) expects. Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
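The two conditions quoted from the SDM translate directly into bit tests. The following is a minimal stand-alone sketch, not the patch itself: the function name `apic_access_addr_valid` and the `maxphyaddr` parameter are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if addr satisfies both vmentry checks.
 * maxphyaddr is the processor's physical-address width in bits.
 */
static bool apic_access_addr_valid(uint64_t addr, unsigned int maxphyaddr)
{
    /* Bits 11:0 of the address must be 0 (4 KiB alignment). */
    if (addr & 0xfffULL)
        return false;

    /* No bits at or beyond the physical-address width may be set. */
    if (maxphyaddr < 64 && (addr >> maxphyaddr) != 0)
        return false;

    return true;
}
```

The `maxphyaddr < 64` guard avoids an undefined shift by the full width of the type.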
2018-04-11KVM: X86: fix incorrect reference of trace_kvm_pi_irte_updatehu huajun1-1/+1
In arch/x86/kvm/trace.h, this function is declared with host_irq as the first argument and vcpu_id as the second, not the other way around. Signed-off-by: hu huajun <huhuajun@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-10X86/VMX: Disable VMX preemption timer if MWAIT is not interceptedKarimAllah Ahmed1-4/+10
The VMX-preemption timer is used by KVM as a way to set deadlines for the guest (i.e. timer emulation). That was safe till very recently when capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was introduced. According to Intel SDM 25.5.1: """ The VMX-preemption timer operates in the C-states C0, C1, and C2; it also operates in the shutdown and wait-for-SIPI states. If the timer counts down to zero in any state other than the wait-for SIPI state, the logical processor transitions to the C0 C-state and causes a VM exit; the timer does not cause a VM exit if it counts down to zero in the wait-for-SIPI state. The timer is not decremented in C-states deeper than C2. """ Now once the guest issues the MWAIT with a c-state deeper than C2 the preemption timer will never wake it up again since it stopped ticking! Usually this is compensated by other activities in the system that would wake the core from the deep C-state (and cause a VMExit). For example, if the host itself is ticking or it received interrupts, etc! So disable the VMX-preemption timer if MWAIT is exposed to the guest! Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Fixes: 4d5422cea3b61f158d58924cbb43feada456ba5c Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-354/+737
Pull kvm updates from Paolo Bonzini: "ARM: - VHE optimizations - EL2 address space randomization - speculative execution mitigations ("variant 3a", aka execution past invalid privilege register access) - bugfixes and cleanups PPC: - improvements for the radix page fault handler for HV KVM on POWER9 s390: - more kvm stat counters - virtio gpu plumbing - documentation - facilities improvements x86: - support for VMware magic I/O port and pseudo-PMCs - AMD pause loop exiting - support for AMD core performance extensions - support for synchronous register access - expose nVMX capabilities to userspace - support for Hyper-V signaling via eventfd - use Enlightened VMCS when running on Hyper-V - allow userspace to disable MWAIT/HLT/PAUSE vmexits - usual roundup of optimizations and nested virtualization bugfixes Generic: - API selftest infrastructure (though the only tests are for x86 as of now)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits) kvm: x86: fix a prototype warning kvm: selftests: add sync_regs_test kvm: selftests: add API testing infrastructure kvm: x86: fix a compile warning KVM: X86: Add Force Emulation Prefix for "emulate the next instruction" KVM: X86: Introduce handle_ud() KVM: vmx: unify adjacent #ifdefs x86: kvm: hide the unused 'cpu' variable KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" kvm: Add emulation for movups/movupd KVM: VMX: raise internal error for exception during invalid protected mode state KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending KVM: x86: Fix misleading comments on handling pending exceptions KVM: x86: Rename interrupt.pending to interrupt.injected KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt x86/kvm: use Enlightened VMCS when running on Hyper-V x86/hyper-v: detect nested features 
x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits ...
2018-04-06kvm: x86: fix a prototype warningPeng Hao1-1/+1
Make the function static to avoid a warning: no previous prototype for ‘vmx_enable_tdp’ Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04KVM: X86: Introduce handle_ud()Wanpeng Li1-8/+2
Introduce handle_ud() to handle invalid opcode, this function will be used by later patches. Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04KVM: vmx: unify adjacent #ifdefsPaolo Bonzini1-7/+3
vmx_save_host_state has multiple ifdefs for CONFIG_X86_64 that have no other code between them. Simplify by reducing them to a single conditional. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04x86: kvm: hide the unused 'cpu' variableArnd Bergmann1-0/+2
The local variable was newly introduced but is only accessed in one place on x86_64, but not on 32-bit: arch/x86/kvm/vmx.c: In function 'vmx_save_host_state': arch/x86/kvm/vmx.c:2175:6: error: unused variable 'cpu' [-Werror=unused-variable] This puts it into another #ifdef. Fixes: 35060ed6a1ff ("x86/kvm/vmx: avoid expensive rdmsr for MSR_GS_BASE") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04KVM: VMX: remove bogus WARN_ON in handle_ept_misconfigSean Christopherson1-12/+1
Remove the WARN_ON in handle_ept_misconfig() as it is unnecessary and causes false positives. Return the unmodified result of kvm_mmu_page_fault() instead of converting a system error code to KVM_EXIT_UNKNOWN so that userspace sees the error code of the actual failure, not a generic "we don't know what went wrong". * kvm_mmu_page_fault() will WARN if reserved bits are set in the SPTEs, i.e. it covers the case where an EPT misconfig occurred because of a KVM bug. * The WARN_ON will fire on any system error code that is hit while handling the fault, e.g. -ENOMEM from mmu_topup_memory_caches() while handling a legitimate MMIO EPT misconfig or -EFAULT from kvm_handle_bad_page() if the corresponding HVA is invalid. In either case, userspace should receive the original error code and firing a warning is incorrect behavior as KVM is operating as designed. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-04KVM: VMX: raise internal error for exception during invalid protected mode stateSean Christopherson1-6/+14
Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter an exception in Protected Mode while emulating guest due to invalid guest state. Unlike Big RM, KVM doesn't support emulating exceptions in PM, i.e. PM exceptions are always injected via the VMCS. Because we will never do VMRESUME due to emulation_required, the exception is never realized and we'll keep emulating the faulting instruction over and over until we receive a signal. Exit to userspace iff there is a pending exception, i.e. don't exit simply on a requested event. The purpose of this check and exit is to aid in debugging a guest that is in all likelihood already doomed. Invalid guest state in PM is extremely limited in normal operation, e.g. it generally only occurs for a few instructions early in BIOS, and any exception at this time is all but guaranteed to be fatal. Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly handled/emulated, while checking for vectored interrupts, e.g. INTR and NMI, without hitting false positives would add a fair amount of complexity for almost no benefit (getting hit by lightning seems more likely than encountering this specific scenario). Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an exception via the VMCS and emulation_required is true. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-04-03Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+0
Pull x86 cleanups and msr updates from Ingo Molnar: "The main change is a performance/latency improvement to /dev/msr access. The rest are misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well x86/cpuid: Allow cpuid_read() to schedule x86/msr: Allow rdmsr_safe_on_cpu() to schedule x86/rtc: Stop using deprecated functions x86/dumpstack: Unify show_regs() x86/fault: Do not print IP in show_fault_oops() x86/MSR: Move native_* variants to msr.h
2018-03-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-5/+5
Pull KVM fixes from Radim Krčmář: "PPC: - Fix a bug causing occasional machine check exceptions on POWER8 hosts (introduced in 4.16-rc1) x86: - Fix a guest crashing regression with nested VMX and restricted guest (introduced in 4.16-rc1) - Fix dependency check for pv tlb flush (the wrong dependency that effectively disabled the feature was added in 4.16-rc4, the original feature in 4.16-rc1, so it got decent testing)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix pv tlb flush dependencies KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0 KVM: PPC: Book3S HV: Fix duplication of host SLB entries
2018-03-28KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pendingLiran Alon1-8/+0
When vCPU runs L2 and there is a pending event that requires an exit from L2 to L1 and nested_run_pending=1, vcpu_enter_guest() will request an immediate-exit from L2 (See req_immediate_exit). Since handling of req_immediate_exit now also makes sure to set KVM_REQ_EVENT, there is no need to also set it on vmx_vcpu_run() when nested_run_pending=1. This optimizes cases where VMRESUME was executed by L1 to enter L2 and there are no pending events that require an exit from L2 to L1. Previously, this would have set KVM_REQ_EVENT unnecessarily. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: x86: Rename interrupt.pending to interrupt.injectedLiran Alon1-1/+1
For exception and NMI events, KVM code uses the following coding convention: *) "pending" represents an event that should be injected to the guest at some point but whose side-effects have not yet occurred. *) "injected" represents an event whose side-effects have already occurred. However, interrupts don't conform to this coding convention. All current code flows mark interrupt.pending when its side-effects have already taken place (for example, bit moved from LAPIC IRR to ISR). Therefore, it makes sense to just rename interrupt.pending to interrupt.injected. This change follows the logic of previous commit 664f8e26b00c ("KVM: X86: Fix loss of exception which has not yet been injected") which changed exception to follow this coding convention as well. It is important to note that in case !lapic_in_kernel(vcpu), interrupt.pending usage was and still is incorrect. In this case, interrupt.pending can only be set using one of the following ioctls: KVM_INTERRUPT, KVM_SET_VCPU_EVENTS and KVM_SET_SREGS. Looking at how QEMU uses these ioctls, one can see that QEMU uses them either to re-set an "interrupt.pending" state it has received from KVM (via KVM_GET_VCPU_EVENTS interrupt.pending or via KVM_GET_SREGS interrupt_bitmap) or by dispatching a new interrupt from QEMU's emulated LAPIC which resets the bit in IRR and sets the bit in ISR before sending the ioctl to KVM. So it seems that indeed "interrupt.pending" in this case is also supposed to represent "interrupt.injected". However, kvm_cpu_has_interrupt() & kvm_cpu_has_injectable_intr() are misusing (now named) interrupt.injected in order to return whether there is a pending interrupt. This leads to nVMX/nSVM not being able to distinguish whether it should exit from L2 to L1 on EXTERNAL_INTERRUPT on a pending interrupt or should re-inject an injected interrupt. Therefore, add a FIXME at these functions for handling this issue. This patch introduces no semantic change.
Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28x86/kvm: use Enlightened VMCS when running on Hyper-VVitaly Kuznetsov1-10/+291
Enlightened VMCS is just a structure in memory, the main benefit besides avoiding somewhat slower VMREAD/VMWRITE is using clean field mask: we tell the underlying hypervisor which fields were modified since VMEXIT so there's no need to inspect them all. Tight CPUID loop test shows significant speedup: Before: 18890 cycles After: 8304 cycles Static key is being used to avoid performance penalty for non-Hyper-V deployments. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
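The clean field mask idea can be sketched in a few lines of user-space C. Everything here is a simplified, hypothetical model: the group constants, `struct evmcs_model` and the `model_*` helpers are invented for illustration and do not reproduce the real Hyper-V structure layout or KVM's accessors. The point it illustrates is that a write to a field clears the clean bit of its group, so the underlying hypervisor only needs to re-read groups that are no longer marked clean.

```c
#include <stdint.h>

/* Hypothetical field groups; the real structure defines many more. */
#define MODEL_GRP_GUEST_BASIC 0x1u
#define MODEL_GRP_CONTROL     0x2u
#define MODEL_GRP_ALL         (MODEL_GRP_GUEST_BASIC | MODEL_GRP_CONTROL)

/* Plain in-memory structure standing in for the enlightened VMCS. */
struct evmcs_model {
    uint32_t clean_fields;  /* bit set: group unchanged since last VM exit */
    uint64_t guest_rip;     /* belongs to MODEL_GRP_GUEST_BASIC */
    uint64_t exec_control;  /* belongs to MODEL_GRP_CONTROL */
};

/* A write is an ordinary store plus clearing the group's clean bit. */
static void model_write_guest_rip(struct evmcs_model *e, uint64_t val)
{
    e->guest_rip = val;
    e->clean_fields &= ~MODEL_GRP_GUEST_BASIC;
}

static void model_write_exec_control(struct evmcs_model *e, uint64_t val)
{
    e->exec_control = val;
    e->clean_fields &= ~MODEL_GRP_CONTROL;
}

/* After a VM exit everything has been consumed, so mark all groups clean. */
static void model_mark_all_clean(struct evmcs_model *e)
{
    e->clean_fields = MODEL_GRP_ALL;
}

/* Groups the underlying hypervisor would have to inspect on the next entry. */
static uint32_t model_dirty_groups(const struct evmcs_model *e)
{
    return ~e->clean_fields & MODEL_GRP_ALL;
}
```

In this model a tight loop that only touches the guest RIP leaves the control group clean, which is the kind of saving the cycle counts in the commit message point to.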
2018-03-28KVM: VMX: Bring the common code to header fileBabu Moger1-42/+9
This patch brings some of the code from vmx to x86.h header file. Now, we can share this code between vmx and svm. Modified couple functions to make it common. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: VMX: Remove ple_window_actual_maxBabu Moger1-25/+6
Get rid of ple_window_actual_max, because its benefits are really minuscule and the logic is complicated. The overflows(and underflow) are controlled in __ple_window_grow and _ple_window_shrink respectively. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Babu Moger <babu.moger@amd.com> [Fixed potential wraparound and change the max to UINT_MAX. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
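Controlling overflow and underflow inside the grow and shrink helpers themselves might look roughly like the sketch below. The function names, parameter order and the multiply-or-add rule are assumptions chosen for illustration and are not claimed to match the kernel's helpers exactly. Growth is computed in a 64-bit intermediate and clamped to the maximum, and shrinking is clamped to the minimum.

```c
#include <stdint.h>

/* Hypothetical grow helper: widen, apply the modifier, clamp to max. */
static unsigned int model_grow_window(unsigned int val, unsigned int base,
                                      unsigned int modifier, unsigned int max)
{
    uint64_t ret = val;  /* 64-bit intermediate so the math cannot wrap */

    if (modifier < 1)
        return base;

    if (modifier < base)
        ret *= modifier;  /* small modifier: treat as a multiplier */
    else
        ret += modifier;  /* large modifier: treat as an increment */

    return ret > max ? max : (unsigned int)ret;
}

/* Hypothetical shrink helper: divide or subtract, never drop below min. */
static unsigned int model_shrink_window(unsigned int val, unsigned int base,
                                        unsigned int modifier, unsigned int min)
{
    if (modifier < 1)
        return base;

    if (modifier < base)
        val /= modifier;
    else
        val = val > modifier ? val - modifier : 0;  /* avoid underflow */

    return val < min ? min : val;
}
```

With the clamping done here, a separate precomputed "actual max" value is no longer needed to keep the window from wrapping.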
2018-03-28KVM: VMX: Fix the module parameters for vmxBabu Moger1-13/+14
The vmx module parameters are supposed to be unsigned variants. Also fixed the checkpatch errors like the one below. WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. +module_param(ple_gap, uint, S_IRUGO); Signed-off-by: Babu Moger <babu.moger@amd.com> [Expanded uint to unsigned int in code. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28KVM: x86: Fix perf timer mode IP reportingAndi Kleen1-2/+2
KVM and perf have a special backdoor mechanism to report the IP for interrupts re-executed after vm exit. This works for the NMIs that perf normally uses. However when perf is in timer mode it doesn't work because the timer interrupt doesn't get this special treatment. This is common when KVM is running nested in another hypervisor which may not implement the PMU, so only timer mode is available. Call the functions to set up the backdoor IP also for non NMI interrupts. I renamed the functions to set up the backdoor IP reporting to be more appropriate for their new use. The SVM change is only compile tested. v2: Moved the functions inline. For the normal interrupt case the before/after functions are now called from x86.c, not arch specific code. For the NMI case we still need to call it in the architecture specific code, because it's already needed in the low level *_run functions. Signed-off-by: Andi Kleen <ak@linux.intel.com> [Removed unnecessary calls from arch handle_external_intr. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-23KVM: VMX: add struct kvm_vmx to hold VMX specific KVM varsSean Christopherson1-15/+31
Add struct kvm_vmx, which wraps struct kvm, and a helper to_kvm_vmx() that retrieves 'struct kvm_vmx *' from 'struct kvm *'. Move the VMX specific variables out of kvm_arch and into kvm_vmx. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
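The wrapping pattern described here is the usual embed-and-recover idiom. The sketch below is a user-space approximation: `struct kvm_model`, `struct kvm_vmx_model`, `model_container_of` and `to_kvm_vmx_model` are stand-in names invented for illustration, and the member shown is just an example of a VMX-specific field.

```c
#include <stddef.h>

/* Stand-in for the generic 'struct kvm'. */
struct kvm_model {
    int users;
};

/* Vendor wrapper: embeds the generic struct and adds VMX-only state. */
struct kvm_vmx_model {
    struct kvm_model kvm;
    unsigned long long ept_identity_map_addr;  /* example VMX-only field */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Helper in the spirit of to_kvm_vmx(): generic pointer in, wrapper out. */
static struct kvm_vmx_model *to_kvm_vmx_model(struct kvm_model *kvm)
{
    return model_container_of(kvm, struct kvm_vmx_model, kvm);
}
```

Generic code keeps passing around the inner pointer, and only vendor code that needs the extra fields converts it back to the wrapper.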
2018-03-23KVM: x86: move setting of ept_identity_map_addr to vmx.cSean Christopherson1-0/+7
Add kvm_x86_ops->set_identity_map_addr and set ept_identity_map_addr in VMX specific code so that ept_identity_map_addr can be moved out of 'struct kvm_arch' in a future patch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-23KVM: x86: define SVM/VMX specific kvm_arch_[alloc|free]_vmSean Christopherson1-0/+12
Define kvm_arch_[alloc|free]_vm in x86 as pass through functions to new kvm_x86_ops vm_alloc and vm_free, and move the current allocation logic as-is to SVM and VMX. Vendor specific alloc/free functions set the stage for SVM/VMX wrappers of 'struct kvm', which will allow us to move the growing number of SVM/VMX specific member variables out of 'struct kvm_arch'. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-23KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0Sean Christopherson1-5/+5
Segment registers must be synchronized prior to any code that may trigger a call to emulation_required()/guest_state_valid(), e.g. vmx_set_cr0(). Because preparing vmcs02 writes segmentation fields directly, i.e. doesn't use vmx_set_segment(), emulation_required will not be re-evaluated when synchronizing the segment registers, which can result in L0 incorrectly starting emulation of L2. Fixes: 8665c3f97320 ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Move all of prepare_vmcs02_full earlier, not just segment registers. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-21KVM: nVMX: fix vmentry failure code when L2 state would require emulationPaolo Bonzini1-1/+3
Commit 2bb8cafea80b ("KVM: vVMX: signal failure for nested VMEntry if emulation_required", 2018-03-12) introduces a new error path which does not set *entry_failure_code. Fix that to avoid a leak of L0 stack to L1. Reported-by: Radim Krčmář <rkrcmar@redhat.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-21kvm/x86: fix icebp instruction handlingLinus Torvalds1-1/+8
The undocumented 'icebp' instruction (aka 'int1') works pretty much like 'int3' in the absence of in-circuit probing equipment (except, obviously, that it raises #DB instead of raising #BP), and is used by some validation test-suites as such. But Andy Lutomirski noticed that his test suite acted differently in kvm than on bare hardware. The reason is that kvm used an inexact test for the icebp instruction: it just assumed that an all-zero VM exit qualification value meant that the VM exit was due to icebp. That is not unlike the guess that do_debug() does for the actual exception handling case, but it's purely a heuristic, not an absolute rule. do_debug() does it because it wants to ascribe _some_ reasons to the #DB that happened, and an empty %dr6 value means that 'icebp' is the most likely cause and we have no better information. But kvm can just do it right, because unlike the do_debug() case, kvm actually sees the real reason for the #DB in the VM-exit interruption information field. So instead of relying on an inexact heuristic, just use the actual VM exit information that says "it was 'icebp'". Right now the 'icebp' instruction isn't technically documented by Intel, but that will hopefully change. The special "privileged software exception" information _is_ actually mentioned in the Intel SDM, even though the cause of it isn't enumerated. Reported-by: Andy Lutomirski <luto@kernel.org> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
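Testing the interruption information field for the "privileged software exception" type could be sketched as follows. The `MODEL_*` constants and `model_is_icebp` are names made up for this stand-alone example; the bit positions follow the Intel SDM layout of the VM-exit interruption information field (vector in bits 7:0, type in bits 10:8, valid in bit 31), with type 5 denoting a privileged software exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout of the VM-exit interruption information (per the SDM). */
#define MODEL_INTR_INFO_VECTOR_MASK       0x000000ffu  /* bits 7:0  */
#define MODEL_INTR_INFO_TYPE_MASK         0x00000700u  /* bits 10:8 */
#define MODEL_INTR_INFO_VALID_MASK        0x80000000u  /* bit 31    */

#define MODEL_INTR_TYPE_HARD_EXCEPTION    (3u << 8)
#define MODEL_INTR_TYPE_PRIV_SW_EXCEPTION (5u << 8)

/* True only if the field is valid and its type says "privileged sw exception". */
static bool model_is_icebp(uint32_t intr_info)
{
    return (intr_info & (MODEL_INTR_INFO_TYPE_MASK |
                         MODEL_INTR_INFO_VALID_MASK)) ==
           (MODEL_INTR_TYPE_PRIV_SW_EXCEPTION | MODEL_INTR_INFO_VALID_MASK);
}
```

A #DB delivered as an ordinary hardware exception carries type 3 and is therefore not classified as 'icebp', regardless of the exit qualification.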
2018-03-17x86/kvm/vmx: avoid expensive rdmsr for MSR_GS_BASEVitaly Kuznetsov1-1/+2
vmx_save_host_state() is only called from kvm_arch_vcpu_ioctl_run(), so the context is pretty well defined, and as we're past 'swapgs', MSR_GS_BASE should contain the kernel's GS base, which we point to irq_stack_union. Add a new kernelmode_gs_base() API; irq_stack_union needs to be exported because KVM can be built as a module. Acked-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-17x86/kvm/vmx: read MSR_{FS,KERNEL_GS}_BASE from current->threadVitaly Kuznetsov1-3/+10
vmx_save_host_state() is only called from kvm_arch_vcpu_ioctl_run(), so the context is pretty well defined. Read MSR_{FS,KERNEL_GS}_BASE from current->thread after calling save_fsgs(), which now takes care of the X86_BUG_NULL_SEG case and will do RD{FS,GS}BASE when FSGSBASE extensions are exposed to userspace (currently they are not). Acked-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-17KVM: X86: Provide a capability to disable PAUSE interceptsWanpeng Li1-4/+13
Allow disabling pause loop exit/pause filtering on a per-VM basis. If some VMs have dedicated host CPUs, they won't be negatively affected by needlessly intercepted PAUSE instructions. Thanks to Jan H. Schönherr's initial patch. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-17KVM: X86: Provide a capability to disable HLT interceptsWanpeng Li1-0/+24
If host CPUs are dedicated to a VM, we can avoid VM exits on HLT. This patch adds the per-VM capability to disable them. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-17KVM: X86: Provide a capability to disable MWAIT interceptsWanpeng Li1-4/+5
Allowing a guest to execute MWAIT without interception enables a guest to put a (physical) CPU into a power saving state, where it takes longer to return from than what may be desired by the host. Don't give a guest that power over a host by default. (Especially, since nothing prevents a guest from using MWAIT even when it is not advertised via CPUID.) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
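As a rough illustration of the per-VM opt-in shared by the PAUSE, HLT, and MWAIT entries above, a VMM might build its request mask as follows. This is a sketch under assumptions: the flag names and bit values are recalled from the KVM UAPI header and should be checked against linux/kvm.h, and pick_disabled_exits() is a hypothetical helper, not a KVM function.

```c
#include <stdint.h>

/*
 * Flag values as recalled from the KVM UAPI header (assumption; verify
 * against linux/kvm.h). Redefined locally so the sketch stands alone.
 */
#define KVM_X86_DISABLE_EXITS_MWAIT  (1u << 0)
#define KVM_X86_DISABLE_EXITS_HLT    (1u << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE  (1u << 2)

/*
 * Hypothetical VMM-side helper: request that exits be disabled only when
 * the VM has dedicated host CPUs, and never ask for more than the host
 * kernel reports as supported. The default (nothing disabled) matches the
 * commit's stance of not giving guests this power unless asked.
 */
static uint64_t pick_disabled_exits(uint64_t supported, int cpus_dedicated)
{
    uint64_t wanted = 0;

    if (cpus_dedicated)
        wanted = KVM_X86_DISABLE_EXITS_MWAIT |
                 KVM_X86_DISABLE_EXITS_HLT |
                 KVM_X86_DISABLE_EXITS_PAUSE;

    return wanted & supported;
}
```

In a real VMM the resulting mask would presumably be handed to the kernel through the capability-enable ioctl on the VM file descriptor; that call is omitted here because it cannot run without /dev/kvm.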
2018-03-17KVM: x86: VMX: Intercept #GP to support access to VMware backdoor portsLiran Alon1-0/+24
If the KVM enable_vmware_backdoor module parameter is set, this commit changes VMX to intercept #GP instead of having it delivered directly from the CPU to the guest. This is done to support access to VMware backdoor I/O ports even if the TSS I/O permission bitmap denies it. In that case: 1. A #GP will be raised and intercepted. 2. The #GP intercept handler will simulate the I/O port access instruction. 3. The I/O port access instruction simulation will allow access to VMware backdoor ports specifically, even if the TSS I/O permission bitmap denies it. Note that the above change introduces a slight performance hit, as #GPs are no longer delivered directly from the CPU to the guest but instead cause a VM exit and instruction emulation. However, this behavior is introduced only when the enable_vmware_backdoor KVM module parameter is set. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
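Step 3 above amounts to a permission check with a carve-out for the backdoor ports. The following is a minimal sketch, not KVM's emulator code: the port numbers 0x5658 and 0x5659 are the commonly documented VMware backdoor ports and are an assumption here (the commit message does not state them), all names are hypothetical, and only single-byte accesses are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed backdoor port numbers; hypothetical macro names. */
#define VMWARE_BACKDOOR_PORT     0x5658u
#define VMWARE_BACKDOOR_PORT_HB  0x5659u

static bool is_vmware_backdoor_port(uint16_t port)
{
    return port == VMWARE_BACKDOOR_PORT || port == VMWARE_BACKDOOR_PORT_HB;
}

/*
 * Hypothetical emulation-time check for a one-byte port access.
 * io_bitmap is a TSS-style I/O permission bitmap covering all 65536
 * ports (8192 bytes); a set bit means the access is denied.
 */
static bool io_access_allowed(const uint8_t *io_bitmap, uint16_t port,
                              bool backdoor_enabled)
{
    /* The backdoor ports bypass the bitmap when the feature is on. */
    if (backdoor_enabled && is_vmware_backdoor_port(port))
        return true;

    return !(io_bitmap[port / 8] & (1u << (port % 8)));
}
```

With the feature off, the function degenerates to the plain bitmap check, which mirrors the commit's point that the extra behavior exists only when the module parameter is set.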
2018-03-17KVM: x86: add kvm_fast_pio() to consolidate fast PIO codeSean Christopherson1-11/+2
Add kvm_fast_pio() to consolidate duplicate code in VMX and SVM. Unexport kvm_fast_pio_in() and kvm_fast_pio_out(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-17KVM: VMX: use kvm_fast_pio_in for handling IN I/OSean Christopherson1-3/+6
Fast emulation of processor I/O for IN was disabled on x86 (both VMX and SVM) some years ago due to a buggy implementation. The addition of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast emulation of IN. Piggyback SVM's work and use kvm_fast_pio_in() on VMX instead of performing full emulation of IN. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-17KVM: vVMX: signal failure for nested VMEntry if emulation_requiredSean Christopherson1-0/+15
Fail a nested VMEntry with EXIT_REASON_INVALID_STATE if L2 guest state is invalid, i.e. vmcs12 contained invalid guest state, and unrestricted guest is disabled in L0 (and by extension disabled in L1). WARN_ON_ONCE in handle_invalid_guest_state() if we're attempting to emulate L2, i.e. nested_run_pending is true, to aid debug in the (hopefully unlikely) scenario that we somehow skip the nested VMEntry consistency check, e.g. due to a L0 bug. Note: KVM relies on hardware to detect the scenario where unrestricted guest is enabled in L0 but disabled in L1 and vmcs12 contains invalid guest state, i.e. checking emulation_required in prepare_vmcs02 is required only to handle the case where unrestricted guest is disabled in L0 since L0 never actually attempts VMLAUNCH/VMRESUME with vmcs02. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-17KVM: VMX: WARN on a MOV CR3 exit w/ unrestricted guestSean Christopherson1-0/+2
CR3 load/store exiting are always off when unrestricted guest is enabled. WARN on the associated CR3 VMEXIT to detect code that would re-introduce CR3 load/store exiting for unrestricted guest. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>