summaryrefslogtreecommitdiff
path: root/arch/x86/kvm
AgeCommit message (Collapse)AuthorFilesLines
2018-08-06kvm: nVMX: Fix fault priority for VMX operationsJim Mattson1-4/+5
When checking emulated VMX instructions for faults, the #UD for "IF (not in VMX operation)" should take precedence over the #GP for "ELSIF CPL > 0." Suggested-by: Eric Northup <digitaleric@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: nVMX: Fix fault vector for VMX operation at CPL > 0Jim Mattson1-2/+2
The fault that should be raised for a privilege level violation is #GP rather than #UD. Fixes: 727ba748e110b4 ("kvm: nVMX: Enforce cpl=0 for VMX instructions") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: vmx: Add tlb_remote_flush callback supportTianyu Lan1-1/+71
Register tlb_remote_flush callback for vmx when hyperv capability of nested guest mapping flush is detected. The interface can help to reduce overhead when flush ept table among vcpus for nested VM. The tradition way is to send IPIs to all affected vcpus and executes INVEPT on each vcpus. It will trigger several vmexits for IPI and INVEPT emulation. Hyper-V provides such hypercall to do flush for all vcpus and call the hypercall when all ept table pointers of single VM are same. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM/MMU: Simplify __kvm_sync_page() functionTianyu Lan1-6/+2
Merge check of "sp->role.cr4_pae != !!is_pae(vcpu))" and "vcpu-> arch.mmu.sync_page(vcpu, sp) == 0". kvm_mmu_prepare_zap_page() is called under both these conditions. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Remove CR3_PCID_INVD flagJunaid Shahid2-3/+3
It is a duplicate of X86_CR3_PCID_NOFLUSH. So just use that instead. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Add multi-entry LRU cache for previous CR3sJunaid Shahid2-37/+86
Adds support for storing multiple previous CR3/root_hpa pairs maintained as an LRU cache, so that the lockless CR3 switch path can be used when switching back to any of them. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Flush only affected TLB entries in kvm_mmu_invlpg*Junaid Shahid3-3/+47
This needs a minor bug fix. The updated patch is as follows. Thanks, Junaid ------------------------------------------------------------------------------ kvm_mmu_invlpg() and kvm_mmu_invpcid_gva() only need to flush the TLB entries for the specific guest virtual address, instead of flushing all TLB entries associated with the VM. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Skip shadow page resync on CR3 switch when indicated by guestJunaid Shahid3-7/+34
When the guest indicates that the TLB doesn't need to be flushed in a CR3 switch, we can also skip resyncing the shadow page tables since an out-of-sync shadow page table is equivalent to an out-of-sync TLB. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Support selectively freeing either current or previous MMU rootJunaid Shahid1-14/+22
kvm_mmu_free_roots() now takes a mask specifying which roots to free, so that either one of the roots (active/previous) can be individually freed when needed. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Add a root_hpa parameter to kvm_mmu->invlpg()Junaid Shahid2-10/+32
This allows invlpg() to be called using either the active root_hpa or the prev_root_hpa. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Skip TLB flush on fast CR3 switch when indicated by guestJunaid Shahid3-16/+35
When PCIDs are enabled, the MSb of the source operand for a MOV-to-CR3 instruction indicates that the TLB doesn't need to be flushed. This change enables this optimization for MOV-to-CR3s in the guest that have been intercepted by KVM for shadow paging and are handled within the fast CR3 switch path. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: vmx: Support INVPCID in shadow paging modeJunaid Shahid2-3/+111
Implement support for INVPCID in shadow paging mode as well. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Propagate guest PCIDs to host PCIDsJunaid Shahid1-1/+16
When using shadow paging mode, propagate the guest's PCID value to the shadow CR3 in the host instead of always using PCID 0. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Add ability to skip TLB flush when switching CR3Junaid Shahid4-6/+2
Remove the implicit flush from the set_cr3 handlers, so that the callers are able to decide whether to flush the TLB or not. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Use fast CR3 switch for nested VMXJunaid Shahid3-9/+15
Use the fast CR3 switch mechanism to locklessly change the MMU root page when switching between L1 and L2. The switch from L2 to L1 should always go through the fast path, while the switch from L1 to L2 should go through the fast path if L1's CR3/EPTP for L2 hasn't changed since the last time. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Support resetting the MMU context without resetting rootsJunaid Shahid2-17/+10
This adds support for re-initializing the MMU context in a different mode while preserving the active root_hpa and the prev_root. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Add support for fast CR3 switch across different MMU modesJunaid Shahid1-6/+15
This generalizes the lockless CR3 switch path to be able to work across different MMU modes (e.g. nested vs non-nested) by checking that the expected page role of the new root page matches the page role of the previously stored root page in addition to checking that the new CR3 matches the previous CR3. Furthermore, instead of loading the hardware CR3 in fast_cr3_switch(), it is now done in vcpu_enter_guest(), as by that time the MMU context would be up-to-date with the VCPU mode. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Introduce KVM_REQ_LOAD_CR3Junaid Shahid3-2/+10
The KVM_REQ_LOAD_CR3 request loads the hardware CR3 using the current root_hpa. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Introduce kvm_mmu_calc_root_page_role()Junaid Shahid1-27/+85
These functions factor out the base role calculation from the corresponding kvm_init_*_mmu() functions. The new functions return what would be the role assigned to a root page in the current VCPU state. This can be masked with mmu_base_role_mask to derive the base role. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Add fast CR3 switch code pathJunaid Shahid2-6/+60
When using shadow paging, a CR3 switch in the guest results in a VM Exit. In the common case, that VM exit doesn't require much processing by KVM. However, it does acquire the MMU lock, which can start showing signs of contention under some workloads even on a 2 VCPU VM when the guest is using KPTI. Therefore, we add a fast path that avoids acquiring the MMU lock in the most common cases e.g. when switching back and forth between the kernel and user mode CR3s used by KPTI with no guest page table changes in between. For now, this fast path is implemented only for 64-bit guests and hosts to avoid the handling of PDPTEs, but it can be extended later to 32-bit guests and/or hosts as well. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Avoid taking MMU lock in kvm_mmu_sync_roots if no sync is neededJunaid Shahid1-8/+67
kvm_mmu_sync_roots() can locklessly check whether a sync is needed and just bail out if it isn't. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: x86: Make sync_page() flush remote TLBs once onlyJunaid Shahid2-8/+20
sync_page() calls set_spte() from a loop across a page table. It would work better if set_spte() left the TLB flushing to its callers, so that sync_page() can aggregate into a single call. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: MMU: drop vcpu param in gpte_accessPeter Xu1-5/+5
It's never used. Drop it. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: Separate logic allocating shadow vmcs to a functionLiran Alon1-9/+28
No functionality change. This is done as a preparation for VMCS shadowing virtualization. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: VMX: Mark vmcs header as shadow in case alloc_vmcs_cpu() allocate ↵Liran Alon1-8/+8
shadow vmcs No functionality change. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: Expose VMCS shadowing to L1 guestLiran Alon1-0/+9
Expose VMCS shadowing to L1 as a VMX capability of the virtual CPU, whether or not VMCS shadowing is supported by the physical CPU. (VMCS shadowing emulation) Shadowed VMREADs and VMWRITEs from L2 are handled by L0, without a VM-exit to L1. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by ↵Liran Alon1-2/+31
vmcs12 vmread/vmwrite bitmaps This is done as a preparation for VMCS shadowing emulation. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2Liran Alon1-12/+49
This is done as a preparation to VMCS shadowing emulation. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: include shadow vmcs12 in nested statePaolo Bonzini1-1/+30
The shadow vmcs12 cannot be flushed on KVM_GET_NESTED_STATE, because at that point guest memory is assumed by userspace to be immutable. Capture the cache in vmx_get_nested_state, adding another page at the end if there is an active shadow vmcs12. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: Cache shadow vmcs12 on VMEntry and flush to memory on VMExitLiran Alon1-0/+74
This is done is done as a preparation to VMCS shadowing emulation. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: Verify VMCS shadowing VMCS link pointerLiran Alon1-2/+28
Intel SDM considers these checks to be part of "Checks on Guest Non-Register State". Note that it is legal for vmcs->vmcs_link_pointer to be -1ull when VMCS shadowing is enabled. In this case, any VMREAD/VMWRITE to shadowed-field sets the ALU flags for VMfailInvalid (i.e. CF=1). Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: Verify VMCS shadowing controlsLiran Alon1-0/+16
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: Introduce nested_cpu_has_shadow_vmcs()Liran Alon1-1/+6
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: Fail VMLAUNCH and VMRESUME on shadow VMCSLiran Alon1-0/+11
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: nVMX: Allow VMPTRLD for shadow VMCS if vCPU supports VMCS shadowingLiran Alon1-1/+8
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: VMX: Change vmcs12_{read,write}_any() to receive vmcs12 as parameterLiran Alon1-10/+11
No functionality change. This is done as a preparation for VMCS shadowing emulation. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: VMX: Create struct for VMCS headerLiran Alon1-9/+15
No functionality change. Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: nVMX: Introduce KVM_CAP_NESTED_STATEJim Mattson2-2/+227
For nested virtualization L0 KVM is managing a bit of state for L2 guests, this state can not be captured through the currently available IOCTLs. In fact the state captured through all of these IOCTLs is usually a mix of L1 and L2 state. It is also dependent on whether the L2 guest was running at the moment when the process was interrupted to save its state. With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM that is in VMX operation. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> [karahmed@ - rename structs and functions and make them ready for AMD and address previous comments. - handle nested.smm state. - rebase & a bit of refactoring. - Merge 7/8 and 8/8 into one patch. ] Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: x86: do not load vmcs12 pages while still in SMMPaolo Bonzini2-18/+38
If the vCPU enters system management mode while running a nested guest, RSM starts processing the vmentry while still in SMM. In that case, however, the pages pointed to by the vmcs12 might be incorrectly loaded from SMRAM. To avoid this, delay the handling of the pages until just before the next vmentry. This is done with a new request and a new entry in kvm_x86_ops, which we will be able to reuse for nested VMX state migration. Extracted from a patch by Jim Mattson and KarimAllah Ahmed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: x86: ensure all MSRs can always be KVM_GET/SET_MSR'dPaolo Bonzini3-14/+30
Some of the MSRs returned by GET_MSR_INDEX_LIST currently cannot be sent back to KVM_GET_MSR and/or KVM_SET_MSR; either they can never be sent back, or you they are only accepted under special conditions. This makes the API a pain to use. To avoid this pain, this patch makes it so that the result of the get-list ioctl can always be used for host-initiated get and set. Since we don't have a separate way to check for read-only MSRs, this means some Hyper-V MSRs are ignored when written. Arguably they should not even be in the result of GET_MSR_INDEX_LIST, but I am leaving there in case userspace is using the outcome of GET_MSR_INDEX_LIST to derive the support for the corresponding Hyper-V feature. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06KVM: vmx: remove save/restore of host BNDCGFS MSRSean Christopherson1-5/+6
Linux does not support Memory Protection Extensions (MPX) in the kernel itself, thus the BNDCFGS (Bound Config Supervisor) MSR will always be zero in the KVM host, i.e. RDMSR in vmx_save_host_state() is superfluous. KVM unconditionally sets VM_EXIT_CLEAR_BNDCFGS, i.e. BNDCFGS will always be zero after VMEXIT, thus manually loading BNDCFGS is also superfluous. And in the event the MPX kernel support is added (unlikely given that MPX for userspace is in its death throes[1]), BNDCFGS will likely be common across all CPUs[2], and at the least shouldn't change on a regular basis, i.e. saving the MSR on every VMENTRY is completely unnecessary. WARN_ONCE in hardware_setup() if the host's BNDCFGS is non-zero to document that KVM does not preserve BNDCFGS and to serve as a hint as to how BNDCFGS likely should be handled if MPX is used in the kernel, e.g. BNDCFGS should be saved once during KVM setup. [1] https://lkml.org/lkml/2018/4/27/1046 [2] http://www.openwall.com/lists/kernel-hardening/2017/07/24/28 Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-21/+46
Pull kvm fixes from Paolo Bonzini: "Miscellaneous bugfixes, plus a small patchlet related to Spectre v2" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvmclock: fix TSC calibration for nested guests KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel. x86/kvmclock: set pvti_cpu0_va after enabling kvmclock x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
2018-07-18KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabledLiran Alon1-6/+21
When eVMCS is enabled, all VMCS allocated to be used by KVM are marked with revision_id of KVM_EVMCS_VERSION instead of revision_id reported by MSR_IA32_VMX_BASIC. However, even though not explictly documented by TLFS, VMXArea passed as VMXON argument should still be marked with revision_id reported by physical CPU. This issue was found by the following setup: * L0 = KVM which expose eVMCS to it's L1 guest. * L1 = KVM which consume eVMCS reported by L0. This setup caused the following to occur: 1) L1 execute hardware_enable(). 2) hardware_enable() calls kvm_cpu_vmxon() to execute VMXON. 3) L0 intercept L1 VMXON and execute handle_vmon() which notes vmxarea->revision_id != VMCS12_REVISION and therefore fails with nested_vmx_failInvalid() which sets RFLAGS.CF. 4) L1 kvm_cpu_vmxon() don't check RFLAGS.CF for failure and therefore hardware_enable() continues as usual. 5) L1 hardware_enable() then calls ept_sync_global() which executes INVEPT. 6) L0 intercept INVEPT and execute handle_invept() which notes !vmx->nested.vmxon and thus raise a #UD to L1. 7) Raised #UD caused L1 to panic. Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: stable@vger.kernel.org Fixes: 773e8a0425c923bc02668a2d6534a5ef5a43cc69 Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-15x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMDJanakarajan Natarajan1-1/+1
Prevent a config where KVM_AMD=y and CRYPTO_DEV_CCP_DD=m thereby ensuring that AMD Secure Processor device driver will be built-in when KVM_AMD is also built-in. v1->v2: * Removed usage of 'imply' Kconfig option. * Change patch commit message. Fixes: 505c9e94d832 ("KVM: x86: prefer "depends on" to "select" for SEV") Cc: <stable@vger.kernel.org> # 4.16.x Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-15kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loadingJim Mattson1-5/+4
This exit qualification was inadvertently dropped when the two VM-entry failure blocks were coalesced. Fixes: e79f245ddec1 ("X86/KVM: Properly update 'tsc_offset' to represent the running guest") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-15x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasksVitaly Kuznetsov1-8/+17
When we switched from doing rdmsr() to reading FS/GS base values from current->thread we completely forgot about legacy 32-bit userspaces which we still support in KVM (why?). task->thread.{fsbase,gsbase} are only synced for 64-bit processes, calling save_fsgs_for_kvm() and using its result from current is illegal for legacy processes. There's no ARCH_SET_FS/GS prctls for legacy applications. Base MSRs are, however, not always equal to zero. Intel's manual says (3.4.4 Segment Loading Instructions in IA-32e Mode): "In order to set up compatibility mode for an application, segment-load instructions (MOV to Sreg, POP Sreg) work normally in 64-bit mode. An entry is read from the system descriptor table (GDT or LDT) and is loaded in the hidden portion of the segment register. ... The hidden descriptor register fields for FS.base and GS.base are physically mapped to MSRs in order to load all address bits supported by a 64-bit implementation. " The issue was found by strace test suite where 32-bit ioctl_kvm_run test started segfaulting. Reported-by: Dmitry V. Levin <ldv@altlinux.org> Bisected-by: Masatake YAMATO <yamato@redhat.com> Fixes: 42b933b59721 ("x86/kvm/vmx: read MSR_{FS,KERNEL_GS}_BASE from current->thread") Cc: stable@vger.kernel.org Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-15KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSRPaolo Bonzini1-1/+3
This lets userspace read the MSR_IA32_ARCH_CAPABILITIES and check that all requested features are available on the host. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-22kvm: vmx: Nested VM-entry prereqs for event inj.Marc Orr2-0/+76
This patch extends the checks done prior to a nested VM entry. Specifically, it extends the check_vmentry_prereqs function with checks for fields relevant to the VM-entry event injection information, as described in the Intel SDM, volume 3. This patch is motivated by a syzkaller bug, where a bad VM-entry interruption information field is generated in the VMCS02, which causes the nested VM launch to fail. Then, KVM fails to resume L1. While KVM should be improved to correctly resume L1 execution after a failed nested launch, this change is justified because the existing code to resume L1 is flaky/ad-hoc and the test coverage for resuming L1 is sparse. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Marc Orr <marcorr@google.com> [Removed comment whose parts were describing previous revisions and the rest was obvious from function/variable naming. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-06-14KVM: x86: VMX: redo fix for link error without CONFIG_HYPERVArnd Bergmann1-4/+2
Arnd had sent this patch to the KVM mailing list, but it slipped through the cracks of maintainers hand-off, and therefore wasn't included in the pull request. The same issue had been fixed by Linus in commit dbee3d0 ("KVM: x86: VMX: fix build without hyper-v", 2018-06-12) as a self-described "quick-and-hacky build fix". However, checking the compile-time configuration symbol with IS_ENABLED is cleaner and it is enough to avoid the link error, so switch to Arnd's solution. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [Rewritten commit message. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-14KVM: x86: fix typo at kvm_arch_hardware_setup commentMarcelo Tosatti1-1/+1
Fix typo in sentence about min value calculation. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>