summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/vmx.c
AgeCommit message (Collapse)AuthorFilesLines
2014-04-28KVM: x86: Check for host supported fields in shadow vmcsBandan Das1-12/+41
We track shadow vmcs fields through two static lists, one for read only and another for r/w fields. However, with addition of new vmcs fields, not all fields may be supported on all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on unsupported hosts will result in a vmwrite error. For example, commit 36be0b9deb23161 introduced GUEST_BNDCFGS, which is not supported by all processors. Filter out host unsupported fields before letting guests use shadow vmcs Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-23KVM: nVMX: Advertise support for interrupt acknowledgementBandan Das1-2/+3
Some Type 1 hypervisors such as XEN won't enable VMX without it present Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23KVM: nVMX: Ack and write vector info to intr_info if L1 asks us toBandan Das1-0/+18
This feature emulates the "Acknowledge interrupt on exit" behavior. We can safely emulate it for L1 to run L2 even if L0 itself has it disabled (to run L1). Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23KVM: nVMX: Don't advertise single context invalidation for inveptBandan Das1-10/+5
For single context invalidation, we fall through to global invalidation in handle_invept() except for one case - when the operand supplied by L1 is different from what we have in vmcs12. However, typically hypervisors will only call invept for the currently loaded eptp, so the condition will never be true. Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23KVM: VMX: Advance rip to after an ICEBP instructionHuw Davies1-0/+3
When entering an exception after an ICEBP, the saved instruction pointer should point to after the instruction. This fixes the bug here: https://bugs.launchpad.net/qemu/+bug/1119686 Signed-off-by: Huw Davies <huw@codeweavers.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-17KVM: VMX: speed up wildcard MMIO EVENTFDMichael S. Tsirkin1-0/+4
With KVM, MMIO is much slower than PIO, due to the need to do page walk and emulation. But with EPT, it does not have to be: we know the address from the VMCS so if the address is unique, we can look up the eventfd directly, bypassing emulation. Unfortunately, this only works if userspace does not need to match on access length and data. The implementation adds a separate FAST_MMIO bus internally. This serves two purposes: - minimize overhead for old userspace that does not use eventfd with lengtth = 0 - minimize disruption in other code (since we don't know the length, devices on the MMIO bus only get a valid address in write, this way we don't need to touch all devices to teach them to handle an invalid length) At the moment, this optimization only has effect for EPT on x86. It will be possible to speed up MMIO for NPT and MMU using the same idea in the future. With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO. I was unable to detect any measureable slowdown to non-eventfd MMIO. Making MMIO faster is important for the upcoming virtio 1.0 which includes an MMIO signalling capability. The idea was suggested by Peter Anvin. Lots of thanks to Gleb for pre-review and suggestions. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-15KVM: Disable SMAP for guests in EPT realmode and EPT unpaging modeFeng Wu1-5/+6
SMAP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMAP needs to be manually disabled when guest switches to non-paging mode. Signed-off-by: Feng Wu <feng.wu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-03-17KVM: x86: handle missing MPX in nested virtualizationPaolo Bonzini1-0/+5
When doing nested virtualization, we may be able to read BNDCFGS but still not be allowed to write to GUEST_BNDCFGS in the VMCS. Guard writes to the field with vmx_mpx_supported(), and similarly hide the MSR from userspace if the processor does not support the field. We could work around this with the generic MSR save/load machinery, but there is only a limited number of MSR save/load slots and it is not really worthwhile to waste one for a scenario that should not happen except in the nested virtualization case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-17KVM: x86: Add nested virtualization support for MPXPaolo Bonzini1-0/+17
This is simple to do, the "host" BNDCFGS is either 0 or the guest value. However, both controls have to be present. We cannot provide MPX if we only have one of the "load BNDCFGS" or "clear BNDCFGS" controls. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: nVMX: Allow nested guests to run with dirty debug registersPaolo Bonzini1-0/+4
When preparing the VMCS02, the CPU-based execution controls is computed by vmx_exec_control. Turn off DR access exits there, too, if the KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: vmx: Allow the guest to run with dirty debug registersPaolo Bonzini1-1/+37
When not running in guest-debug mode (i.e. the guest controls the debug registers, having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest. If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we can let it write freely to the debug registers and reload them on the next exit. We still need to exit on the first access, so that the KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further accesses to the debug registers will not cause a vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: vmx: we do rely on loading DR7 on entryPaolo Bonzini1-1/+1
Currently, this works even if the bit is not in "min", because the bit is always set in MSR_IA32_VMX_ENTRY_CTLS. Mention it for the sake of documentation, and to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: x86: Remove return code from enable_irq/nmi_windowJan Kiszka1-18/+7
It's no longer possible to enter enable_irq_window in guest mode when L1 intercepts external interrupts and we are entering L2. This is now caught in vcpu_enter_guest. So we can remove the check from the VMX version of enable_irq_window, thus the need to return an error code from both enable_irq_window and enable_nmi_window. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interruptJan Kiszka1-1/+2
According to SDM 27.2.3, IDT vectoring information will not be valid on vmexits caused by external NMIs. So we have to avoid creating such scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we have a pending interrupt because that one would be migrated to L1's IDT vectoring info on nested exit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: nVMX: Fully emulate preemption timerJan Kiszka1-55/+96
We cannot rely on the hardware-provided preemption timer support because we are holding L2 in HLT outside non-root mode. Furthermore, emulating the preemption will resolve tick rate errata on older Intel CPUs. The emulation is based on hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the new check_nested_events hook. As we no longer rely on hardware features, we can enable both the preemption timer support and value saving unconditionally. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11KVM: nVMX: Rework interception of IRQs and NMIsJan Kiszka1-29/+38
Move the check for leaving L2 on pending and intercepted IRQs or NMIs from the *_allowed handler into a dedicated callback. Invoke this callback at the relevant points before KVM checks if IRQs/NMIs can be injected. The callback has the task to switch from L2 to L1 if needed and inject the proper vmexit events. The rework fixes L2 wakeups from HLT and provides the foundation for preemption timer emulation. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-03kvm, vmx: Really fix lazy FPU on nested guestPaolo Bonzini1-1/+1
Commit e504c9098ed6 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13) highlighted a real problem, but the fix was subtly wrong. nested_read_cr0 is the CR0 as read by L2, but here we want to look at the CR0 value reflecting L1's setup. In other words, L2 might think that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually running it with TS=1, we should inject the fault into L1. The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use it. Fixes: e504c9098ed6acd9e1079c5e10e4910724ad429f Reported-by: Kashyap Chamarty <kchamart@redhat.com> Reported-by: Stefan Bader <stefan.bader@canonical.com> Tested-by: Kashyap Chamarty <kchamart@redhat.com> Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-25KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_saveLiu, Jinsong1-0/+6
From 5d5a80cd172ea6fb51786369bcc23356b1e9e956 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:11:55 +0800 Subject: [PATCH v5 2/3] KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save Add MSR_IA32_BNDCFGS to msrs_to_save, and corresponding logic to kvm_get/set_msr(). Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-24KVM: x86: Intel MPX vmx and msr handleLiu, Jinsong1-2/+16
From caddc009a6d2019034af8f2346b2fd37a81608d0 Mon Sep 17 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com> Date: Mon, 24 Feb 2014 18:11:11 +0800 Subject: [PATCH v5 1/3] KVM: x86: Intel MPX vmx and msr handle This patch handle vmx and msr of Intel MPX feature. Signed-off-by: Xudong Hao <xudong.hao@intel.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-27KVM: x86: Validate guest writes to MSR_IA32_APICBASEJan Kiszka1-4/+5
Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This address both enabling of the x2APIC when it is not supported and all invalid transitions as described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> [Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-119/+204
Pull KVM updates from Paolo Bonzini: "First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) kvm: make KVM_MMU_AUDIT help text more readable KVM: s390: Fix memory access error detection KVM: nVMX: Update guest activity state field on L2 exits KVM: nVMX: Fix nested_run_pending on activity state HLT KVM: nVMX: Clean up handling of VMX-related MSRs KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit KVM: nVMX: Leave VMX mode on clearing of feature control MSR KVM: VMX: Fix DR6 update on #DB exception KVM: SVM: Fix reading of DR6 KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS add support for Hyper-V reference time counter KVM: remove useless write to vcpu->hv_clock.tsc_timestamp KVM: x86: fix tsc catchup issue with tsc scaling KVM: x86: limit PIT timer frequency KVM: x86: handle invalid root_hpa everywhere kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub kvm: vfio: silence GCC warning KVM: ARM: Remove duplicate include arm/arm64: KVM: relax the requirements of VMA alignment for THP ...
2014-01-17KVM: nVMX: Update guest activity state field on L2 exitsJan Kiszka1-0/+4
Set guest activity state in L1's VMCS according to the VCPUs mp_state. This ensures we report the correct state in case we L2 executed HLT or if we put L2 into HLT state and it was now woken up by an event. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17KVM: nVMX: Fix nested_run_pending on activity state HLTJan Kiszka1-2/+2
When we suspend the guest in HLT state, the nested run is no longer pending - we emulated it completely. So only set nested_run_pending after checking the activity state. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17KVM: nVMX: Clean up handling of VMX-related MSRsJan Kiszka1-56/+23
This simplifies the code and also stops issuing warning about writing to unhandled MSRs when VMX is disabled or the Feature Control MSR is locked - we do handle them all according to the spec. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_injectJan Kiszka1-0/+14
Already used by nested SVM for tracing nested vmexit: kvm_nested_vmexit marks exits from L2 to L0 while kvm_nested_vmexit_inject marks vmexits that are reflected to L1. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexitJan Kiszka1-29/+34
Instead of fixing up the vmcs12 after the nested vmexit, pass key parameters already when calling nested_vmx_vmexit. This will help tracing those vmexits. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17KVM: nVMX: Leave VMX mode on clearing of feature control MSRJan Kiszka1-0/+14
When userspace sets MSR_IA32_FEATURE_CONTROL to 0, make sure we leave root and non-root mode, fully disabling VMX. The register state of the VCPU is undefined after this step, so userspace has to set it to a proper state afterward. This enables to reboot a VM while it is running some hypervisor code. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17KVM: VMX: Fix DR6 update on #DB exceptionJan Kiszka1-1/+2
According to the SDM, only bits 0-3 of DR6 "may" be cleared by "certain" debug exception. So do update them on #DB exception in KVM, but leave the rest alone, only setting BD and BS in addition to already set bits in DR6. This also aligns us with kvm_vcpu_check_singlestep. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-17KVM: SVM: Fix reading of DR6Jan Kiszka1-0/+11
In contrast to VMX, SVM dose not automatically transfer DR6 into the VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor hook to obtain the current value. And as SVM now picks the DR6 state from its VMCB, we also need a set callback in order to write updates of DR6 back. Fixes a regression of 020df0794f. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-09KVM: VMX: fix use after free of vmx->loaded_vmcsMarcelo Tosatti1-1/+1
After free_loaded_vmcs executes, the "loaded_vmcs" structure is kfreed, and now vmx->loaded_vmcs points to a kfreed area. Subsequent free_loaded_vmcs then attempts to manipulate vmx->loaded_vmcs. Switch the order to avoid the problem. https://bugzilla.redhat.com/show_bug.cgi?id=1047892 Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-01-09KVM: VMX: check use I/O bitmap first before unconditional I/O exitZhihui Zhang1-4/+1
According to Table C-1 of Intel SDM 3C, a VM exit happens on an I/O instruction when "use I/O bitmaps" VM-execution control was 0 _and_ the "unconditional I/O exiting" VM-execution control was 1. So we can't just check "unconditional I/O exiting" alone. This patch was improved by suggestion from Jan Kiszka. Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-01-02KVM: nVMX: Unconditionally uninit the MMU on nested vmexitJan Kiszka1-2/+1
Three reasons for doing this: 1. arch.walk_mmu points to arch.mmu anyway in case nested EPT wasn't in use. 2. this aligns VMX with SVM. But 3. is most important: nested_cpu_has_ept(vmcs12) queries the VMCS page, and if one guest VCPU manipulates the page of another VCPU in L2, we may be fooled to skip over the nested_ept_uninit_mmu_context, leaving mmu in nested state. That can crash the host later on if nested_ept_get_cr3 is invoked while L1 already left vmxon and nested.current_vmcs12 became NULL therefore. Cc: stable@kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-12-20KVM: VMX: Do not skip the instruction if handle_dr injects a faultJan Kiszka1-3/+7
If kvm_get_dr or kvm_set_dr reports that it raised a fault, we must not advance the instruction pointer. Otherwise the exception will hit the wrong instruction. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-18KVM: nVMX: Support direct APIC access from L2Jan Kiszka1-0/+5
It's a pathological case, but still a valid one: If L1 disables APIC virtualization and also allows L2 to directly write to the APIC page, we have to forcibly enable APIC virtualization while in L2 if the in-kernel APIC is in use. This allows to run the direct interrupt test case in the vmx unit test without x2APIC. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-12KVM: nVMX: Add support for activity state HLTJan Kiszka1-1/+6
We can easily emulate the HLT activity state for L1: If it decides that L2 shall be halted on entry, just invoke the normal emulation of halt after switching to L2. We do not depend on specific host features to provide this, so we can expose the capability unconditionally. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-12-12KVM: VMX: shadow VM_(ENTRY|EXIT)_CONTROLS vmcs fieldGleb Natapov1-27/+85
VM_(ENTRY|EXIT)_CONTROLS vmcs fields are read/written on each guest entry but most times it can be avoided since values do not changes. Keep fields copy in memory to avoid unnecessary reads from vmcs. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-11-13kvm, vmx: Fix lazy FPU on nested guestAnthoine Bourgeois1-0/+3
If a nested guest does a NM fault but its CR0 doesn't contain the TS flag (because it was already cleared by the guest with L1 aid) then we have to activate FPU ourselves in L0 and then continue to L2. If TS flag is set then we fallback on the previous behavior, forward the fault to L1 if it asked for. Signed-off-by: Anthoine Bourgeois <bourgeois@bertin.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-31kvm/vmx: error message typo fixMichael S. Tsirkin1-1/+1
mst can't be blamed for lack of switch entries: the issue is with msrs actually. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30kvm: Create non-coherent DMA registerationAlex Williamson1-2/+1
We currently use some ad-hoc arch variables tied to legacy KVM device assignment to manage emulation of instructions that depend on whether non-coherent DMA is present. Create an interface for this, adapting legacy KVM device assignment and adding VFIO via the KVM-VFIO device. For now we assume that non-coherent DMA is possible any time we have a VFIO group. Eventually an interface can be developed as part of the VFIO external user interface to query the coherency of a group. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-30kvm/x86: Convert iommu_flags to iommu_noncoherentAlex Williamson1-1/+1
Default to operating in coherent mode. This simplifies the logic when we switch to a model of registering and unregistering noncoherent I/O with KVM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-28nVMX: Report CPU_BASED_VIRTUAL_NMI_PENDING as supportedJan Kiszka1-1/+2
If the host supports it, we can and should expose it to the guest as well, just like we already do with PIN_BASED_VIRTUAL_NMIS. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-28nVMX: Fix pick-up of uninjected NMIsJan Kiszka1-1/+1
__vmx_complete_interrupts stored uninjected NMIs in arch.nmi_injected, not arch.nmi_pending. So we actually need to check the former field in vmcs12_save_pending_event. This fixes the eventinj unit test when run in nested KVM. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-28KVM: nVMX: Report 2MB EPT pages as supportedJan Kiszka1-1/+2
As long as the hardware provides us 2MB EPT pages, we can also expose them to the guest because our shadow EPT code already supports this feature. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-17Powerpc KVM work is based on a commit after rc4.Gleb Natapov1-13/+15
Merging master into next to satisfy the dependencies. Conflicts: arch/arm/kvm/reset.c
2013-10-10KVM: nVMX: Fully support nested VMX preemption timerArthur Chunqi Li1-2/+40
This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of "Save VMX-preemption timer value" VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-10KVM: nVMX: fix shadow on EPTGleb Natapov1-12/+12
72f857950f6f19 broke shadow on EPT. This patch reverts it and fixes PAE on nEPT (which reverted commit fixed) in other way. Shadow on EPT is now broken because while L1 builds shadow page table for L2 (which is PAE while L2 is in real mode) it never loads L2's GUEST_PDPTR[0-3]. They do not need to be loaded because without nested virtualization HW does this during guest entry if EPT is disabled, but in our case L0 emulates L2's vmentry while EPT is enables, so we cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values and need to re-read PDPTEs from L2 memory. This is what kvm_set_cr3() is doing, but by clearing cache bits during L2 vmentry we drop values that kvm_set_cr3() read from memory. So why the same code does not work for PAE on nEPT? kvm_set_cr3() reads pdptes into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to vcpu->arch.nested_mmu while nested guest is running, but ept_load_pdptrs() uses vcpu->arch.mmu which contain incorrect values. Fix that by using walk_mmu in ept_(load|save)_pdptrs. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-03KVM: mmu: change useless int return types to voidPaolo Bonzini1-4/+2
kvm_mmu initialization is mostly filling in function pointers, there is no way for it to fail. Clean up unused return values. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-09-30KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2Gleb Natapov1-0/+20
If #PF happens during delivery of an exception into L2 and L1 also do not have the page mapped in its shadow page table then L0 needs to generate vmexit to L2 with original event in IDT_VECTORING_INFO, but current code combines both exception and generates #DF instead. Fix that by providing nVMX specific function to handle page faults during page table walk that handles this case correctly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-30KVM: nVMX: Check all exceptions for intercept during delivery to L2Gleb Natapov1-8/+4
All exceptions should be checked for intercept during delivery to L2, but we check only #PF currently. Drop nested_run_pending while we are at it since exception cannot be injected during vmentry anyway. Signed-off-by: Gleb Natapov <gleb@redhat.com> [Renamed the nested_vmx_check_exception function. - Paolo] Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-30KVM: nVMX: Do not put exception that caused vmexit to IDT_VECTORING_INFOGleb Natapov1-4/+4
If an exception causes vmexit directly it should not be reported in IDT_VECTORING_INFO during the exit. For that we need to be able to distinguish between exception that is injected into nested VM and one that is reinjected because its delivery failed. Fortunately we already have mechanism to do so for nested SVM, so here we just use correct function to requeue exceptions and make sure that reinjected exception is not moved to IDT_VECTORING_INFO during vmexit emulation and not re-checked for interception during delivery. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>