path: root/arch/x86/kvm/x86.c
2021-03-15  KVM: x86/mmu: Move initial kvm_mmu_set_mask_ptes() call into MMU proper  [Sean Christopherson, 1 file, -3/+0]
Move kvm_mmu_set_mask_ptes() into mmu.c as prep for future cleanup of the mask initialization code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210225204749.1512652-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: determine if an exception has an error code only when injecting it  [Maxim Levitsky, 1 file, -4/+10]
A page fault can be queued while the vCPU is in paged real mode on AMD, and the AMD manual asks the user to always intercept it (otherwise the result is undefined). The resulting VM exit does have an error code. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210225154135.405125-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Move RDPMC emulation to common code  [Sean Christopherson, 1 file, -7/+8]
Move the entirety of the accelerated RDPMC emulation to x86.c, and assign the common handler directly to the exit handler array for VMX. SVM has bizarre nrips behavior that prevents it from directly invoking the common handler. The nrips goofiness will be addressed in a future patch. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
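A hedged sketch of what such a common handler plausibly looks like after the move (shape follows general KVM conventions, not quoted from the patch):

    /* Sketch of a common RDPMC exit handler in x86.c; treat the details
     * as illustrative rather than the exact upstream code. */
    int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
    {
            unsigned long ecx = kvm_rcx_read(vcpu);
            u64 data;

            /* Ask the PMU emulation for the counter; inject #GP on failure. */
            if (kvm_pmu_rdpmc(vcpu, ecx, &data)) {
                    kvm_inject_gp(vcpu, 0);
                    return 1;
            }

            /* RDPMC returns the 64-bit counter value split across EDX:EAX. */
            kvm_rax_write(vcpu, (u32)data);
            kvm_rdx_write(vcpu, data >> 32);
            return kvm_skip_emulated_instruction(vcpu);
    }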
2021-03-15  KVM: x86: Move trivial instruction-based exit handlers to common code  [Sean Christopherson, 1 file, -0/+34]
Move the trivial exit handlers, e.g. for instructions that KVM "emulates" as nops, to common x86 code. Assign the common handlers directly to the exit handler arrays and drop the vendor trampolines. Opportunistically use pr_warn_once() where appropriate. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: Move XSETBV emulation to common code  [Sean Christopherson, 1 file, -5/+8]
Move the entirety of XSETBV emulation to x86.c, and assign the function directly to both VMX's and SVM's exit handlers, i.e. drop the unnecessary trampolines. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
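For illustration, a minimal sketch of the shared XSETBV handler this describes, assuming the usual register accessors (not the verbatim patch):

    /* Sketch only: XSETBV takes the XCR index in ECX and the new value
     * in EDX:EAX; an illegal value is punished with #GP. */
    int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
    {
            if (kvm_set_xcr(vcpu, kvm_rcx_read(vcpu), kvm_read_edx_eax(vcpu))) {
                    kvm_inject_gp(vcpu, 0);
                    return 1;
            }
            return kvm_skip_emulated_instruction(vcpu);
    }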
2021-03-15  KVM: x86: Handle triple fault in L2 without killing L1  [Sean Christopherson, 1 file, -6/+23]
Synthesize a nested VM-Exit if L2 triggers an emulated triple fault instead of exiting to userspace, which likely will kill L1. Any flow that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that is _not_ intercepted by L1. E.g. if KVM is intercepting #GPs for the VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF will cause KVM to emulate a triple fault. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
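The shape of the change in vcpu_enter_guest() is roughly the following, a sketch under the assumption that the nested ops grow a triple_fault() callback, as the description implies:

    /* Sketch: reflect an emulated triple fault in L2 into L1 as a
     * nested VM-Exit instead of reporting KVM_EXIT_SHUTDOWN for L1. */
    if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
            if (is_guest_mode(vcpu)) {
                    kvm_x86_ops.nested_ops->triple_fault(vcpu);
            } else {
                    vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
                    r = 0;
                    goto out;
            }
    }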
2021-03-15  KVM: x86: Defer the MMU unload to the normal path on a global INVPCID  [Sean Christopherson, 1 file, -1/+1]
Defer unloading the MMU after an INVPCID until the instruction emulation has completed, i.e. until after RIP has been updated. On VMX, this is a benign bug as VMX doesn't touch the MMU when skipping an emulated instruction. However, on SVM, if nrip is disabled, the emulator is used to skip an instruction, which would lead to fireworks if the emulator were invoked without a valid MMU. Fixes: eb4b248e152d ("kvm: vmx: Support INVPCID in shadow paging mode") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210305011101.3597423-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-15  KVM: x86: add 'nested_run' stat to track if L1 is running an L2 VM  [Dongli Zhang, 1 file, -0/+1]
The new per-vCPU stat 'nested_run' is introduced to track whether an L1 VM is running, or has been used to run, an L2 VM. One use of 'nested_run' is to help the host administrator easily determine whether any L1 VM is being used to run an L2 VM. If an issue arises that may be related to nested virtualization, the administrator can quickly narrow down and confirm whether it is due to nested virtualization via 'nested_run', e.g. whether a fix like commit 88dddc11a8d6 ("KVM: nVMX: do not use dangling shadow VMCS after guest reset") is required. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
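Conceptually the stat is a single counter bumped on every nested guest entry; a sketch of the idea (the field name is from the commit, the exact placement is illustrative):

    /* In the vendor nested-entry path (SVM's VMRUN handler or VMX's
     * VMLAUNCH/VMRESUME handler): count every entry to L2. */
    ++vcpu->stat.nested_run;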
2021-03-14  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  [Linus Torvalds, 1 file, -1/+1]
Pull KVM fixes from Paolo Bonzini: "More fixes for ARM and x86"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: LAPIC: Advancing the timer expiration on guest initiated write
  KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
  KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
  kvm: x86: annotate RCU pointers
  KVM: arm64: Fix exclusive limit for IPA size
  KVM: arm64: Reject VM creation when the default IPA size is unsupported
  KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
  KVM: arm64: Don't use cbz/adr with external symbols
  KVM: arm64: Fix range alignment when walking page tables
  KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
  KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
  KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
  KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
  KVM: arm64: Fix nVHE hyp panic host context restore
  KVM: arm64: Avoid corrupting vCPU context register in guest exit
  KVM: arm64: nvhe: Save the SPE context early
  kvm: x86: use NULL instead of using plain integer as pointer
  KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
  KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
2021-03-08  x86: Remove duplicate TSC DEADLINE MSR definitions  [Dave Hansen, 1 file, -4/+4]
There are two definitions for the TSC deadline MSR in msr-index.h, one with an underscore and one without. Axe one of them and move all the references over to the other one. [ bp: Fixup the MSR define in handle_fastpath_set_msr_irqoff() too. ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200305174706.0D6B8EE4@viggo.jf.intel.com
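For reference, the colliding definitions name the same architectural MSR (IA32_TSC_DEADLINE, 0x6E0); a hedged reconstruction rather than a quote of msr-index.h:

    /* Two names for the same MSR; the commit above keeps only one of
     * the duplicate spellings. */
    #define MSR_IA32_TSCDEADLINE    0x000006e0
    #define MSR_IA32_TSC_DEADLINE   0x000006E0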
2021-03-06  kvm: x86: use NULL instead of using plain integer as pointer  [Muhammad Usama Anjum, 1 file, -1/+1]
Sparse warnings removed: warning: Using plain integer as NULL pointer Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com> Message-Id: <20210305180816.GA488770@LEGION> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-04  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  [Linus Torvalds, 1 file, -1/+21]
Pull KVM fixes from Paolo Bonzini:
 - Doc fixes
 - selftests fixes
 - Add runstate information to the new Xen support
 - Allow compiling out the Xen interface
 - 32-bit PAE without EPT bugfix
 - NULL pointer dereference bugfix

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Clear the CR4 register on reset
  KVM: x86/xen: Add support for vCPU runstate information
  KVM: x86/xen: Fix return code when clearing vcpu_info and vcpu_time_info
  selftests: kvm: Mmap the entire vcpu mmap area
  KVM: Documentation: Fix index for KVM_CAP_PPC_DAWR1
  KVM: x86: allow compiling out the Xen hypercall interface
  KVM: xen: flush deferred static key before checking it
  KVM: x86/mmu: Set SPTE_AD_WRPROT_ONLY_MASK if and only if PML is enabled
  KVM: x86: hyper-v: Fix Hyper-V context null-ptr-deref
  KVM: x86: remove misplaced comment on active_mmu_pages
  KVM: Documentation: rectify rst markup in kvm_run->flags
  Documentation: kvm: fix messy conversion from .txt to .rst
2021-03-02  KVM: x86/xen: Add support for vCPU runstate information  [David Woodhouse, 1 file, -1/+12]
This is how Xen guests do steal time accounting. The hypervisor records the amount of time spent in each of running/runnable/blocked/offline states. In the Xen accounting, a vCPU is still in state RUNSTATE_running while in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules does the state become RUNSTATE_blocked. In KVM this means that even when the vCPU exits the kvm_run loop, the state remains RUNSTATE_running. The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given amount of time to the blocked state and subtract it from the running state. The state_entry_time corresponds to get_kvmclock_ns() at the time the vCPU entered the current state, and the total times of all four states should always add up to state_entry_time. Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20210301125309.874953-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
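For context, the guest-visible runstate area follows Xen's public ABI, which (abridged, from the Xen interface headers) has this shape; the invariant described above is that the four time[] buckets sum to state_entry_time:

    /* Abridged sketch of Xen's public vcpu_runstate_info ABI. */
    struct vcpu_runstate_info {
            int      state;            /* RUNSTATE_running/runnable/blocked/offline */
            uint64_t state_entry_time; /* clock (ns) when 'state' was entered */
            uint64_t time[4];          /* total ns spent in each runstate */
    };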
2021-03-02  KVM: x86: allow compiling out the Xen hypercall interface  [Paolo Bonzini, 1 file, -0/+8]
The Xen hypercall interface adds to the attack surface of the hypervisor and will be used quite rarely. Allow compiling it out. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
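The usual kernel pattern for this is a Kconfig symbol plus static inline stubs so callers in x86.c need no #ifdefs; a sketch assuming the option is called CONFIG_KVM_XEN (the stub body and its return value are illustrative):

    #ifdef CONFIG_KVM_XEN
    int kvm_xen_hypercall(struct kvm_vcpu *vcpu);
    #else
    /* Interface compiled out: callers see a trivial stub. */
    static inline int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
    {
            return 1;   /* nothing to do, resume the guest */
    }
    #endif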
2021-02-26  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  [Linus Torvalds, 1 file, -56/+87]
Pull more KVM updates from Paolo Bonzini:
 "x86:
  - take into account HVA before retrying on MMU notifier race
  - fixes for nested AMD guests without NPT
  - allow INVPCID in guest without PCID
  - disable PML in hardware when not in use
  - MMU code cleanups:

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
  KVM: SVM: Fix nested VM-Exit on #GP interception handling
  KVM: vmx/pmu: Fix dummy check if lbr_desc->event is created
  KVM: x86/mmu: Consider the hva in mmu_notifier retry
  KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault
  KVM: Documentation: rectify rst markup in KVM_GET_SUPPORTED_HV_CPUID
  KVM: nSVM: prepare guest save area while is_guest_mode is true
  KVM: x86/mmu: Remove a variety of unnecessary exports
  KVM: x86: Fold "write-protect large" use case into generic write-protect
  KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML
  KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging
  KVM: x86: Further clarify the logic and comments for toggling log dirty
  KVM: x86: Move MMU's PML logic to common code
  KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function
  KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect()
  KVM: nVMX: Disable PML in hardware when running L2
  KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  KVM: x86/mmu: Pass the memslot to the rmap callbacks
  KVM: x86/mmu: Split out max mapping level calculation to helper
  KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages
  KVM: nVMX: no need to undo inject_page_fault change on nested vmexit
  ...
2021-02-26  KVM: xen: flush deferred static key before checking it  [Paolo Bonzini, 1 file, -0/+1]
A missing flush would cause the static branch to trigger incorrectly. Cc: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-22  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  [Linus Torvalds, 1 file, -326/+392]
Pull KVM updates from Paolo Bonzini:
 "x86:
  - Support for userspace to emulate Xen hypercalls
  - Raise the maximum number of user memslots
  - Scalability improvements for the new MMU. Instead of the complex "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent, but the code that can run against page faults is limited. Right now only page faults take the lock for reading; in the future this will be extended to some cases of page table destruction. I hope to switch the default MMU around 5.12-rc3 (some testing was delayed due to Chinese New Year).
  - Cleanups for MAXPHYADDR checks
  - Use static calls for vendor-specific callbacks
  - On AMD, use VMLOAD/VMSAVE to save and restore host state
  - Stop using deprecated jump label APIs
  - Workaround for AMD erratum that made nested virtualization unreliable
  - Support for LBR emulation in the guest
  - Support for communicating bus lock vmexits to userspace
  - Add support for SEV attestation command
  - Miscellaneous cleanups

  PPC:
  - Support for second data watchpoint on POWER10
  - Remove some complex workarounds for buggy early versions of POWER9
  - Guest entry/exit fixes

  ARM64:
  - Make the nVHE EL2 object relocatable
  - Cleanups for concurrent translation faults hitting the same page
  - Support for the standard TRNG hypervisor call
  - A bunch of small PMU/Debug fixes
  - Simplification of the early init hypercall handling

  Non-KVM changes (with acks):
  - Detection of contended rwlocks (implemented only for qrwlocks, because KVM only needs it for x86)
  - Allow __DISABLE_EXPORTS from assembly code
  - Provide a saner follow_pfn replacements for modules"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits)
  KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes
  KVM: selftests: Don't bother mapping GVA for Xen shinfo test
  KVM: selftests: Fix hex vs. decimal snafu in Xen test
  KVM: selftests: Fix size of memslots created by Xen tests
  KVM: selftests: Ignore recently added Xen tests' build output
  KVM: selftests: Add missing header file needed by xAPIC IPI tests
  KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
  KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
  locking/arch: Move qrwlock.h include after qspinlock.h
  KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
  KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
  KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
  KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
  KVM: PPC: remove unneeded semicolon
  KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
  KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
  KVM: PPC: Book3S HV: Fix radix guest SLB side channel
  KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
  KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
  KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
  ...
2021-02-19  KVM: x86: Fold "write-protect large" use case into generic write-protect  [Sean Christopherson, 1 file, -15/+17]
Drop kvm_mmu_slot_largepage_remove_write_access() and refactor its sole caller to use kvm_mmu_slot_remove_write_access(). Remove the now-unused slot_handle_large_level() and slot_handle_all_level() helpers. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-19  KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML  [Sean Christopherson, 1 file, -51/+36]
Stop setting dirty bits for MMU pages when dirty logging is disabled for a memslot, as PML is now completely disabled when there are no memslots with dirty logging enabled. This means that spurious PML entries will be created for memslots with dirty logging disabled if at least one other memslot has dirty logging enabled. However, spurious PML entries are already possible since dirty bits are set only when dirty logging is turned off, i.e. memslots that are never dirty logged will have dirty bits cleared. In the end, it's faster overall to eat a few spurious PML entries in the window where dirty logging is being disabled across all memslots. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-19  KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging  [Makarand Sonare, 1 file, -4/+31]
Currently, if enable_pml=1 PML remains enabled for the entire lifetime of the VM irrespective of whether dirty logging is enabled or disabled. When dirty logging is disabled, all the pages of the VM are manually marked dirty, so that PML is effectively non-operational. Setting the dirty bits is an expensive operation which can cause severe MMU lock contention in a performance-sensitive path when dirty logging is disabled after a failed or canceled live migration. Manually setting dirty bits also fails to prevent PML activity if some code path clears dirty bits, which can incur unnecessary VM-Exits. In order to avoid this extra overhead, dynamically enable/disable PML when dirty logging gets turned on/off for the first/last memslot. Signed-off-by: Makarand Sonare <makarandsonare@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-12-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
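A sketch of the common-code side of this scheme (counter and request names follow the series' style but should be treated as illustrative):

    /* Sketch: toggle PML in hardware only on 0<->1 transitions of the
     * number of memslots with dirty logging enabled. */
    static void kvm_mmu_update_cpu_dirty_logging(struct kvm *kvm, bool enable)
    {
            struct kvm_arch *ka = &kvm->arch;

            if (enable ? ka->cpu_dirty_logging_count++ == 0
                       : --ka->cpu_dirty_logging_count == 0)
                    kvm_make_all_cpus_request(kvm, KVM_REQ_UPDATE_CPU_DIRTY_LOGGING);
    }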
2021-02-19  KVM: x86: Further clarify the logic and comments for toggling log dirty  [Sean Christopherson, 1 file, -4/+11]
Add a sanity check in kvm_mmu_slot_apply_flags to assert that the LOG_DIRTY_PAGES flag is indeed being toggled, and explicitly rely on that holding true when zapping collapsible SPTEs. Manipulating the CPU dirty log (PML) and write-protection also relies on this assertion, but that's not obvious in the current code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-19  KVM: x86: Move MMU's PML logic to common code  [Sean Christopherson, 1 file, -6/+16]
Drop the facade of KVM's PML logic being vendor specific and move the bits that aren't truly VMX specific into common x86 code. The MMU logic for dealing with PML is tightly coupled to the feature and to VMX's implementation, bouncing through kvm_x86_ops obfuscates the code without providing any meaningful separation of concerns or encapsulation. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210213005015.1651772-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-17  entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point  [Frederic Weisbecker, 1 file, -0/+1]
Following the idle loop model, cleanly check for a pending rcuog wakeup before the last rescheduling point upon resuming to guest mode. This way we can avoid doing it from rcu_user_enter() with the last-resort self-IPI hack that enforces rescheduling. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org
2021-02-09  KVM: x86: hyper-v: Allocate Hyper-V context lazily  [Vitaly Kuznetsov, 1 file, -8/+1]
Hyper-V context is only needed for guests which use Hyper-V emulation in KVM (e.g. Windows/Hyper-V guests) so we don't actually need to allocate it in kvm_arch_vcpu_create(), we can postpone the action until Hyper-V specific MSRs are accessed or SynIC is enabled. Once allocated, let's keep the context alive for the lifetime of the vCPU as an attempt to free it would require additional synchronization with other vCPUs and normally it is not supposed to happen. Note, Hyper-V style hypercall enablement is done by writing to HV_X64_MSR_GUEST_OS_ID so we don't need to worry about allocating Hyper-V context from kvm_hv_hypercall(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-15-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09  KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional  [Vitaly Kuznetsov, 1 file, -1/+1]
Hyper-V emulation is enabled in KVM unconditionally. This is bad at least from a security standpoint as it is an extra attack surface. Ideally, there should be a per-VM capability explicitly enabled by the VMM, but currently that is not the case and we can't mandate one without breaking backwards compatibility. We can, however, check guest-visible CPUIDs and only enable Hyper-V emulation when the "Hv#1" interface is exposed in HYPERV_CPUID_INTERFACE. Note, VMMs are free to act in any sequence they like, e.g. they can try to set MSRs first and CPUIDs later, so we still need to allow the host to read/write Hyper-V specific MSRs unconditionally. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-14-vkuznets@redhat.com> [Add selftest vcpu_set_hv_cpuid API to avoid breaking xen_vmcall_test. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
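The check it describes boils down to looking for the "Hv#1" signature in the guest's CPUID leaves; a hedged sketch (helper name is illustrative, 0x31237648 being "Hv#1" in little-endian ASCII):

    /* Sketch: Hyper-V emulation is considered guest-enabled only when
     * CPUID.0x40000001 (HYPERV_CPUID_INTERFACE) carries "Hv#1" in EAX. */
    static bool kvm_hv_interface_exposed(struct kvm_vcpu *vcpu)
    {
            struct kvm_cpuid_entry2 *entry;

            entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_INTERFACE, 0);
            return entry && entry->eax == 0x31237648; /* "Hv#1" */
    }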
2021-02-09  KVM: x86: hyper-v: Allocate 'struct kvm_vcpu_hv' dynamically  [Vitaly Kuznetsov, 1 file, -2/+5]
Hyper-V context is only needed for guests which use Hyper-V emulation in KVM (e.g. Windows/Hyper-V guests). 'struct kvm_vcpu_hv' is, however, quite big, it accounts for more than 1/4 of the total 'struct kvm_vcpu_arch' which is also quite big already. This all looks like a waste. Allocate 'struct kvm_vcpu_hv' dynamically. This patch does not bring any (intentional) functional change as we still allocate the context unconditionally but it paves the way to doing that only when needed. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-13-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09  KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context  [Vitaly Kuznetsov, 1 file, -2/+5]
Currently, Hyper-V context is part of 'struct kvm_vcpu_arch' and is always available. As a preparation to allocating it dynamically, check that it is not NULL at call sites which can normally proceed without it i.e. the behavior is identical to the situation when Hyper-V emulation is not being used by the guest. When Hyper-V context for a particular vCPU is not allocated, we may still need to get 'vp_index' from there. E.g. in a hypothetical situation when Hyper-V emulation was enabled on one CPU and wasn't on another, Hyper-V style send-IPI hypercall may still be used. Luckily, vp_index is always initialized to kvm_vcpu_get_idx() and can only be changed when Hyper-V context is present. Introduce kvm_hv_get_vpindex() helper for simplification. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-12-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
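The helper it introduces plausibly looks like this, a sketch matching the described fallback to the vCPU index:

    /* Sketch of kvm_hv_get_vpindex(): with no Hyper-V context allocated,
     * vp_index is known to still be its default, the vCPU index. */
    static inline u32 kvm_hv_get_vpindex(struct kvm_vcpu *vcpu)
    {
            struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);

            return hv_vcpu ? hv_vcpu->vp_index : kvm_vcpu_get_idx(vcpu);
    }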
2021-02-09  KVM: x86: hyper-v: Always use to_hv_vcpu() accessor to get to 'struct kvm_vcpu_hv'  [Vitaly Kuznetsov, 1 file, -1/+3]
As a preparation to allocating Hyper-V context dynamically, make it clear who the user of said context is. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-11-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09  KVM: x86: hyper-v: Introduce to_kvm_hv() helper  [Vitaly Kuznetsov, 1 file, -1/+1]
Spelling '&kvm->arch.hyperv' correctly is hard. Also, this makes the code more consistent with vmx/svm where to_kvm_vmx()/to_kvm_svm() are already being used. Opportunistically change kvm_hv_msr_{get,set}_crash_{data,ctl}() and kvm_hv_msr_set_crash_data() to take 'kvm' instead of 'vcpu' as these MSRs are partition wide. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
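Given the commit text, the helper is essentially the following (a sketch; the struct field is named in the message itself):

    /* Sketch: one obvious accessor instead of spelling the path by hand. */
    static inline struct kvm_hv *to_kvm_hv(struct kvm *kvm)
    {
            return &kvm->arch.hyperv;
    }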
2021-02-09  KVM: x86: hyper-v: Rename vcpu_to_synic()/synic_to_vcpu()  [Vitaly Kuznetsov, 1 file, -1/+1]
vcpu_to_synic()'s argument is almost always 'vcpu' so there's no need to have an additional prefix. Also, as this is used outside of hyper-v emulation code, add a '_hv_' part to make it clear what this is. This makes the naming more consistent with to_hv_vcpu(). Rename synic_to_vcpu() to hv_synic_to_vcpu() for consistency. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210126134816.1880136-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09  KVM: x86: move kvm_inject_gp up from kvm_set_dr to callers  [Paolo Bonzini, 1 file, -14/+5]
Push the injection of #GP up to the callers, so that they can just use kvm_complete_insn_gp. __kvm_set_dr is pretty much what the callers can use together with kvm_complete_insn_gp, so rename it to kvm_set_dr and drop the old kvm_set_dr wrapper. This also allows nested VMX code, which really wanted to use __kvm_set_dr, to use the right function. While at it, remove the kvm_require_dr() check from the SVM interception. The APM states that "All normal exception checks take precedence over the SVM intercepts", which includes the CR4.DE=1 #UD. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
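The resulting caller pattern is worth seeing; a minimal sketch of how a DR-write intercept would now complete (variable names illustrative):

    /* Sketch: kvm_set_dr() returns nonzero on an illegal value, and
     * kvm_complete_insn_gp() either injects #GP or skips the insn. */
    err = kvm_set_dr(vcpu, dr, val);
    return kvm_complete_insn_gp(vcpu, err);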
2021-02-09  KVM: x86: reading DR cannot fail  [Paolo Bonzini, 1 file, -5/+4]
kvm_get_dr and emulator_get_dr expect an in-range value for the register number, so they cannot fail. Change the return type to void. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: x86: Set so called 'reserved CR3 bits in LM mask' at vCPU reset  [Sean Christopherson, 1 file, -0/+1]
Set cr3_lm_rsvd_bits, which is effectively an invalid GPA mask, at vCPU reset. The reserved bits check needs to be done even if userspace never configures the guest's CPUID model. Cc: stable@vger.kernel.org Fixes: 0107973a80ad ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: x86: Add helper to consolidate "raw" reserved GPA mask calculations  [Sean Christopherson, 1 file, -2/+2]
Add a helper to generate the mask of reserved GPA bits _without_ any adjustments for repurposed bits, and use it to replace a variety of open coded variants in the MTRR and APIC_BASE flows. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: x86: Use reserved_gpa_bits to calculate reserved PxE bits  [Sean Christopherson, 1 file, -2/+1]
Use reserved_gpa_bits, which accounts for exceptions to the maxphyaddr rule, e.g. SEV's C-bit, for the page {table,directory,etc...} entry (PxE) reserved bits checks. For SEV, the C-bit is ignored by hardware when walking page tables, e.g. the APM states: Note that while the guest may choose to set the C-bit explicitly on instruction pages and page table addresses, the value of this bit is a don't-care in such situations as hardware always performs these as private accesses. Such behavior is expected to hold true for other features that repurpose GPA bits, e.g. KVM could theoretically emulate SME or MKTME, which both allow non-zero repurposed bits in the page tables. Conceptually, KVM should apply reserved GPA checks universally, and any features that do not adhere to the basic rule should be explicitly handled, i.e. if a GPA bit is repurposed but not allowed in page tables for whatever reason. Refactor __reset_rsvds_bits_mask() to take the pre-generated reserved bits mask, and opportunistically clean up its code, e.g. to align lines and comments. Practically speaking, this change is likely a glorified nop given the current KVM code base. SEV's C-bit is the only repurposed GPA bit, and KVM doesn't support shadowing encrypted page tables (which is theoretically possible via SEV debug APIs). Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: x86: SEV: Treat C-bit as legal GPA bit regardless of vCPU mode  [Sean Christopherson, 1 file, -4/+3]
Rename cr3_lm_rsvd_bits to reserved_gpa_bits, and use it for all GPA legality checks. AMD's APM states: If the C-bit is an address bit, this bit is masked from the guest physical address when it is translated through the nested page tables. Thus, any access that can conceivably be run through NPT should ignore the C-bit when checking for validity. For features that KVM emulates in software, e.g. MTRRs, there is no clear direction in the APM for how the C-bit should be handled. For such cases, follow the SME behavior as much as possible, since SEV is essentially a VM-specific variant of SME. For SME, the APM states: In this case the upper physical address bits are treated as reserved when the feature is enabled except where otherwise indicated. Collecting the various relevant SME snippets in the APM and cross-referencing the omissions with Linux kernel code, this leaves MTRRs and APIC_BASE as the only flows that KVM emulates that should _not_ ignore the C-bit. Note, this means the reserved bit checks in the page tables are technically broken. This will be remedied in a future patch. Although the page table checks are technically broken, in practice, it's all but guaranteed to be irrelevant. NPT is required for SEV, i.e. shadowing page tables isn't needed in the common case. Theoretically, the checks could be in play for nested NPT, but it's extremely unlikely that anyone is running nested VMs on SEV, as doing so would require L1 to expose sensitive data to L0, e.g. the entire VMCB. And if anyone is running nested VMs, L0 can't read the guest's encrypted memory, i.e. L1 would need to put its NPT in shared memory, in which case the C-bit will never be set. Or, L1 could use shadow paging, but again, if L0 needs to read page tables, e.g. to load PDPTRs, the memory can't be encrypted if L1 has any expectation of L0 doing the right thing. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04  KVM: x86: declare Xen HVM shared info capability and add test case  [David Woodhouse, 1 file, -1/+2]
Instead of adding a plethora of new KVM_CAP_XEN_FOO capabilities, just add bits to the return value of KVM_CAP_XEN_HVM. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2021-02-04  KVM: x86/xen: Add event channel interrupt vector upcall  [David Woodhouse, 1 file, -1/+2]
It turns out that we can't handle event channels *entirely* in userspace by delivering them as ExtINT, because KVM is a bit picky about when it accepts ExtINT interrupts from a legacy PIC. The in-kernel local APIC has to have LVT0 configured in APIC_MODE_EXTINT and unmasked, which isn't necessarily the case for Xen guests especially on secondary CPUs. To cope with this, add kvm_xen_get_interrupt() which checks the evtchn_pending_upcall field in the Xen vcpu_info, and delivers the Xen upcall vector (configured by KVM_XEN_ATTR_TYPE_UPCALL_VECTOR) if it's set regardless of LAPIC LVT0 configuration. This gives us the minimum support we need for completely userspace-based implementation of event channels. This does mean that vcpu_enter_guest() needs to check for the evtchn_pending_upcall flag being set, because it can't rely on someone having set KVM_REQ_EVENT unless we were to add some way for userspace to do so manually. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2021-02-04  KVM: x86/xen: register vcpu time info region  [Joao Martins, 1 file, -0/+2]
Allow the emulated Xen guest to register secondary vcpu time information. On Xen guests this is used to map the time info into userspace and hence allow vdso-based gettimeofday to work. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2021-02-04  KVM: x86/xen: setup pvclock updates  [Joao Martins, 1 file, -13/+18]
Parameterise kvm_setup_pvclock_page() a little bit so that it can be invoked for different gfn_to_hva_cache structures, and with different offsets. Then we can invoke it for the normal KVM pvclock and also for the Xen one in the vcpu_info. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2021-02-04  KVM: x86/xen: Add KVM_XEN_VCPU_SET_ATTR/KVM_XEN_VCPU_GET_ATTR  [David Woodhouse, 1 file, -0/+20]
This will be used for per-vCPU setup such as runstate and vcpu_info. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2021-02-04  KVM: x86/xen: update wallclock region  [Joao Martins, 1 file, -5/+12]
Wallclock on Xen is written in the shared_info page. To that purpose, export kvm_write_wall_clock() and pass on the GPA of its location to populate the shared_info wall clock data. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2021-02-04  KVM: x86/xen: add KVM_XEN_HVM_SET_ATTR/KVM_XEN_HVM_GET_ATTR  [Joao Martins, 1 file, -0/+20]
This will be used to set up shared info pages etc. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2021-02-04  KVM: x86/xen: Add kvm_xen_enabled static key  [David Woodhouse, 1 file, -0/+2]
The code paths for Xen support are all fairly lightweight but if we hide them behind this, they're even *more* lightweight for any system which isn't actually hosting Xen guests. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
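A minimal sketch of the static-key pattern this refers to (the upstream key is a deferred variant and the dispatch helper is illustrative, so treat the exact types and names as assumptions):

    /* Patched to a no-op branch until the first Xen guest flips the key,
     * so non-Xen hosts pay essentially nothing for these paths. */
    DEFINE_STATIC_KEY_FALSE(kvm_xen_enabled);

    static int maybe_xen_hypercall(struct kvm_vcpu *vcpu)
    {
            if (static_branch_unlikely(&kvm_xen_enabled))
                    return kvm_xen_hypercall(vcpu);     /* Xen path */
            return kvm_emulate_hypercall(vcpu);         /* normal path */
    }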
2021-02-04  KVM: x86/xen: Move KVM_XEN_HVM_CONFIG handling to xen.c  [David Woodhouse, 1 file, -13/+1]
This is already more complex than the simple memcpy it originally had. Move it to xen.c with the rest of the Xen support. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2021-02-04  KVM: x86/xen: intercept xen hypercalls if enabled  [Joao Martins, 1 file, -31/+18]
Add a new exit reason for the emulator to handle Xen hypercalls. Since this means KVM owns the ABI, dispense with the facility for the VMM to provide its own copy of the hypercall pages; just fill them in directly using VMCALL/VMMCALL as we do for the Hyper-V hypercall page. This behaviour is enabled by a new INTERCEPT_HCALL flag in the KVM_XEN_HVM_CONFIG ioctl structure, and advertised by the same flag being returned from the KVM_CAP_XEN_HVM check. Rename xen_hvm_config() to kvm_xen_write_hypercall_page() and move it to the nascent xen.c while we're at it, and add a test case. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
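Filling the page "directly using VMCALL/VMMCALL" amounts to stamping a small stub into each hypercall slot; a hedged sketch (the instruction encodings are per the x86 ISA, everything else is illustrative):

    /* Sketch: each 32-byte slot i becomes "mov $i,%eax; vmcall; ret". */
    static void fill_hypercall_page(u8 *page, bool is_amd)
    {
            unsigned int i;

            for (i = 0; i < PAGE_SIZE / 32; i++) {
                    u8 *instr = page + i * 32;

                    instr[0] = 0xb8;                    /* mov imm32, %eax */
                    memcpy(instr + 1, &i, 4);           /* hypercall number */
                    /* vmcall = 0f 01 c1 (Intel), vmmcall = 0f 01 d9 (AMD) */
                    memcpy(instr + 5, is_amd ? "\x0f\x01\xd9" : "\x0f\x01\xc1", 3);
                    instr[8] = 0xc3;                    /* ret */
            }
    }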
2021-02-04  KVM: x86/xen: Fix __user pointer handling for hypercall page installation  [David Woodhouse, 1 file, -3/+5]
The address we give to memdup_user() isn't correctly tagged as __user. This is harmless enough as it's a one-off use and we're doing exactly the right thing, but fix it anyway to shut the checker up. Otherwise it'll whine when the (now legacy) code gets moved around in a later patch. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2021-02-04  KVM: x86/xen: fix Xen hypercall page msr handling  [Joao Martins, 1 file, -2/+3]
Xen usually places its MSR at 0x40000000 or 0x40000200 depending on whether it is running in viridian mode or not. Note that this is not ABI guaranteed, so it is possible for Xen to advertise the MSR some place else. Given the way xen_hvm_config() is handled, if the former address is selected, this will conflict with Hyper-V's MSR (HV_X64_MSR_GUEST_OS_ID) which unconditionally uses the same address. Given that the MSR location is arbitrary, move the xen_hvm_config() handling to the top of kvm_set_msr_common() before falling through. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
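The ordering fix is easiest to see in outline; a sketch of the top of kvm_set_msr_common() after the change (field names follow the KVM_XEN_HVM_CONFIG structure, control flow illustrative):

    int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
    {
            u32 msr = msr_info->index;

            /* Sketch: the Xen MSR index is VMM-chosen and may collide with
             * fixed-index MSRs handled below (e.g. HV_X64_MSR_GUEST_OS_ID
             * at 0x40000000), so it must be checked first. */
            if (msr && msr == vcpu->kvm->arch.xen_hvm_config.msr)
                    return kvm_xen_write_hypercall_page(vcpu, msr_info->data);

            switch (msr) {
            /* ... fixed-index MSRs, including the Hyper-V range ... */
            }
            return 0;
    }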
2021-02-04  KVM: x86/mmu: Use an rwlock for the x86 MMU  [Ben Gardon, 1 file, -2/+2]
Add a read / write lock to be used in place of the MMU spinlock on x86. The rwlock will enable the TDP MMU to handle page faults, and other operations in parallel in future commits. Reviewed-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210202185734.1680553-19-bgardon@google.com> [Introduce virt/kvm/mmu_lock.h - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
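In practice the split looks like this; a sketch of the intended usage (standard kernel rwlock API, call sites illustrative):

    /* Page faults can now run concurrently under the read side... */
    read_lock(&vcpu->kvm->mmu_lock);
    /* ... handle a TDP MMU page fault in parallel with other vCPUs ... */
    read_unlock(&vcpu->kvm->mmu_lock);

    /* ...while structural changes still take the lock exclusively. */
    write_lock(&kvm->mmu_lock);
    /* ... e.g. zap SPTEs for a memslot being removed ... */
    write_unlock(&kvm->mmu_lock);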