path: root/arch/x86/kvm
Age | Commit message | Author | Files | Lines
2021-04-20 | KVM: vmx: add mismatched size assertions in vmcs_check32() | Haiwei Li | 1 | -0/+4
Add compile-time assertions in vmcs_check32() to disallow accesses to 64-bit and 64-bit high fields via vmcs_{read,write}32(). Upper level KVM code should never do partial accesses to VMCS fields. KVM handles the split accesses automatically in vmcs_{read,write}64() when running as a 32-bit kernel. Reviewed-and-tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Haiwei Li <lihaiwei@tencent.com> Message-Id: <20210409022456.23528-1-lihaiwei.kernel@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
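A minimal sketch of this kind of compile-time check, assuming the standard VMCS field encoding (bits 14:13 give the field width, bit 0 selects the high half of a 64-bit field); the exact masks are illustrative, not the literal patch:

        static __always_inline void vmcs_check32(unsigned long field)
        {
                /* Width encoding 01b (bits 14:13) means a 64-bit field. */
                BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x2000,
                                 "32-bit accessor invalid for 64-bit field");
                /* Bit 0 set on a 64-bit field selects the "high" half. */
                BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2001,
                                 "32-bit accessor invalid for 64-bit high field");
        }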
2021-04-20 | KVM: SVM: Enhance and clean up the vmcb tracking comment in pre_svm_run() | Sean Christopherson | 1 | -5/+4
Explicitly document why a vmcb must be marked dirty and assigned a new asid when it will be run on a different cpu. The "what" is relatively obvious, whereas the "why" requires reading the APM and/or KVM code. Opportunistically remove a spurious period and several unnecessary newlines in the comment. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 | KVM: SVM: Add a comment to clarify what vcpu_svm.vmcb points at | Sean Christopherson | 1 | -0/+1
Add a comment above the declaration of vcpu_svm.vmcb to call out that it is simply a shorthand for current_vmcb->ptr. The myriad accesses to svm->vmcb are quite confusing without this crucial detail. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
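A rough sketch of where such a comment lands, against a simplified view of vcpu_svm (the neighbouring fields shown here are assumptions, not a verbatim excerpt):

        struct vcpu_svm {
                struct kvm_vcpu vcpu;

                /* vmcb always points at current_vmcb->ptr, it's purely a shorthand. */
                struct vmcb *vmcb;

                struct kvm_vmcb_info vmcb01;
                struct kvm_vmcb_info *current_vmcb;

                /* ... many more fields ... */
        };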
2021-04-20 | KVM: SVM: Drop vcpu_svm.vmcb_pa | Sean Christopherson | 2 | -4/+9
Remove vmcb_pa from vcpu_svm and simply read current_vmcb->pa directly in the one path where it is consumed. Unlike svm->vmcb, use of the current vmcb's address is very limited, as evidenced by the fact that its use can be trimmed to a single dereference. Opportunistically add a comment about using vmcb01 for VMLOAD/VMSAVE, as at first glance using vmcb01 instead of vmcb_pa looks wrong. No functional change intended. Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 | KVM: SVM: Don't set current_vmcb->cpu when switching vmcb | Sean Christopherson | 1 | -8/+0
Do not update the new vmcb's last-run cpu when switching to a different vmcb. If the vCPU is migrated between its last run and a vmcb switch, e.g. for nested VM-Exit, then setting the cpu without marking the vmcb dirty will lead to KVM running the vCPU on a different physical cpu with stale clean bit settings.

                               vcpu->cpu    current_vmcb->cpu    hardware
        pre_svm_run()          cpu0         cpu0                 cpu0,clean
        kvm_arch_vcpu_load()   cpu1         cpu0                 cpu0,clean
        svm_switch_vmcb()      cpu1         cpu1                 cpu0,clean
        pre_svm_run()          cpu1         cpu1                 kaboom

Simply delete the offending code; unlike VMX, which needs to update the cpu at switch time due to the need to do VMPTRLD, SVM only cares about which cpu last ran the vCPU. Fixes: af18fa775d07 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb") Cc: Cathy Avery <cavery@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210406171811.4043363-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 | KVM: SVM: Make sure GHCB is mapped before updating | Tom Lendacky | 2 | -1/+4
Access to the GHCB is mainly in the VMGEXIT path and it is known that the GHCB will be mapped. But there are two paths where it is possible the GHCB might not be mapped. The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform the caller of the AP Reset Hold NAE event that a SIPI has been delivered. However, if a SIPI is performed without a corresponding AP Reset Hold, then the GHCB might not be mapped (depending on the previous VMEXIT), which will result in a NULL pointer dereference. The svm_complete_emulated_msr() routine will update the GHCB to inform the caller of a RDMSR/WRMSR operation about any errors. While it is likely that the GHCB will be mapped in this situation, add a safeguard in this path to be certain a NULL pointer dereference is not encountered. Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
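A hedged sketch of the completion-path guard described above; the field and helper names follow the SEV-ES code of that era (e.g. svm->ghcb, sev_es_guest()) but should be treated as assumptions rather than the literal patch:

        static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
        {
                struct vcpu_svm *svm = to_svm(vcpu);

                /* Only touch the GHCB when it is actually mapped. */
                if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
                        return kvm_complete_insn_gp(vcpu, err);

                /* ... report the MSR error back to the guest through the GHCB ... */
                return 1;
        }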
2021-04-20 | KVM: X86: Do not yield to self | Wanpeng Li | 1 | -0/+4
If the target vCPU is the current vCPU, there is no need to yield; skipping the yield also prevents a malicious guest from abusing directed yield against itself. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1617941911-5338-3-git-send-email-wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
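A minimal sketch of the check; the surrounding kvm_sched_yield() flow and label are assumptions for illustration:

        /* Ignore requests to yield to self; a guest gains nothing from it. */
        if (vcpu == target)
                goto no_yield;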
2021-04-20 | KVM: X86: Count attempted/successful directed yield | Wanpeng Li | 1 | -6/+18
To analyze some performance issues with lock contention and scheduling, it is useful to know when directed yields succeed or fail. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1617941911-5338-2-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
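A sketch of how the two counters could be bumped around the yield attempt; the stat field names are assumptions:

        vcpu->stat.directed_yield_attempted++;

        if (kvm_vcpu_yield_to(target) > 0)
                vcpu->stat.directed_yield_successful++;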
2021-04-20 | KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returns | Ben Gardon | 3 | -1/+89
To avoid saddling a vCPU thread with the work of tearing down an entire paging structure, take a reference on each root before they become obsolete, so that the thread initiating the fast invalidation can tear down the paging structure and (most likely) release the last reference. As a bonus, this teardown can happen under the MMU lock in read mode so as not to block the progress of vCPU threads. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-14-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-20 | KVM: x86/mmu: Fast invalidation for TDP MMU | Ben Gardon | 3 | -3/+31
Provide a real mechanism for fast invalidation by marking roots as invalid so that their reference count will quickly fall to zero and they will be torn down. One negative side effect of this approach is that a vCPU thread will likely drop the last reference to a root and be saddled with the work of tearing down an entire paging structure. This issue will be resolved in a later commit. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-13-bgardon@google.com> [Move the loop to tdp_mmu.c, otherwise compilation fails on 32-bit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
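A rough sketch of the invalidation step: walk the TDP MMU roots and mark each one invalid so that no new references are taken and the refcounts drain to zero. Function and field names are assumptions for illustration:

        void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
        {
                struct kvm_mmu_page *root;

                lockdep_assert_held_write(&kvm->mmu_lock);
                list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link)
                        root->role.invalid = true;
        }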
2021-04-19 | KVM: x86/mmu: Allow enabling/disabling dirty logging under MMU read lock | Ben Gardon | 2 | -17/+61
To reduce lock contention and interference with page fault handlers, allow the TDP MMU functions which enable and disable dirty logging to operate under the MMU read lock. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-12-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: Allow zapping collapsible SPTEs to use MMU read lock | Ben Gardon | 2 | -8/+23
To reduce the impact of disabling dirty logging, change the TDP MMU function which zaps collapsible SPTEs to run under the MMU read lock. This way, page faults on zapped SPTEs can proceed in parallel with kvm_mmu_zap_collapsible_sptes. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-11-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock | Ben Gardon | 3 | -45/+101
To reduce lock contention and interference with page fault handlers, allow the TDP MMU function to zap a GFN range to operate under the MMU read lock. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-10-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: Protect the tdp_mmu_roots list with RCU | Ben Gardon | 1 | -30/+39
Protect the contents of the TDP MMU roots list with RCU in preparation for a future patch which will allow the iterator macro to be used under the MMU lock in read mode. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-9-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: handle cmpxchg failure in kvm_tdp_mmu_get_root | Ben Gardon | 2 | -15/+9
To reduce dependence on the MMU write lock, don't rely on the assumption that the atomic operation in kvm_tdp_mmu_get_root will always succeed. By not relying on that assumption, threads do not need to hold the MMU lock in write mode in order to take a reference on a TDP MMU root. In the root iterator, this change means that some roots might have to be skipped if they are found to have a zero refcount. This will still never happen as of this patch, but a future patch will need that flexibility to make the root iterator safe under the MMU read lock. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-8-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
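In refcount terms, handling the failure amounts to using a try-get so callers can skip roots whose count has already hit zero; a sketch assuming the root's refcount is a refcount_t:

        static inline bool kvm_tdp_mmu_get_root(struct kvm *kvm,
                                                struct kvm_mmu_page *root)
        {
                /* Returns false if the refcount has already dropped to zero. */
                return refcount_inc_not_zero(&root->tdp_mmu_root_count);
        }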
2021-04-19 | KVM: x86/mmu: Make TDP MMU root refcount atomic | Ben Gardon | 3 | -6/+14
In order to parallelize more operations for the TDP MMU, make the refcount on TDP MMU roots atomic, so that a future patch can allow multiple threads to take a reference on the root concurrently, while holding the MMU lock in read mode. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-7-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: Refactor yield safe root iterator | Ben Gardon | 1 | -19/+26
Refactor the yield safe TDP MMU root iterator to be more amenable to changes in future commits which will allow it to be used under the MMU lock in read mode. Currently the iterator requires a complicated dance between the helper functions and different parts of the for loop which makes it hard to reason about. Moving all the logic into a single function simplifies the iterator substantially. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-6-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: Merge TDP MMU put and free root | Ben Gardon | 3 | -40/+28
kvm_tdp_mmu_put_root and kvm_tdp_mmu_free_root are always called together, so merge the functions to simplify TDP MMU root refcounting / freeing. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-5-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: use tdp_mmu_free_sp to free roots | Ben Gardon | 1 | -8/+7
Minor cleanup to deduplicate the code used to free a struct kvm_mmu_page in the TDP MMU. No functional change intended. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-4-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: Move kvm_mmu_(get|put)_root to TDP MMU | Ben Gardon | 4 | -25/+25
The TDP MMU is almost the only user of kvm_mmu_get_root and kvm_mmu_put_root. There is only one use of put_root in mmu.c for the legacy / shadow MMU. Open code that one use and move the get / put functions to the TDP MMU so they can be extended in future commits. No functional change intended. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-3-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: Re-add const qualifier in kvm_tdp_mmu_zap_collapsible_sptes | Ben Gardon | 4 | -10/+13
kvm_tdp_mmu_zap_collapsible_sptes unnecessarily removes the const qualifier from its memslot argument, leading to a compiler warning. Add the const annotation and pass it to subsequent functions. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210401233736.638171-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-19 | KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if possible | Sean Christopherson | 1 | -1/+5
Let the TDP MMU yield when unmapping a range in response to a MMU notification, if yielding is allowed by said notification. There is no reason to disallow yielding in this case, and in theory the range being invalidated could be quite large. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210402005658.3024832-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: Move x86's MMU notifier memslot walkers to generic code | Sean Christopherson | 3 | -241/+135
Move the hva->gfn lookup for MMU notifiers into common code. Every arch does a similar lookup, and some arch code is all but identical across multiple architectures. In addition to consolidating code, this will allow introducing optimizations that will benefit all architectures without incurring multiple walks of the memslots, e.g. by taking mmu_lock if and only if a relevant range exists in the memslots. The use of __always_inline to avoid indirect call retpolines, as done by x86, may also benefit other architectures. Consolidating the lookups also fixes a wart in x86, where the legacy MMU and TDP MMU each do their own memslot walks. Lastly, future enhancements to the memslot implementation, e.g. to add an interval tree to track host address, will need to touch far less arch specific code. MIPS, PPC, and arm64 will be converted one at a time in future patches. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210402005658.3024832-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: constify kvm_arch_flush_remote_tlbs_memslot | Paolo Bonzini | 1 | -1/+1
memslots are stored in RCU and there should be no need to change them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: MMU: protect TDP MMU pages only down to required level | Paolo Bonzini | 1 | -1/+1
When using manual protection of dirty pages, it is not necessary to protect nested page tables down to the 4K level; instead KVM can protect only hugepages in order to split them lazily, and delay write protection at 4K-granularity until KVM_CLEAR_DIRTY_LOG. This was overlooked in the TDP MMU, so do it there as well. Fixes: a6a0b05da9f37 ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Cc: Ben Gardon <bgardon@google.com> Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86: implement KVM_CAP_SET_GUEST_DEBUG2 | Maxim Levitsky | 1 | -0/+2
Store the supported bits into KVM_GUESTDBG_VALID_MASK macro, similar to how arm does this. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401135451.1004564-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
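Such a mask is simply an OR of the debug flags x86 already accepts; a sketch (exact membership is an assumption here):

        #define KVM_GUESTDBG_VALID_MASK \
                (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP | \
                 KVM_GUESTDBG_USE_HW_BP | KVM_GUESTDBG_USE_SW_BP | \
                 KVM_GUESTDBG_INJECT_BP | KVM_GUESTDBG_INJECT_DB)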
2021-04-17 | KVM: x86: pending exceptions must not be blocked by an injected event | Maxim Levitsky | 2 | -3/+15
Injected interrupts/NMIs should not block a pending exception; the exception should rather either be lost if the nested hypervisor doesn't intercept it (as on stock x86), or be delivered in the exitintinfo/IDT_VECTORING_INFO field as part of the VMexit that corresponds to the pending exception. The only reason for an exception to be blocked is when a nested run is pending (that can't really happen currently, but it is still worth checking for). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: nSVM: call nested_svm_load_cr3 on nested state load | Maxim Levitsky | 1 | -18/+22
While KVM's MMU should be fully reset by loading of nested CR0/CR3/CR4 via KVM_SET_SREGS, we are not in nested mode yet when we do it and therefore only root_mmu is reset. On regular nested entries we call nested_svm_load_cr3, which both updates the guest's CR3 in the MMU when needed and re-initializes the MMU, which in turn initializes walk_mmu as well when nested paging is enabled in both host and guest. Since we don't call nested_svm_load_cr3 on nested state load, walk_mmu can be left uninitialized, which can lead to a NULL pointer dereference: if we happen to get a nested page fault right after entering the nested guest for the first time after the migration and decide to emulate it, the emulator tries to access walk_mmu->gva_to_gpa, which is NULL. Therefore we should call this function on nested state load as well. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86: dump_vmcs should include the autoload/autostore MSR lists | David Edmondson | 1 | -0/+16
When dumping the current VMCS state, include the MSRs that are being automatically loaded/stored during VM entry/exit. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-6-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86: dump_vmcs should show the effective EFER | David Edmondson | 2 | -5/+17
If EFER is not being loaded from the VMCS, show the effective value by reference to the MSR autoload list or calculation. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-5-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86: dump_vmcs should consider only the load controls of EFER/PAT | David Edmondson | 1 | -4/+2
When deciding whether to dump the GUEST_IA32_EFER and GUEST_IA32_PAT fields of the VMCS, examine only the VM entry load controls, as saving on VM exit has no effect on whether VM entry succeeds or fails. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-4-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86: dump_vmcs should not conflate EFER and PAT presence in VMCS | David Edmondson | 1 | -9/+10
Show EFER and PAT based on their individual entry/exit controls. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-3-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86: dump_vmcs should not assume GUEST_IA32_EFER is valid | David Edmondson | 1 | -6/+3
If the VM entry/exit controls for loading/saving MSR_EFER are either not available (an older processor or explicitly disabled) or not used (host and guest values are the same), reading GUEST_IA32_EFER from the VMCS returns an inaccurate value. Because of this, in dump_vmcs() don't use GUEST_IA32_EFER to decide whether to print the PDPTRs - always do so if the fields exist. Fixes: 4eb64dce8d0a ("KVM: x86: dump VMCS on invalid entry") Signed-off-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <20210318120841.133123-2-david.edmondson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: nSVM: improve SYSENTER emulation on AMD | Maxim Levitsky | 2 | -37/+68
Currently, to support Intel->AMD migration, if the CPU vendor is GenuineIntel we emulate the full 64-bit value of the MSR_IA32_SYSENTER_{EIP|ESP} msrs, and we also emulate the sysenter/sysexit instruction in long mode. (The emulator still refuses to emulate sysenter in 64-bit mode, on the grounds that the code for that wasn't tested and likely has no users.)

However, when virtual vmload/vmsave is enabled, the vmload instruction will update these 32-bit msrs without triggering their msr intercept, which will lead to stale values in kvm's shadow copy of these msrs, which relies on the intercept to be up to date.

Fix/optimize this by doing the following:

1. Enable the MSR intercepts for the SYSENTER MSRs iff vendor=GenuineIntel. (This is both a tiny optimization and also ensures that in case the guest cpu vendor is AMD, the msrs will be 32 bits wide as AMD defined.)

2. Store only the high 32-bit part of these msrs on interception and combine it with the hardware msr value on intercepted reads/writes iff vendor=GenuineIntel.

3. Disable vmload/vmsave virtualization if vendor=GenuineIntel. (It is somewhat insane to set vendor=GenuineIntel and still enable SVM for the guest, but well, whatever.) Then zero the high 32-bit parts when kvm intercepts and emulates vmload.

Thanks a lot to Paolo Bonzini for helping me fix this in the most correct way.

This patch fixes nested migration of 32-bit nested guests, which was broken because incorrect cached values of the SYSENTER msrs were stored in the migration stream if L1 changed these msrs with vmload prior to L2 entry. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401111928.996871-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86: add guest_cpuid_is_intel | Maxim Levitsky | 1 | -0/+8
This is similar to the existing 'guest_cpuid_is_amd_or_hygon'. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401111928.996871-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
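A plausible sketch of the helper, mirroring the guest_cpuid_is_amd_or_hygon() pattern; treat the exact lookup helpers as assumptions:

        static inline bool guest_cpuid_is_intel(struct kvm_vcpu *vcpu)
        {
                struct kvm_cpuid_entry2 *best = kvm_find_cpuid_entry(vcpu, 0, 0);

                return best && is_guest_vendor_intel(best->ebx, best->ecx, best->edx);
        }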
2021-04-17 | KVM: x86: Account a variety of miscellaneous allocations | Sean Christopherson | 3 | -6/+6
Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are clearly associated with a single task/VM. Note, there are several SEV allocations that aren't accounted, but those can (hopefully) be fixed by using the local stack for memory. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331023025.2485960-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs are created | Sean Christopherson | 1 | -0/+3
Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one or more vCPUs have been created. KVM assumes a VM is tagged SEV/SEV-ES prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV enabled, and svm_create_vcpu() needs to allocate the VMSA. At best, creating vCPUs before SEV/SEV-ES init will lead to unexpected errors and/or behavior, and at worst it will crash the host, e.g. sev_launch_update_vmsa() will dereference a null svm->vmsa pointer. Fixes: 1654efcbc431 ("KVM: SVM: Add KVM_SEV_INIT command") Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
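The guard itself is essentially a one-liner early in the SEV init path; a sketch, assuming the usual kvm->created_vcpus bookkeeping:

        /* SEV/SEV-ES must be enabled before any vCPU is created. */
        if (kvm->created_vcpus)
                return -EINVAL;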
2021-04-17 | KVM: SVM: Do not set sev->es_active until KVM_SEV_ES_INIT completes | Sean Christopherson | 1 | -17/+12
Set sev->es_active only after the guts of KVM_SEV_ES_INIT succeeds. If the command fails, e.g. because SEV is already active or there are no available ASIDs, then es_active will be left set even though the VM is not fully SEV-ES capable. Refactor the code so that "es_active" is passed on the stack instead of being prematurely shoved into sev_info, both to avoid having to unwind sev_info and so that it's more obvious what actually consumes es_active in sev_guest_init() and its helpers. Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: SVM: Use online_vcpus, not created_vcpus, to iterate over vCPUs | Sean Christopherson | 1 | -2/+3
Use the kvm_for_each_vcpu() helper to iterate over vCPUs when encrypting VMSAs for SEV, which effectively switches to use online_vcpus instead of created_vcpus. This fixes a possible null-pointer dereference, as created_vcpus does not guarantee a vCPU exists, since it is updated at the very beginning of KVM_CREATE_VCPU. created_vcpus exists to allow the bulk of vCPU creation to run in parallel, while still correctly restricting the maximum number of vCPUs. Fixes: ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest") Cc: stable@vger.kernel.org Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331031936.2495277-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
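Illustrating the switch to the standard helper, which walks only vCPUs that have been fully created and onlined; the loop body is elided and this is a sketch rather than the literal patch:

        struct kvm_vcpu *vcpu;
        int i;

        kvm_for_each_vcpu(i, vcpu, kvm) {
                /* ... encrypt this vCPU's VMSA via SEV_CMD_LAUNCH_UPDATE_VMSA ... */
        }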
2021-04-17 | KVM: x86/mmu: Simplify code for aging SPTEs in TDP MMU | Sean Christopherson | 1 | -3/+2
Use a basic NOT+AND sequence to clear the Accessed bit in TDP MMU SPTEs, as opposed to the fancy ffs()+clear_bit() logic that was copied from the legacy MMU. The legacy MMU uses clear_bit() because it is operating on the SPTE itself, i.e. clearing needs to be atomic. The TDP MMU operates on a local variable that it later writes to the SPTE, and so doesn't need to be atomic or even resident in memory. Opportunistically drop unnecessary initialization of new_spte, it's guaranteed to be written before being accessed. Using NOT+AND instead of ffs()+clear_bit() reduces the sequence from:

        0x0000000000058be6 <+134>: test   %rax,%rax
        0x0000000000058be9 <+137>: je     0x58bf4 <age_gfn_range+148>
        0x0000000000058beb <+139>: test   %rax,%rdi
        0x0000000000058bee <+142>: je     0x58cdc <age_gfn_range+380>
        0x0000000000058bf4 <+148>: mov    %rdi,0x8(%rsp)
        0x0000000000058bf9 <+153>: mov    $0xffffffff,%edx
        0x0000000000058bfe <+158>: bsf    %eax,%edx
        0x0000000000058c01 <+161>: movslq %edx,%rdx
        0x0000000000058c04 <+164>: lock btr %rdx,0x8(%rsp)
        0x0000000000058c0b <+171>: mov    0x8(%rsp),%r15

to:

        0x0000000000058bdd <+125>: test   %rax,%rax
        0x0000000000058be0 <+128>: je     0x58beb <age_gfn_range+139>
        0x0000000000058be2 <+130>: test   %rax,%r8
        0x0000000000058be5 <+133>: je     0x58cc0 <age_gfn_range+352>
        0x0000000000058beb <+139>: not    %rax
        0x0000000000058bee <+142>: and    %r8,%rax
        0x0000000000058bf1 <+145>: mov    %rax,%r15

thus eliminating several memory accesses, including a locked access. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331004942.2444916-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
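At the C level the simplification boils down to the two ways of clearing the bit in a local copy of the SPTE; a sketch assuming the usual shadow_accessed_mask and TDP iterator:

        /* Before: atomic-style clear of a stack-local value, inherited from the legacy MMU. */
        new_spte = iter.old_spte;
        clear_bit((ffs(shadow_accessed_mask) - 1), (unsigned long *)&new_spte);

        /* After: plain NOT+AND; no atomics are needed for a local variable. */
        new_spte = iter.old_spte & ~shadow_accessed_mask;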
2021-04-17 | KVM: x86/mmu: Remove spurious clearing of dirty bit from TDP MMU SPTE | Sean Christopherson | 1 | -1/+0
Don't clear the dirty bit when aging a TDP MMU SPTE (in response to a MMU notifier event). Prematurely clearing the dirty bit could cause spurious PML updates if aging a page happened to coincide with dirty logging. Note, tdp_mmu_set_spte_no_acc_track() flows into __handle_changed_spte(), so the host PFN will be marked dirty, i.e. there is no potential for data corruption. Fixes: a6a0b05da9f3 ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331004942.2444916-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86/mmu: Drop trace_kvm_age_page() tracepoint | Sean Christopherson | 2 | -3/+0
Remove x86's trace_kvm_age_page() tracepoint. It's mostly redundant with the common trace_kvm_age_hva() tracepoint, and if there is a need for the extra details, e.g. gfn, referenced, etc... those details should be added to the common tracepoint so that all architectures and MMUs benefit from the info. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86/mmu: Use leaf-only loop for walking TDP SPTEs when changing SPTE | Sean Christopherson | 1 | -1/+1
Use the leaf-only TDP iterator when changing the SPTE in reaction to a MMU notifier. Practically speaking, this is a nop since the guts of the loop explicitly looks for 4k SPTEs, which are always leaf SPTEs. Switch the iterator to match age_gfn_range() and test_age_gfn() so that a future patch can consolidate the core iterating logic. No real functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86/mmu: Pass address space ID to TDP MMU root walkers | Sean Christopherson | 2 | -66/+44
Move the address space ID check that is performed when iterating over roots into the macro helpers to consolidate code. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86/mmu: Pass address space ID to __kvm_tdp_mmu_zap_gfn_range() | Sean Christopherson | 3 | -18/+24
Pass the address space ID to TDP MMU's primary "zap gfn range" helper to allow the MMU notifier paths to iterate over memslots exactly once. Currently, both the legacy MMU and TDP MMU iterate over memslots when looking for an overlapping hva range, which can be quite costly if there are a large number of memslots. Add a "flush" parameter so that iterating over multiple address spaces in the caller will continue to do the right thing when yielding while a flush is pending from a previous address space. Note, this also has a functional change in the form of coalescing TLB flushes across multiple address spaces in kvm_zap_gfn_range(), and also optimizes the TDP MMU to utilize range-based flushing when running as L1 with Hyper-V enlightenments. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-6-seanjc@google.com> [Keep separate for loops to prepare for other incoming patches. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86/mmu: Coalesce TLB flushes across address spaces for gfn range zap | Sean Christopherson | 1 | -9/+8
Gather pending TLB flushes across both address spaces when zapping a given gfn range. This requires feeding "flush" back into subsequent calls, but on the plus side sets the stage for further batching between the legacy MMU and TDP MMU. It also allows refactoring the address space iteration to cover the legacy and TDP MMUs without introducing truly ugly code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86/mmu: Coalesce TLB flushes when zapping collapsible SPTEs | Sean Christopherson | 3 | -9/+9
Gather pending TLB flushes across both the legacy and TDP MMUs when zapping collapsible SPTEs to avoid multiple flushes if both the legacy MMU (for nested guests) and TDP MMU have mappings for the memslot. Note, this also optimizes the TDP MMU to flush only the relevant range when running as L1 with Hyper-V enlightenments. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86/mmu: Move flushing for "slot" handlers to caller for legacy MMU | Sean Christopherson | 1 | -18/+19
Place the onus on the caller of slot_handle_*() to flush the TLB, rather than handling the flush in the helper, and rename parameters accordingly. This will allow future patches to coalesce flushes between address spaces and between the legacy and TDP MMUs. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86/mmu: Coalesce TDP MMU TLB flushes when zapping collapsible SPTEs | Sean Christopherson | 1 | -9/+13
When zapping collapsible SPTEs across multiple roots, gather pending flushes and perform a single remote TLB flush at the end, as opposed to flushing after processing every root. Note, flush may be cleared by the result of zap_collapsible_spte_range(). This is intended and correct, e.g. yielding may have serviced a prior pending flush. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 | KVM: x86/vPMU: Forbid reading from MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE | Vitaly Kuznetsov | 1 | -0/+6
The MSR_F15H_PERF_CTL0-5 and MSR_F15H_PERF_CTR0-5 MSRs have a CPUID bit assigned to them (X86_FEATURE_PERFCTR_CORE), and when it isn't exposed to the guest the correct behavior is to inject #GP and not just return zero. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210329124804.170173-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
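A sketch of the validity check in the AMD vPMU's MSR handler; the switch shape and helper names follow the usual KVM patterns and should be read as assumptions:

        case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
                /* Only valid when the feature bit is actually exposed to the guest. */
                return guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE);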