path: root/arch/x86/kvm
Age  Commit message  Author  Files  Lines
2022-12-13  Merge tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  Linus Torvalds  2-4/+5
Pull x86 sgx updates from Dave Hansen:
 "The biggest deal in this series is support for a new hardware feature that allows enclaves to detect and mitigate single-stepping attacks. There's also a minor performance tweak and a little piece of the kmap_atomic() -> kmap_local() transition.
  Summary:
   - Introduce a new SGX feature (Asynchronous Exit Notification) for bare-metal enclaves and KVM guests to mitigate single-step attacks
   - Increase batching to speed up enclave release
   - Replace kmap/kunmap_atomic() calls"
* tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Replace kmap/kunmap_atomic() calls
  KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest
  x86/sgx: Allow enclaves to use Asynchronous Exit Notification
  x86/sgx: Reduce delay and interference of enclave release
2022-11-30KVM: x86: fix uninitialized variable use on KVM_REQ_TRIPLE_FAULTPaolo Bonzini1-1/+1
If a triple fault was fixed by kvm_x86_ops.nested_ops->triple_fault (by turning it into a vmexit), there is no need to leave vcpu_enter_guest(). Any vcpu->requests will be caught later before the actual vmentry, and in fact vcpu_enter_guest() was not initializing the "r" variable. Depending on the compiler's whims, this could cause the x86_64/triple_fault_event_test test to fail. Cc: Maxim Levitsky <mlevitsk@redhat.com> Fixes: 92e7d5c83aff ("KVM: x86: allow L1 to not intercept triple fault") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-42/+64
Pull kvm fixes from Paolo Bonzini:
 "x86:
   - Fixes for Xen emulation. While nobody should be enabling it in the kernel (the only public users of the feature are the selftests), the bug effectively allows userspace to read arbitrary memory.
   - Correctness fixes for nested hypervisors that do not intercept INIT or SHUTDOWN on AMD; the subsequent CPU reset can cause a use-after-free when it disables virtualization extensions. While downgrading the panic to a WARN is quite easy, the full fix is a bit more laborious; there are also tests. This is the bulk of the pull request.
   - Fix race condition due to incorrect mmu_lock use around make_mmu_pages_available().
  Generic:
   - Obey changes to the kvm.halt_poll_ns module parameter in VMs not using KVM_CAP_HALT_POLL, restoring behavior from before the introduction of the capability"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Update gfn_to_pfn_cache khva when it moves within the same page
  KVM: x86/xen: Only do in-kernel acceleration of hypercalls for guest CPL0
  KVM: x86/xen: Validate port number in SCHEDOP_poll
  KVM: x86/mmu: Fix race condition in direct_page_fault
  KVM: x86: remove exit_int_info warning in svm_handle_exit
  KVM: selftests: add svm part to triple_fault_test
  KVM: x86: allow L1 to not intercept triple fault
  kvm: selftests: add svm nested shutdown test
  KVM: selftests: move idt_entry to header
  KVM: x86: forcibly leave nested mode on vCPU reset
  KVM: x86: add kvm_leave_nested
  KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use
  KVM: x86: nSVM: leave nested mode on vCPU free
  KVM: Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL
  KVM: Avoid re-reading kvm->max_halt_poll_ns during halt-polling
  KVM: Cap vcpu->halt_poll_ns before halting rather than after
2022-11-24Merge branch 'kvm-dwmw2-fixes' into HEADPaolo Bonzini1-9/+23
This brings in a few important fixes for Xen emulation. While nobody should be enabling it, the bug effectively allows userspace to read arbitrary memory. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-24KVM: x86/xen: Only do in-kernel acceleration of hypercalls for guest CPL0David Woodhouse1-1/+11
There are almost no hypercalls which are valid from CPL > 0, and definitely none which are handled by the kernel. Fixes: 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") Reported-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: stable@kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
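A minimal sketch of the added guard; the helper name kvm_xen_hypercall_allowed is illustrative, not the actual function (the real check lives in KVM's Xen hypercall handler):

    /* Only accelerate hypercalls issued from guest CPL0; anything else is
     * punted to userspace, which can decide how to handle it. */
    static bool kvm_xen_hypercall_allowed(struct kvm_vcpu *vcpu)
    {
            return static_call(kvm_x86_get_cpl)(vcpu) == 0;
    }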
2022-11-24KVM: x86/xen: Validate port number in SCHEDOP_pollDavid Woodhouse1-8/+12
We shouldn't allow guests to poll on arbitrary port numbers off the end of the event channel table. Fixes: 1a65105a5aba ("KVM: x86/xen: handle PV spinlocks slowpath") [dwmw2: my bug though; the original version did check the validity as a side-effect of an idr_find() which I ripped out in refactoring.] Reported-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: stable@kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
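A hedged sketch of the added bounds check; max_evtchn_port() is the per-VM event channel table size used by KVM's Xen code, and the loop shape around SCHEDOP_poll is simplified:

    for (i = 0; i < sched_poll.nr_ports; i++) {
            if (ports[i] >= max_evtchn_port(vcpu->kvm)) {
                    ret = -EINVAL;          /* reject out-of-range ports */
                    goto out;
            }
    }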
2022-11-24KVM: x86/mmu: Fix race condition in direct_page_faultKazuki Takiguchi1-6/+7
make_mmu_pages_available() must be called with mmu_lock held for write. However, if the TDP MMU is used, it will be called with mmu_lock held for read. This function does nothing unless shadow pages are used, so there is no race unless nested TDP is used. Since nested TDP uses shadow pages, old shadow pages may be zapped by this function even when the TDP MMU is enabled. Since shadow pages are never allocated by kvm_tdp_mmu_map(), a race condition can be avoided by not calling make_mmu_pages_available() if the TDP MMU is currently in use. I encountered this when repeatedly starting and stopping a nested VM. It can be artificially caused by allocating a large number of nested TDP SPTEs. For example, the following BUG and general protection fault are caused in the host kernel.
 pte_list_remove: 00000000cd54fc10 many->many
 ------------[ cut here ]------------
 kernel BUG at arch/x86/kvm/mmu/mmu.c:963!
 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 RIP: 0010:pte_list_remove.cold+0x16/0x48 [kvm]
 Call Trace:
  <TASK>
  drop_spte+0xe0/0x180 [kvm]
  mmu_page_zap_pte+0x4f/0x140 [kvm]
  __kvm_mmu_prepare_zap_page+0x62/0x3e0 [kvm]
  kvm_mmu_zap_oldest_mmu_pages+0x7d/0xf0 [kvm]
  direct_page_fault+0x3cb/0x9b0 [kvm]
  kvm_tdp_page_fault+0x2c/0xa0 [kvm]
  kvm_mmu_page_fault+0x207/0x930 [kvm]
  npf_interception+0x47/0xb0 [kvm_amd]
  svm_invoke_exit_handler+0x13c/0x1a0 [kvm_amd]
  svm_handle_exit+0xfc/0x2c0 [kvm_amd]
  kvm_arch_vcpu_ioctl_run+0xa79/0x1780 [kvm]
  kvm_vcpu_ioctl+0x29b/0x6f0 [kvm]
  __x64_sys_ioctl+0x95/0xd0
  do_syscall_64+0x5c/0x90

 general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#1] PREEMPT SMP NOPTI
 RIP: 0010:kvm_mmu_commit_zap_page.part.0+0x4b/0xe0 [kvm]
 Call Trace:
  <TASK>
  kvm_mmu_zap_oldest_mmu_pages+0xae/0xf0 [kvm]
  direct_page_fault+0x3cb/0x9b0 [kvm]
  kvm_tdp_page_fault+0x2c/0xa0 [kvm]
  kvm_mmu_page_fault+0x207/0x930 [kvm]
  npf_interception+0x47/0xb0 [kvm_amd]
CVE: CVE-2022-45869 Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU") Signed-off-by: Kazuki Takiguchi <takiguchi.kazuki171@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
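A sketch of the resulting flow in direct_page_fault(), assuming the function's is_tdp_mmu_fault local; only the shadow-MMU path, which holds mmu_lock for write, is allowed to reclaim shadow pages:

    if (is_tdp_mmu_fault)
            read_lock(&vcpu->kvm->mmu_lock);
    else
            write_lock(&vcpu->kvm->mmu_lock);

    /* ... root/SPTE checks elided ... */

    if (!is_tdp_mmu_fault) {
            r = make_mmu_pages_available(vcpu);   /* shadow MMU only */
            if (r)
                    goto out_unlock;
    }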
2022-11-17Merge branch 'kvm-svm-harden' into HEADPaolo Bonzini4-27/+34
This fixes three issues in nested SVM:

1) In the shutdown_interception() vmexit handler we call kvm_vcpu_reset(). However, if running nested and L1 doesn't intercept shutdown, the function resets vcpu->arch.hflags without properly leaving the nested state. This leaves the vCPU in an inconsistent state and later triggers a kernel panic in SVM code. The same bug can likely be triggered by sending INIT via the local APIC to a vCPU which runs a nested guest. On VMX we are lucky that the issue can't happen because VMX always intercepts triple faults, thus a triple fault in L2 will always be redirected to L1. Plus, handle_triple_fault() doesn't reset the vCPU. An INIT IPI can't happen on VMX either because INIT events are masked while in VMX mode.

2) KVM doesn't honour the SHUTDOWN intercept bit of L1 on SVM. A normal hypervisor should always intercept SHUTDOWN; a unit test, on the other hand, might want to not do so.

3) The guest can trigger a non-rate-limited kernel printk on SVM, which is fixed as well.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-17KVM: x86: remove exit_int_info warning in svm_handle_exitMaxim Levitsky1-15/+0
It is valid to receive an external interrupt and have a broken IDT entry, which will lead to a #GP with an exit_int_info that will contain the index of the IDT entry (i.e. any value). Other exceptions can happen as well, like #NP or #SS (if the stack switch fails). Thus this warning can be user triggered and has very little value. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-10-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-17KVM: x86: allow L1 to not intercept triple faultMaxim Levitsky3-5/+13
This is an SVM correctness fix - although a sane L1 would intercept the SHUTDOWN event, it doesn't have to, so we have to honour this. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-17KVM: x86: forcibly leave nested mode on vCPU resetMaxim Levitsky1-0/+10
While not obvious, kvm_vcpu_reset() leaves nested mode by clearing 'vcpu->arch.hflags', but it does so without all the required housekeeping. On SVM, it is possible to have a vCPU reset while in guest mode because, unlike VMX, on SVM INITs are not latched in non-root mode, and in addition L1 doesn't have to intercept a triple fault, which should also trigger L1's reset if it happens in L2 while L1 didn't intercept it. If one of the above conditions happens, KVM will continue to use vmcb02 even though the vCPU is no longer in guest mode. Later IA32_EFER will be cleared, which will lead to freeing of the nested guest state, which will (correctly) free the vmcb02; but since KVM still uses it (incorrectly), this will lead to a use-after-free and a kernel crash. This issue is assigned CVE-2022-3344. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-17KVM: x86: add kvm_leave_nestedMaxim Levitsky3-7/+7
Add kvm_leave_nested, which wraps the call to nested_ops->leave_nested in a function. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
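The wrapper is small; a sketch matching the description above (the reset-path caller added by the previous patch is abbreviated to a comment):

    static void kvm_leave_nested(struct kvm_vcpu *vcpu)
    {
            kvm_x86_ops.nested_ops->leave_nested(vcpu);
    }

    /* kvm_vcpu_reset() can now leave nested mode properly instead of just
     * clearing vcpu->arch.hflags (see "forcibly leave nested mode on vCPU
     * reset" above). */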
2022-11-17KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in useMaxim Levitsky1-0/+3
Make sure that KVM uses vmcb01 before freeing nested state, and warn if that is not the case. This is a minimal fix for CVE-2022-3344 making the kernel print a warning instead of a kernel panic. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
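A sketch of the hardening, assuming the check sits at the top of svm_free_nested():

    /* Switch back to vmcb01 (and warn) rather than freeing a VMCB that is
     * still in use. */
    if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
            svm_switch_vmcb(svm, &svm->vmcb01);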
2022-11-17KVM: x86: nSVM: leave nested mode on vCPU freeMaxim Levitsky1-0/+1
If the VM was terminated while nested, we free the nested state while the vCPU still is in nested mode. Soon a warning will be added for this condition. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-15x86/cpu: Restore AMD's DE_CFG MSR after resumeBorislav Petkov2-6/+6
DE_CFG contains the LFENCE serializing bit; restore it on resume too. This is relevant to older families due to the way they do S3. Unify and correct the naming while at it. Fixes: e4d0e84e4907 ("x86/cpu/AMD: Make LFENCE a serializing instruction") Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Reported-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-11-11KVM: x86/mmu: Block all page faults during kvm_zap_gfn_range()Sean Christopherson1-2/+2
When zapping a GFN range, pass 0 => ALL_ONES for the to-be-invalidated range to effectively block all page faults while the zap is in-progress. The invalidation helpers take a host virtual address, whereas zapping a GFN obviously provides a guest physical address and with the wrong unit of measurement (frame vs. byte). Alternatively, KVM could walk all memslots to get the associated HVAs, but thanks to SMM, that would require multiple lookups. And practically speaking, kvm_zap_gfn_range() usage is quite rare and not a hot path, e.g. MTRR and CR0.CD are almost guaranteed to be done only on vCPU0 during boot, and APICv inhibits are similarly infrequent operations. Fixes: edb298c663fc ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range") Reported-by: Chao Peng <chao.p.peng@linux.intel.com> Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221111001841.2412598-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: x86/pmu: Limit the maximum number of supported AMD GP countersLike Xu2-3/+7
The AMD PerfMonV2 specification allows for a maximum of 16 GP counters, but currently only 6 pairs of MSRs are accepted by KVM. While AMD64_NUM_COUNTERS_CORE is already equal to 6, increasing it without adjusting msrs_to_save_all[] could result in out-of-bounds accesses. Therefore introduce a macro (named KVM_AMD_PMC_MAX_GENERIC) to refer to the number of counters supported by KVM. Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-3-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: x86/pmu: Limit the maximum number of supported Intel GP countersLike Xu3-8/+10
The Intel Architectural IA32_PMCx MSRs addresses range allows for a maximum of 8 GP counters, and KVM cannot address any more. Introduce a local macro (named KVM_INTEL_PMC_MAX_GENERIC) and use it consistently to refer to the number of counters supported by KVM, thus avoiding possible out-of-bound accesses. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-2-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
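A sketch of the two KVM-private limits introduced by this patch and its AMD counterpart above; the values come from the commit messages, and the array shown is only illustrative of how fixed-size bookkeeping gets bounded:

    #define KVM_INTEL_PMC_MAX_GENERIC   8
    #define KVM_AMD_PMC_MAX_GENERIC     6

    /* Size arrays by what KVM supports, not the architectural maximum,
     * so MSR index arithmetic cannot run out of bounds: */
    struct kvm_pmc gp_counters[KVM_INTEL_PMC_MAX_GENERIC];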
2022-11-09KVM: x86/pmu: Do not speculatively query Intel GP PMCs that don't exist yetLike Xu1-12/+2
The SDM lists an architectural MSR IA32_CORE_CAPABILITIES (0xCF) that limits the theoretical maximum value of the Intel GP PMC MSRs allocated at 0xC1 to 14; likewise the Intel April 2022 SDM adds IA32_OVERCLOCKING_STATUS at 0x195 which limits the number of event selection MSRs to 15 (0x186-0x194). Limiting the maximum number of counters to 14 or 18 based on the currently allocated MSRs is clearly fragile, and it seems likely that Intel will even place PMCs 8-15 at a completely different range of MSR indices. So stop at the maximum number of GP PMCs supported today on Intel processors. There are some machines, like Intel P4 with non Architectural PMU, that may indeed have 18 counters, but those counters are in a completely different MSR address range and are not supported by KVM. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Fixes: cf05a67b68b8 ("KVM: x86: omit "impossible" pmu MSRs from MSR list") Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: Only dump VMSA to klog at KERN_DEBUG levelPeter Gonda1-1/+1
Explicitly print the VMSA dump at KERN_DEBUG log level. KERN_CONT uses KERN_DEFAULT if the previous log line ends with a newline, i.e. if there's nothing to continue, and as a result the VMSA gets dumped when it shouldn't be. The KERN_CONT documentation says it defaults back to KERN_DEFAULT if the previous log line has a newline. So switch from KERN_CONT to print_hex_dump_debug(). Jarkko pointed this out in reference to the original patch. See: https://lore.kernel.org/all/YuPMeWX4uuR1Tz3M@kernel.org/ print_hex_dump(KERN_DEBUG, ...) was pointed out there, but print_hex_dump_debug() should be similar. Fixes: 6fac42f127b8 ("KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog") Signed-off-by: Peter Gonda <pgonda@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Harald Hoyer <harald@profian.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Message-Id: <20221104142220.469452-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
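A sketch of the replacement; the surrounding SEV-ES dump helper is abbreviated, and the vmsa pointer/length are placeholders for the actual save-area variables:

    pr_debug("SEV-ES guest VMSA:\n");
    print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1,
                         vmsa, vmsa_len, false);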
2022-11-09x86, KVM: remove unnecessary argument to x86_virt_spec_ctrl and callersPaolo Bonzini1-2/+2
x86_virt_spec_ctrl only deals with the paravirtualized MSR_IA32_VIRT_SPEC_CTRL now and does not handle MSR_IA32_SPEC_CTRL anymore; remove the corresponding, unused argument. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assemblyPaolo Bonzini4-28/+133
Restoration of the host IA32_SPEC_CTRL value is probably too late with respect to the return thunk training sequence. With respect to the user/kernel boundary, AMD says, "If software chooses to toggle STIBP (e.g., set STIBP on kernel entry, and clear it on kernel exit), software should set STIBP to 1 before executing the return thunk training sequence." I assume the same requirements apply to the guest/host boundary. The return thunk training sequence is in vmenter.S, quite close to the VM-exit. On hosts without V_SPEC_CTRL, however, the host's IA32_SPEC_CTRL value is not restored until much later. To avoid this, move the restoration of host SPEC_CTRL to assembly and, for consistency, move the restoration of the guest SPEC_CTRL as well. This is not particularly difficult, apart from some care to cover both 32- and 64-bit, and to share code between SEV-ES and normal vmentry. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Suggested-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: restore host save area from assemblyPaolo Bonzini5-13/+26
Allow access to the percpu area via the GS segment base, which is needed in order to access the saved host spec_ctrl value. In linux-next FILL_RETURN_BUFFER also needs to access percpu data. For simplicity, the physical address of the save area is added to struct svm_cpu_data. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reported-by: Nathan Chancellor <nathan@kernel.org> Analyzed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: move guest vmsave/vmload back to assemblyPaolo Bonzini3-20/+39
It is error-prone that code after vmexit cannot access percpu data because GSBASE has not been restored yet. It forces MSR_IA32_SPEC_CTRL save/restore to happen very late, after the predictor untraining sequence, and it gets in the way of return stack depth tracking (a retbleed mitigation that is in linux-next as of 2022-11-09). As a first step towards fixing that, move the VMCB VMSAVE/VMLOAD to assembly, essentially undoing commit fb0c4a4fee5a ("KVM: SVM: move VMLOAD/VMSAVE to C code", 2021-03-15). The reason for that commit was that it made it simpler to use a different VMCB for VMLOAD/VMSAVE versus VMRUN; but that is not a big hassle anymore thanks to the kvm-asm-offsets machinery and other related cleanups. The idea on how to number the exception tables is stolen from a prototype patch by Peter Zijlstra. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Link: <https://lore.kernel.org/all/f571e404-e625-bae1-10e9-449b2eb4cbd8@citrix.com/> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: do not allocate struct svm_cpu_data dynamicallyPaolo Bonzini3-29/+18
The svm_data percpu variable is a pointer, but it is allocated via svm_hardware_setup() when KVM is loaded. Unlike hardware_enable() this means that it is never NULL for the whole lifetime of KVM, and static allocation does not waste any memory compared to the status quo. It is also more efficient and more easily handled from assembly code, so do it and don't look back. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
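A sketch of the change: the per-CPU data goes from a dynamically allocated pointer to a statically defined per-CPU variable, and accessors follow suit:

    /* before: struct svm_cpu_data *sd allocated in svm_hardware_setup() */
    static DEFINE_PER_CPU(struct svm_cpu_data, svm_data);

    /* after: no NULL check needed, and assembly can address it directly */
    struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);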
2022-11-09KVM: SVM: remove dead field from struct svm_cpu_dataPaolo Bonzini2-3/+0
The "cpu" field of struct svm_cpu_data has been write-only since commit 4b656b120249 ("KVM: SVM: force new asid on vcpu migration", 2009-08-05). Remove it. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: remove unused field from struct vcpu_svmPaolo Bonzini1-1/+0
The pointer to svm_cpu_data in struct vcpu_svm looks interesting from the point of view of accessing it after vmexit, when the GSBASE is still containing the guest value. However, despite existing since the very first commit of drivers/kvm/svm.c (commit 6aa8b732ca01, "[PATCH] kvm: userspace interface", 2006-12-10), it was never set to anything. Ignore the opportunity to fix a 16 year old "bug" and delete it; doing things the "harder" way makes it possible to remove more old cruft. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: retrieve VMCB from assemblyPaolo Bonzini4-15/+16
Continue moving accesses to struct vcpu_svm to vmenter.S. Reducing the number of arguments limits the chance of mistakes due to different registers used for argument passing in 32- and 64-bit ABIs; pushing the VMCB argument and almost immediately popping it into a different register looks pretty weird. 32-bit ABI is not a concern for __svm_sev_es_vcpu_run() which is 64-bit only; however, it will soon need @svm to save/restore SPEC_CTRL so stay consistent with __svm_vcpu_run() and let them share the same prototype. No functional change intended. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: adjust register allocation for __svm_vcpu_run()Paolo Bonzini1-19/+19
32-bit ABI uses RAX/RCX/RDX as its argument registers, so they are in the way of instructions that hardcode their operands such as RDMSR/WRMSR or VMLOAD/VMRUN/VMSAVE. In preparation for moving vmload/vmsave to __svm_vcpu_run(), keep the pointer to the struct vcpu_svm in %rdi. In particular, it is now possible to load svm->vmcb01.pa in %rax without clobbering the struct vcpu_svm pointer. No functional change intended. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: replace regs argument of __svm_vcpu_run() with vcpu_svmPaolo Bonzini5-20/+30
Since registers are reachable through vcpu_svm, and we will need to access more fields of that struct, pass it instead of the regs[] array. No functional change intended. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: x86: use a separate asm-offsets.c filePaolo Bonzini4-1/+30
This already removes an ugly #include "" from asm-offsets.c, but especially it avoids a future error when trying to define asm-offsets for KVM's svm/svm.h header. This would not work for kernel/asm-offsets.c, because svm/svm.h includes kvm_cache_regs.h which is not in the include path when compiling asm-offsets.c. The problem is not there if the .c file is in arch/x86/kvm. Suggested-by: Sean Christopherson <seanjc@google.com> Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-05KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guestKai Huang2-1/+4
The new Asynchronous Exit (AEX) notification mechanism (AEX-notify) allows an enclave to receive a notification in the ERESUME after an enclave exit due to an AEX. EDECCSSA is a new SGX user leaf function (ENCLU[EDECCSSA]) to facilitate the AEX notification handling. The new EDECCSSA is enumerated via CPUID(EAX=0x12,ECX=0x0):EAX[11].

Besides allowing the new AEX-notify attribute to be reported to KVM guests, also allow reporting the new EDECCSSA user leaf function to KVM guests so the guest can fully utilize the AEX-notify mechanism. Similar to the existing X86_FEATURE_SGX1 and X86_FEATURE_SGX2, introduce a new scattered X86_FEATURE_SGX_EDECCSSA bit for the new EDECCSSA, and report it in KVM's supported CPUIDs.

Note, no additional KVM enabling is required to allow the guest to use EDECCSSA. It's impossible to trap ENCLU (without completely preventing the guest from using SGX). Advertise EDECCSSA as supported purely so that userspace doesn't need to special-case EDECCSSA, i.e. doesn't need to manually check host CPUID. The inability to trap ENCLU also means that KVM can't prevent the guest from using EDECCSSA, but that virtualization hole is benign as far as KVM is concerned. EDECCSSA is simply a fancy way to modify internal enclave state.

More background about how AEX-notify and EDECCSSA work: SGX maintains a Current State Save Area Frame (CSSA) for each enclave thread. When an AEX happens, the enclave thread context is saved to the CSSA and the CSSA is increased by 1. A normal ERESUME, which doesn't deliver an AEX notification, restores the saved thread context from the previously saved SSA and decreases the CSSA. If AEX-notify is enabled for an enclave, ERESUME acts differently: instead of restoring the saved thread context and decreasing the CSSA, it acts like EENTER, which doesn't decrease the CSSA but establishes a clean-slate thread context using the CSSA for the enclave to handle the notification. After some handling, the enclave must discard the newly established SSA and switch back to the previously saved SSA (from the AEX). Otherwise, the enclave will run out of SSA space upon further AEXs and eventually fail to run. To solve this problem, the new EDECCSSA essentially decreases the CSSA. It can be used by the enclave notification handler to switch back to the previously saved SSA when needed, i.e. after it handles the notification.

Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/all/20221101022422.858944-1-kai.huang%40intel.com
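For reference, a small userspace sketch of the enumeration described above (CPUID.(EAX=0x12,ECX=0):EAX[11]); this is just the architectural check, not KVM code:

    #include <cpuid.h>
    #include <stdbool.h>

    static bool cpu_has_edeccssa(void)
    {
            unsigned int eax, ebx, ecx, edx;

            __cpuid_count(0x12, 0, eax, ebx, ecx, edx);
            return eax & (1u << 11);    /* EDECCSSA user leaf supported */
    }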
2022-11-05x86/sgx: Allow enclaves to use Asynchronous Exit NotificationDave Hansen1-3/+1
Short Version: Allow enclaves to use the new Asynchronous Exit (AEX) notification mechanism. This mechanism lets enclaves run a handler after an AEX event. These handlers can run mitigations for things like SGX-Step[1]. AEX Notify will be made available both on upcoming processors and on some older processors through microcode updates.

Long Version:

== SGX Attribute Background ==
The SGX architecture includes a list of SGX "attributes". These attributes ensure consistency and transparency around specific enclave features. As a simple example, the "DEBUG" attribute allows an enclave to be debugged, but also destroys virtually all of SGX security. Using attributes, enclaves can know that they are being debugged. Attributes also affect enclave attestation, so an enclave can, for instance, be denied access to secrets while it is being debugged. The kernel keeps a list of known attributes and will only initialize enclaves that use a known set of attributes. This kernel policy eliminates the chance that a new SGX attribute could cause undesired effects. For example, imagine a new attribute was added called "PROVISIONKEY2" that provided similar functionality to "PROVISIONKEY". A kernel policy that allowed indiscriminate use of unknown attributes, and thus PROVISIONKEY2, would undermine the existing kernel policy which limits use of PROVISIONKEY enclaves.

== AEX Notify Background ==
"Intel Architecture Instruction Set Extensions and Future Features - Version 45" is out[2]. There is a new chapter: Asynchronous Enclave Exit Notify and the EDECCSSA User Leaf Function. Enclave exits can be either synchronous and consensual (EEXIT, for instance) or asynchronous (on an interrupt or fault). The asynchronous ones can evidently be exploited to single-step enclaves[1], on top of which other naughty things can be built. AEX Notify will be made available both on upcoming processors and on some older processors through microcode updates.

== The Problem ==
These attacks are currently entirely opaque to the enclave since the hardware does the save/restore under the covers. The Asynchronous Enclave Exit Notify (AEX Notify) mechanism provides enclaves an ability to detect and mitigate potential exposure to these kinds of attacks.

== The Solution ==
Define the new attribute value for AEX Notification. Ensure the attribute is cleared from the list of reserved attributes. Instead of adding to the open-coded lists of individual attributes, add named lists of privileged (disallowed by default) and unprivileged (allowed by default) attributes, as sketched below. Add the AEX-notify attribute as an unprivileged attribute, which will keep the kernel from rejecting enclaves with it set.

1. https://github.com/jovanbulck/sgx-step
2. https://cdrdv2.intel.com/v1/dl/getContent/671368?explicitVersion=true

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/all/20220720191347.1343986-1-dave.hansen%40linux.intel.com
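A hedged sketch of the "named lists" idea from the solution section; the macro and attribute names are illustrative, not necessarily the identifiers used in the SGX headers:

    /* Allowed by default: an enclave may set these freely. */
    #define SGX_ATTR_UNPRIV_MASK    (SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | \
                                     SGX_ATTR_KSS | SGX_ATTR_ASYNC_EXIT_NOTIFY)

    /* Disallowed by default: requires an explicit opt-in. */
    #define SGX_ATTR_PRIV_MASK      SGX_ATTR_PROVISIONKEY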
2022-11-03KVM: x86: Fix a typo about the usage of kvcalloc()Liao Chang1-1/+1
Swap the 1st and 2nd arguments to be consistent with the usage of kvcalloc(). Fixes: c9b8fecddb5b ("KVM: use kvcalloc for array allocations") Signed-off-by: Liao Chang <liaochang1@huawei.com> Message-Id: <20221103011749.139262-1-liaochang1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
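For clarity, kvcalloc() takes the element count first and the element size second, like calloc(); a sketch of the corrected call shape (the variable names are placeholders for the actual allocation site):

    /* was: kvcalloc(sizeof(*entries), nr_entries, GFP_KERNEL_ACCOUNT); */
    entries = kvcalloc(nr_entries, sizeof(*entries), GFP_KERNEL_ACCOUNT);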
2022-11-03KVM: x86: Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()Ben Gardon1-0/+3
kvm_zap_gfn_range() must be called in an SRCU read-side critical section, but there is no SRCU annotation in __kvm_set_or_clear_apicv_inhibit(). This can lead to the following warning via kvm_arch_vcpu_ioctl_set_guest_debug() if a shadow MMU is in use (TDP MMU disabled or nesting):
 [ 1416.659809] =============================
 [ 1416.659810] WARNING: suspicious RCU usage
 [ 1416.659839] 6.1.0-dbg-DEV #1 Tainted: G S I
 [ 1416.659853] -----------------------------
 [ 1416.659854] include/linux/kvm_host.h:954 suspicious rcu_dereference_check() usage!
 [ 1416.659856] ...
 [ 1416.659904] dump_stack_lvl+0x84/0xaa
 [ 1416.659910] dump_stack+0x10/0x15
 [ 1416.659913] lockdep_rcu_suspicious+0x11e/0x130
 [ 1416.659919] kvm_zap_gfn_range+0x226/0x5e0
 [ 1416.659926] ? kvm_make_all_cpus_request_except+0x18b/0x1e0
 [ 1416.659935] __kvm_set_or_clear_apicv_inhibit+0xcc/0x100
 [ 1416.659940] kvm_arch_vcpu_ioctl_set_guest_debug+0x350/0x390
 [ 1416.659946] kvm_vcpu_ioctl+0x2fc/0x620
 [ 1416.659955] __se_sys_ioctl+0x77/0xc0
 [ 1416.659962] __x64_sys_ioctl+0x1d/0x20
 [ 1416.659965] do_syscall_64+0x3d/0x80
 [ 1416.659969] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Always take the KVM SRCU read lock in __kvm_set_or_clear_apicv_inhibit() to protect the GFN-to-memslot translation. The SRCU read lock is not technically required when no shadow MMUs are in use, since the TDP MMU walks the paging structures from the roots and does not need to look up GFN translations in the memslots, but make the SRCU locking unconditional for simplicity. In most cases, the SRCU locking is taken care of in the vCPU run loop, but when called through other ioctls (such as KVM_SET_GUEST_DEBUG) there is no srcu_read_lock. Tested: ran tools/testing/selftests/kvm/x86_64/debug_regs on a DBG build. This patch causes the suspicious RCU warning to disappear. Note that the warning is hit in __kvm_zap_rmaps(), so kvm_memslots_have_rmaps() must return true in order for this to repro (i.e. the TDP MMU must be off or nesting in use.) Reported-by: Greg Thelen <gthelen@google.com> Fixes: 36222b117e36 ("KVM: x86: don't disable APICv memslot when inhibited") Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20221102205359.1260980-1-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
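A sketch of the fix, assuming the zap sits directly in __kvm_set_or_clear_apicv_inhibit() as described:

    int idx = srcu_read_lock(&kvm->srcu);

    kvm_zap_gfn_range(kvm, gpa_to_gfn(APIC_DEFAULT_PHYS_BASE),
                      gpa_to_gfn(APIC_DEFAULT_PHYS_BASE + PAGE_SIZE));

    srcu_read_unlock(&kvm->srcu, idx);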
2022-11-02KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTLSean Christopherson1-4/+6
Ignore guest CPUID for host userspace writes to the DEBUGCTL MSR, KVM's ABI is that setting CPUID vs. state can be done in any order, i.e. KVM allows userspace to stuff MSRs prior to setting the guest's CPUID that makes the new MSR "legal". Keep the vmx_get_perf_capabilities() check for guest writes, even though it's technically unnecessary since the vCPU's PERF_CAPABILITIES is consulted when refreshing LBR support. A future patch will clean up vmx_get_perf_capabilities() to avoid the RDMSR on every call, at which point the paranoia will incur no meaningful overhead. Note, prior to vmx_get_perf_capabilities() checking that the host fully supports LBRs via x86_perf_get_lbr(), KVM effectively relied on intel_pmu_lbr_is_enabled() to guard against host userspace enabling LBRs on platforms without full support. Fixes: c646236344e9 ("KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-5-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
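A hedged sketch of the ordering rule in the MSR write path; kvm_caps_supported_debugctl() is a hypothetical stand-in for "what KVM/hardware supports", as opposed to the guest-CPUID-derived vcpu_supported_debugctl():

    u64 supported = msr_info->host_initiated ?
                    kvm_caps_supported_debugctl() :     /* hypothetical helper */
                    vcpu_supported_debugctl(vcpu);

    if (data & ~supported)
            return 1;       /* reject unsupported bits */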
2022-11-02KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()Sean Christopherson2-20/+7
Fold vmx_supported_debugctl() into vcpu_supported_debugctl(), its only caller. Setting bits only to clear them a few instructions later is rather silly, and splitting the logic makes things seem more complicated than they actually are. Opportunistically drop DEBUGCTLMSR_LBR_MASK now that there's a single reference to the pair of bits. The extra layer of indirection provides no meaningful value and makes it unnecessarily tedious to understand what KVM is doing. No functional change. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-4-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-02KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRsSean Christopherson1-1/+3
Advertise LBR support to userspace via MSR_IA32_PERF_CAPABILITIES if and only if perf fully supports LBRs. Perf may disable LBRs (by zeroing the number of LBRs) even on platforms that allegedly support LBRs, e.g. if probing any LBR MSRs during setup fails. Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES") Reported-by: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-3-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign()Eiichi Tsukata1-3/+4
Should not call eventfd_ctx_put() in case of error. Fixes: 2fd6df2f2b47 ("KVM: x86/xen: intercept EVTCHNOP_send from guests") Reported-by: syzbot+6f0c896c5a9449a10ded@syzkaller.appspotmail.com Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Message-Id: <20221028092631.117438-1-eiichi.tsukata@nutanix.com> [Introduce new goto target instead. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28KVM: x86: smm: number of GPRs in the SMRAM image depends on the image formatMaxim Levitsky1-2/+2
On a 64-bit host, if the guest doesn't have X86_FEATURE_LM, KVM will access 16 GPRs in the 32-bit SMRAM image, causing an out-of-bounds RAM access. On a 32-bit host, rsm_load_state_64()/enter_smm_save_state_64() are compiled out, thus the access overflow can't happen. Fixes: b443183a25ab61 ("KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221025124741.228045-15-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
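A sketch of the bound: the 32-bit SMRAM layout only holds 8 GPRs, so the 32-bit save/restore loops must stop at 8 (the offset shown follows the emulator's 32-bit layout but is illustrative here):

    for (i = 0; i < 8; i++)     /* was 16, overrunning the 32-bit image */
            *reg_write(ctxt, i) = GET_SMSTATE(u32, smstate, 0x7fd0 + i * 4);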
2022-10-28KVM: x86: emulator: update the emulation mode after CR0 writeMaxim Levitsky1-1/+15
Update the emulation mode when handling writes to CR0, because toggling CR0.PE switches between Real and Protected Mode, and toggling CR0.PG when EFER.LME=1 switches between Long and Protected Mode. This is likely a benign bug because there is no writeback of state, other than the RIP increment, and when toggling CR0.PE, the CPU has to execute code from a very low memory address. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-14-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28KVM: x86: emulator: update the emulation mode after rsmMaxim Levitsky1-1/+1
Update the emulation mode after RSM so that RIP will be correctly written back, because the RSM instruction can switch the CPU mode from 32 bit (or less) to 64 bit. This fixes a guest crash in case the #SMI is received while the guest runs code from an address that doesn't fit in 32 bits. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-13-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28KVM: x86: emulator: introduce emulator_recalc_and_set_modeMaxim Levitsky1-28/+57
Some instructions update the cpu execution mode, which needs to update the emulation mode. Extract this code, and make assign_eip_far use it. assign_eip_far now reads CS, instead of getting it via a parameter, which is ok, because callers always assign CS to the same value before calling this function. No functional change is intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-12-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-28KVM: x86: emulator: em_sysexit should update ctxt->modeMaxim Levitsky1-0/+1
SYSEXIT is one of the instructions that can change the processor mode, thus ctxt->mode should be updated after it. Note that this is likely a benign bug, because the only problematic mode change is from 32 bit to 64 bit which can lead to truncation of RIP, and it is not possible to do with sysexit, since sysexit running in 32 bit mode will be limited to 32 bit version. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221025124741.228045-11-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27KVM: Initialize gfn_to_pfn_cache locks in dedicated helperMichal Luczaj2-32/+37
Move the gfn_to_pfn_cache lock initialization to another helper and call the new helper during VM/vCPU creation. There are race conditions possible due to kvm_gfn_to_pfn_cache_init()'s ability to re-initialize the cache's locks. For example: a race between ioctl(KVM_XEN_HVM_EVTCHN_SEND) and kvm_gfn_to_pfn_cache_init() leads to a corrupted shinfo gpc lock.
    (thread 1)                                  | (thread 2)
                                                |
    kvm_xen_set_evtchn_fast                     |
      read_lock_irqsave(&gpc->lock, ...)        |
                                                | kvm_gfn_to_pfn_cache_init
                                                |   rwlock_init(&gpc->lock)
      read_unlock_irqrestore(&gpc->lock, ...)   |
Rename "cache_init" and "cache_destroy" to activate+deactivate to avoid implying that the cache really is destroyed/freed. Note, there are more races in the newly named kvm_gpc_activate() that will be addressed separately. Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> [sean: call out that this is a bug fix] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221013211234.1318131-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailableEmanuele Giuseppe Esposito1-0/+5
Clear enable_sgx if ENCLS-exiting is not supported, i.e. if SGX cannot be virtualized. When KVM is loaded, adjust_vmx_controls checks that the bit is available before enabling the feature; however, other parts of the code check enable_sgx, and not clearing the variable caused two different bugs, mostly affecting nested virtualization scenarios. First, because enable_sgx remained true, SECONDARY_EXEC_ENCLS_EXITING would be marked available in the capability MSRs that are accessed by a nested hypervisor. KVM would then propagate the control from vmcs12 to vmcs02 even if it isn't supported by the processor, thus causing an unexpected VM-Fail (exit code 0x7) in L1. Second, vmx_set_cpu_caps() would not clear the SGX bits when hardware support is unavailable. This is a much less problematic bug as it only happens if SGX is soft-disabled (available in the processor but hidden in CPUID) or if SGX is supported for bare metal but not in the VMCS (will never happen when running on bare metal, but can theoretically happen when running in a VM). Last but not least, this ensures that module params in sysfs reflect KVM's actual configuration. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2127128 Fixes: 72add915fbd5 ("KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: Bandan Das <bsd@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221025123749.2201649-1-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
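A sketch of the fix in VMX's hardware_setup(), assuming the existing cpu_has_vmx_encls_vmexit() capability check:

    if (!cpu_has_vmx_encls_vmexit())
            enable_sgx = false;     /* keep nested caps and CPU caps consistent */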
2022-10-27KVM: x86: Exempt pending triple fault from event injection sanity checkSean Christopherson1-1/+14
Exempt pending triple faults, a.k.a. KVM_REQ_TRIPLE_FAULT, when asserting that KVM didn't attempt to queue a new exception during event injection. KVM needs to emulate the injection itself when emulating Real Mode due to lack of unrestricted guest support (VMX) and will queue a triple fault if that emulation fails. Ideally the assertion would more precisely filter out the emulated Real Mode triple fault case, but rmode.vm86_active is buried in vcpu_vmx and can't be queried without a new kvm_x86_ops. And unlike "regular" exceptions, triple fault cannot put the vCPU into an infinite loop; the triple fault will force either an exit to userspace or a nested VM-Exit, and triple fault after nested VM-Exit will force an exit to userspace. I.e. there is no functional issue, so just suppress the warning for triple faults. Opportunistically convert the warning to a one-time thing; when it fires, it fires _a lot_, and it is usually user triggerable, i.e. it can be used to spam the kernel log. Fixes: 7055fb113116 ("KVM: x86: Treat pending TRIPLE_FAULT requests as pending exceptions") Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202209301338.aca913c3-yujie.liu@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220930230008.1636044-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-27KVM: x86: Reduce refcount if single_open() fails in kvm_mmu_rmaps_stat_open()Hou Wenlong1-1/+6
The refcount is increased before calling single_open() in kvm_mmu_rmaps_stat_open(). If single_open() fails, the refcount should be restored, otherwise the VM can't be destroyed. Fixes: 3bcd0662d66fd ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file") Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Message-Id: <a75900413bb8b1e556be690e9588a0f92e946a30.1665733883.git.houwenlong.hwl@antgroup.com> [Preserved return value of single_open. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
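A sketch of the corrected error path; the get/put helpers and the show callback follow the function named in the subject, but the exact body is abbreviated:

    if (!kvm_get_kvm_safe(kvm))
            return -ENOENT;

    r = single_open(file, kvm_mmu_rmaps_stat_show, kvm);
    if (r < 0)
            kvm_put_kvm(kvm);       /* drop the reference taken above */

    return r;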
2022-10-27KVM: x86: Mask off reserved bits in CPUID.8000001FHJim Mattson1-1/+2
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. CPUID.8000001FH:EBX[31:16] are reserved bits and should be masked off. Fixes: 8765d75329a3 ("KVM: X86: Extend CPUID range to include new leaf") Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220929225203.2234702-6-jmattson@google.com> Cc: stable@vger.kernel.org [Clear NumVMPL too. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-22KVM: x86: Mask off reserved bits in CPUID.8000001AHJim Mattson1-0/+3
KVM_GET_SUPPORTED_CPUID should only enumerate features that KVM actually supports. In the case of CPUID.8000001AH, only three bits are currently defined. The 125 reserved bits should be masked off. Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220929225203.2234702-4-jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
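A sketch of the masking in KVM's CPUID setup, per the three defined bits noted above:

    case 0x8000001a:
            entry->eax &= GENMASK(2, 0);    /* only bits 2:0 are defined */
            entry->ebx = entry->ecx = entry->edx = 0;
            break;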