2023-02-15Merge tag 'kvm-x86-selftests-6.3' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini12-62/+63
KVM selftests changes for 6.3:

 - Cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL

 - A variety of one-off cleanups and fixes
2023-02-15Merge tag 'kvm-x86-pmu-6.3' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini16-198/+896
KVM x86 PMU changes for 6.3:

 - Add support for creating masked events for the PMU filter to allow userspace to heavily restrict what events the guest can use without needing to create an absurd number of events

 - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled

 - Add PEBS support for Intel SPR
2023-02-15Merge tag 'kvm-x86-mmu-6.3' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini7-48/+63
KVM x86 MMU changes for 6.3:

 - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V

 - A few one-off cleanups
2023-02-15Merge tag 'kvm-x86-misc-6.3' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini23-109/+327
KVM x86 changes for 6.3:

 - Advertise support for Intel's fancy new fast REP string features

 - Fix a double-shootdown issue in the emergency reboot code

 - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX

 - Update Xen's TSC info CPUID sub-leaves as appropriate

 - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace

 - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters

 - One-off fixes and cleanups
2023-02-15Merge tag 'kvm-x86-generic-6.3' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini2-6/+8
Common KVM changes for 6.3:

 - Account allocations in generic kvm_arch_alloc_vm()

 - Fix a typo and a stale comment

 - Fix a memory leak if coalesced MMIO unregistration fails
2023-02-08KVM: selftests: Remove duplicate macro definitionShaoqin Huang1-3/+0
The KVM_GUEST_PAGE_TABLE_MIN_PADDR macro is already defined in include/kvm_util_base.h, so remove the duplicate definition in lib/kvm_util.c. Fixes: cce0c23dd944 ("KVM: selftests: Add wrapper to allocate page table page") Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20230208071801.68620-1-shahuang@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-08KVM: selftests: Clean up misnomers in xen_shinfo_testMichal Luczaj1-8/+5
As discussed[*], relabel the poorly named structs to align with the current KVM nomenclature. The old names are a leftover from before commit 52491a38b2c2 ("KVM: Initialize gfn_to_pfn_cache locks in dedicated helper"), which, among other things, introduced kvm_gpc_init() and renamed kvm_gfn_to_pfn_cache_init()/_destroy() to kvm_gpc_activate()/_deactivate(), partly in an effort to avoid implying that the cache really is destroyed/freed. While at it, get rid of #define GPA_INVALID, which, being used as a GFN, is not only misnamed but also unnecessarily reinvents a UAPI constant. No functional change intended. [*] https://lore.kernel.org/r/Y5yZ6CFkEMBqyJ6v@google.com Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230206202430.1898057-1-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-08selftests: KVM: Replace optarg with arg in guest_modes_cmdlineShaoqin Huang1-1/+1
The arg parameter of guest_modes_cmdline() is currently unused, while the code references the global optarg directly; replace optarg with arg. Take the chance to change strtoul() to atoi_non_negative(), since a guest mode ID will never be negative. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Fixes: e42ac777d661 ("KVM: selftests: Factor out guest mode code") Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Reviewed-by: Vipin Sharma <vipinsh@google.com> Link: https://lore.kernel.org/r/20230202025716.216323-1-shahuang@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-08KVM: update code comment in struct kvm_vcpuWang Yong1-1/+1
Commit c5b077549136 ("KVM: Convert the kvm->vcpus array to a xarray") changed the kvm->vcpus array to an xarray, so update the code comment of kvm_vcpu->vcpu_idx accordingly. Signed-off-by: Wang Yong <yongw.kernel@gmail.com> Link: https://lore.kernel.org/r/20230202081342.856687-1-yongw.kernel@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-08KVM: selftests: Assign guest page size in sync area early in memslot_perf_testGavin Shan1-2/+1
The guest page size in the synchronization area is needed by all test cases. So it's reasonable to set it in the unified preparation function (prepare_vm()). Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/r/20230118092133.320003-3-gshan@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-08KVM: selftests: Remove duplicate VM creation in memslot_perf_testGavin Shan1-2/+0
Remove a spurious call to __vm_create_with_one_vcpu() that was introduced by a merge gone sideways. Fixes: eb5618911af0 ("Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD") Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/r/20230118092133.320003-2-gshan@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04KVM: x86: Simplify msr_io()Michal Luczaj1-9/+3
As of commit bccf2150fe62 ("KVM: Per-vcpu inodes"), __msr_io() doesn't return a negative value. Remove unnecessary checks. Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230107001256.2365304-7-mhal@rbox.co [sean: call out commit which left behind the unnecessary check] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04KVM: x86: Remove unnecessary initialization in kvm_vm_ioctl_set_msr_filter()Michal Luczaj1-1/+1
Do not initialize the value of `r`, as it will be overwritten. Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230107001256.2365304-6-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04KVM: x86: Explicitly state lockdep condition of msr_filter updateMichal Luczaj1-2/+2
Replace `1` with the actual mutex_is_locked() check. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230107001256.2365304-5-mhal@rbox.co [sean: delete the comment that explained the hardcoded '1'] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04KVM: x86: Simplify msr_filter updateMichal Luczaj1-4/+1
Replace srcu_dereference()+rcu_assign_pointer() sequence with a single rcu_replace_pointer(). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230107001256.2365304-4-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04KVM: x86: Optimize kvm->lock and SRCU interaction (KVM_X86_SET_MSR_FILTER)Michal Luczaj1-1/+1
Reduce time spent holding kvm->lock: unlock the mutex before calling synchronize_srcu(). There is no need to hold kvm->lock until all vCPUs have been kicked; KVM only needs to guarantee that all vCPUs will switch to the new filter before exiting to userspace. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230107001256.2365304-3-mhal@rbox.co [sean: expand changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
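Both this change and the KVM_SET_PMU_EVENT_FILTER change below boil down to the same pattern: publish the new filter under kvm->lock, drop the mutex, and only then wait out in-flight SRCU readers. A rough sketch of the resulting sequence (simplified from the changelog; helper names such as kvm_free_msr_filter() are approximations, not verbatim upstream code):

  mutex_lock(&kvm->lock);
  old_filter = rcu_replace_pointer(kvm->arch.msr_filter, new_filter,
                                   mutex_is_locked(&kvm->lock));
  mutex_unlock(&kvm->lock);

  /*
   * Wait for in-flight readers outside the mutex; vCPUs only need to
   * observe the new filter before their next exit to userspace.
   */
  synchronize_srcu(&kvm->srcu);
  kvm_free_msr_filter(old_filter);

  kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED);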
2023-02-04KVM: x86: Optimize kvm->lock and SRCU interaction (KVM_SET_PMU_EVENT_FILTER)Michal Luczaj1-2/+1
Reduce time spent holding kvm->lock: unlock the mutex before calling synchronize_srcu_expedited(). There is no need to hold kvm->lock until all vCPUs have been kicked; KVM only needs to guarantee that all vCPUs will switch to the new filter before exiting to userspace. Protecting the write to __reprogram_pmi is also unnecessary, as a vCPU may process a set bit before receiving the final KVM_REQ_PMU, but the per-vCPU writes are guaranteed to occur after all vCPUs have switched to the new filter. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://lore.kernel.org/r/20230107001256.2365304-2-mhal@rbox.co [sean: expand changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04KVM: x86/emulator: Fix comment in __load_segment_descriptor()Michal Luczaj1-1/+1
The comment refers to the same condition twice. Make it reflect what the code actually does. No functional change intended. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20230126013405.2967156-3-mhal@rbox.co Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-04KVM: x86/emulator: Fix segment load privilege level validationMichal Luczaj1-2/+2
Intel SDM describes what steps are taken by the CPU to verify if a memory segment can actually be used at a given privilege level. Loading DS/ES/FS/GS involves checking the segment's type as well as making sure that neither the selector's RPL nor the caller's CPL is greater than the segment's DPL. The emulator implements Intel's pseudocode in __load_segment_descriptor(), even quoting the pseudocode in the comments. Although the pseudocode is correctly translated, the implementation is incorrect. This is most likely because the SDM, at the time, was wrong. This patch fixes the emulator's logic and updates the pseudocode in the comment. Below are historical notes.

Emulator code for handling segment descriptors appears to have been introduced in March 2010 in commit 38ba30ba51a0 ("KVM: x86 emulator: Emulate task switch in emulator.c"). Intel SDM Vol 2A: Instruction Set Reference, A-M (Order Number: 253666-034US, _March 2010_) lists the steps for loading segment registers in the section related to the MOV instruction:

  IF DS, ES, FS, or GS is loaded with non-NULL selector
  THEN
    IF segment selector index is outside descriptor table limits
       or segment is not a data or readable code segment
       or ((segment is a data or nonconforming code segment)
       and (both RPL and CPL > DPL))                            <---
    THEN #GP(selector); FI;

This is precisely what __load_segment_descriptor() quotes and implements. But there's a twist; a few SDM revisions later (253667-044US), in August 2012, the snippet above becomes:

  IF DS, ES, FS, or GS is loaded with non-NULL selector
  THEN
    IF segment selector index is outside descriptor table limits
       or segment is not a data or readable code segment
       or ((segment is a data or nonconforming code segment)
       [note: missing or superfluous parenthesis?]
       or ((RPL > DPL) and (CPL > DPL))                         <---
    THEN #GP(selector); FI;

Many SDMs later (253667-065US), in December 2017, the pseudocode reaches what seems to be its final form:

  IF DS, ES, FS, or GS is loaded with non-NULL selector
  THEN
    IF segment selector index is outside descriptor table limits
       OR segment is not a data or readable code segment
       OR ((segment is a data or nonconforming code segment)
       AND ((RPL > DPL) or (CPL > DPL)))                        <---
    THEN #GP(selector); FI;

which also matches the behavior described in AMD's APM, which states that a #GP occurs if:

  The DS, ES, FS, or GS register was loaded and the segment pointed to
  was a data or non-conforming code segment, but the RPL or CPL was
  greater than the DPL.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20230126013405.2967156-2-mhal@rbox.co
[sean: add blurb to changelog calling out AMD agrees]
Signed-off-by: Sean Christopherson <seanjc@google.com>
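In code terms, the fix boils down to turning the conjunction in the privilege check into a disjunction. A sketch of the corrected condition for DS/ES/FS/GS loads (variable names follow the changelog rather than the emulator verbatim):

  /*
   * Data or non-conforming code segment: fault if EITHER the selector's
   * RPL or the current CPL exceeds the segment's DPL.  The old code,
   * following the 2010-era SDM pseudocode, faulted only when both did.
   */
  if (rpl > dpl || cpl > dpl)        /* was: rpl > dpl && cpl > dpl */
      goto exception;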
2023-02-02perf/x86/intel: Expose EPT-friendly PEBS for SPR and future modelsLike Xu2-1/+4
According to the Intel SDM, EPT-friendly PEBS is supported by all platforms after ICX and ADL, and by future platforms with PEBS format 5. Currently the only in-kernel user of this capability is KVM, which has very limited support for the hybrid core PMU, so ADL and its successors do not currently expose this capability. When both a hybrid core and PEBS format 5 are present, KVM will decide on its own merits. Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-perf-users@vger.kernel.org Suggested-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221109082802.27543-4-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-02KVM: x86/pmu: Add PDIR++ and PDist support for SPR and later modelsLike Xu1-12/+33
The PEBS capability on SPR is basically the same as on Ice Lake Server, with the exception of two special facilities that have been enhanced and require special handling. Upon triggering a PEBS assist, there will be a finite delay between the time the counter overflows and when the microcode starts to carry out its data collection obligations. Even if the delay is constant in core clock space, it invariably manifests as variable "skids" in instruction address space. On Ice Lake Server, the Precise Distribution of Instructions Retired (PDIR) facility mitigates the "skid" problem by providing an early indication of when the counter is about to overflow. On SPR, the PDIR counter available (Fixed 0) is unchanged, but the capability is enhanced to Instruction-Accurate PDIR (PDIR++), where PEBS is taken on the next instruction after the one that caused the overflow. SPR also introduces a new Precise Distribution (PDist) facility, only on general programmable counter 0. Per the Intel SDM, PDist eliminates any skid or shadowing effects from PEBS. With PDist, the PEBS record will be generated precisely upon completion of the instruction or operation that causes the counter to overflow (there is no "wait for next occurrence" by default). In terms of KVM handling, when the guest accesses those special counters, KVM needs to request the same-index counters via the perf_event kernel subsystem to ensure that the guest uses the correct PEBS hardware counter (PDIR++ or PDist). This is mainly achieved by adjusting the event precise level to the maximum, where the semantics of this magic number is mainly defined by the internal software context of perf_event; it is also backwards compatible as part of the user space interface. Opportunistically, refine the confusing comments on TNT+, as the only platforms that currently support pebs_ept are Ice Lake Server and SPR (GLC+). Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20221109082802.27543-3-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
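"Adjusting the event precise level to the maximum" maps onto the precise_ip field of struct perf_event_attr, whose maximum value is 3. A hedged sketch of the idea (the attr initialization is illustrative, not the exact KVM code):

  struct perf_event_attr attr = {
      .type = PERF_TYPE_RAW,
      .config = eventsel,
      /*
       * A non-zero precise_ip turns the request into a PEBS event; the
       * maximum level steers perf onto the PDIR++/PDist-capable counter
       * (Fixed 0 / GP counter 0) when the guest programs those indices.
       */
      .precise_ip = 3,
  };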
2023-02-02KVM: selftests: Test Hyper-V extended hypercall exit to userspaceVipin Sharma3-0/+99
Hyper-V extended hypercalls exit to userspace by default. Verify that userspace gets the call, update the result there, and then verify in the guest that the correct result is received. Add KVM_EXIT_HYPERV to the list of "known" exit reasons so errors generate pretty strings. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20221212183720.4062037-14-vipinsh@google.com [sean: add KVM_EXIT_HYPERV to exit_reasons_known] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-02KVM: selftests: Replace hardcoded Linux OS id with HYPERV_LINUX_OS_IDVipin Sharma1-1/+1
Use the HYPERV_LINUX_OS_ID macro instead of the hardcoded 0x8100 << 48. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20221212183720.4062037-12-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-02KVM: selftests: Test Hyper-V extended hypercall enablementVipin Sharma2-0/+14
Test the Hyper-V extended hypercall HV_EXT_CALL_QUERY_CAPABILITIES (0x8001) for the access-denied and invalid-parameter cases. Access is denied if CPUID.0x40000003.EBX BIT(20) is not set, and the parameter is invalid if the call has the fast bit set. Signed-off-by: Vipin Sharma <vipinsh@google.com> Link: https://lore.kernel.org/r/20221212183720.4062037-11-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-02KVM: x86: hyper-v: Add extended hypercall support in Hyper-vVipin Sharma1-0/+28
Add support for extended hypercalls in Hyper-V. Hyper-V TLFS 6.0b describes hypercalls above call code 0x8000 as extended hypercalls. A guest VM of a Hyper-V hypervisor discovers the availability of extended hypercalls via CPUID.0x40000003.EBX BIT(20); if the bit is set, the guest can issue extended hypercalls. All extended hypercalls exit to userspace by default. This allows easy support of future hypercalls without being dependent on KVM releases. If a hypercall later needs to be handled in KVM instead of userspace, KVM can add a capability that userspace can query to learn which hypercalls KVM can handle, and enable in-kernel handling of those hypercalls. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20221212183720.4062037-10-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
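In kvm_hv_hypercall(), the extended range can then be dispatched with a single case range; a sketch based on the changelog (HV_EXT_CALL_MAX and the hypercall_userspace_exit label are assumptions drawn from this series, not verified identifiers):

  case HV_EXT_CALL_QUERY_CAPABILITIES ... HV_EXT_CALL_MAX:
      if (unlikely(hc.fast)) {
          /* Extended hypercalls don't support the fast variant. */
          ret = HV_STATUS_INVALID_PARAMETER;
          break;
      }
      /* Forward everything at or above 0x8000 to userspace by default. */
      goto hypercall_userspace_exit;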
2023-02-02KVM: x86: hyper-v: Use common code for hypercall userspace exitVipin Sharma1-16/+11
Remove duplicate code to exit to userspace for Hyper-V hypercalls and use a common place to exit. No functional change intended. Signed-off-by: Vipin Sharma <vipinsh@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20221212183720.4062037-9-vipinsh@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-02-01KVM: Destroy target device if coalesced MMIO unregistration failsSean Christopherson1-3/+5
Destroy and free the target coalesced MMIO device if unregistering said device fails. As clearly noted in the code, kvm_io_bus_unregister_dev() does not destroy the target device.

  BUG: memory leak
  unreferenced object 0xffff888112a54880 (size 64):
    comm "syz-executor.2", pid 5258, jiffies 4297861402 (age 14.129s)
    hex dump (first 32 bytes):
      38 c7 67 15 00 c9 ff ff 38 c7 67 15 00 c9 ff ff  8.g.....8.g.....
      e0 c7 e1 83 ff ff ff ff 00 30 67 15 00 c9 ff ff  .........0g.....
    backtrace:
      [<0000000006995a8a>] kmalloc include/linux/slab.h:556 [inline]
      [<0000000006995a8a>] kzalloc include/linux/slab.h:690 [inline]
      [<0000000006995a8a>] kvm_vm_ioctl_register_coalesced_mmio+0x8e/0x3d0 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:150
      [<00000000022550c2>] kvm_vm_ioctl+0x47d/0x1600 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3323
      [<000000008a75102f>] vfs_ioctl fs/ioctl.c:46 [inline]
      [<000000008a75102f>] file_ioctl fs/ioctl.c:509 [inline]
      [<000000008a75102f>] do_vfs_ioctl+0xbab/0x1160 fs/ioctl.c:696
      [<0000000080e3f669>] ksys_ioctl+0x76/0xa0 fs/ioctl.c:713
      [<0000000059ef4888>] __do_sys_ioctl fs/ioctl.c:720 [inline]
      [<0000000059ef4888>] __se_sys_ioctl fs/ioctl.c:718 [inline]
      [<0000000059ef4888>] __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:718
      [<000000006444fa05>] do_syscall_64+0x9f/0x4e0 arch/x86/entry/common.c:290
      [<000000009a4ed50b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

  BUG: leak checking failed

Fixes: 5d3c4c79384a ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed")
Cc: stable@vger.kernel.org
Reported-by: 柳菁峰 <liujingfeng@qianxin.com>
Reported-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20221219171924.67989-1-seanjc@google.com
Link: https://lore.kernel.org/all/20230118220003.1239032-1-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
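The fix itself is small; a sketch of the unregistration loop after the change, assuming kvm_iodevice_destructor() as the destroy hook (zone_covered() abbreviates the real range check):

  list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list) {
      if (zone_covered(dev, zone)) {
          r = kvm_io_bus_unregister_dev(kvm,
                  zone->pio ? KVM_PIO_BUS : KVM_MMIO_BUS, &dev->dev);
          /*
           * On failure, unregistering destroys every device on the bus
           * except the target; destroy the target manually to plug the
           * leak, and stop walking since no zones are left.
           */
          if (r) {
              kvm_iodevice_destructor(&dev->dev);
              break;
          }
      }
  }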
2023-01-27KVM: x86/pmu: Provide "error" semantics for unsupported-but-known PMU MSRsSean Christopherson1-22/+29
Provide "error" semantics (read zeros, drop writes) for userspace accesses to MSRs that are ultimately unsupported for whatever reason, but for which KVM told userspace to save and restore the MSR, i.e. for MSRs that KVM included in KVM_GET_MSR_INDEX_LIST. Previously, KVM special cased a few PMU MSRs that were problematic at one point or another. Extend the treatment to all PMU MSRs, e.g. to avoid spurious unsupported accesses. Note, the logic can also be used for non-PMU MSRs, but as of today only PMU MSRs can end up being unsupported after KVM told userspace to save and restore them. Link: https://lore.kernel.org/r/20230124234905.3774678-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-27KVM: x86/pmu: Don't tell userspace to save MSRs for non-existent fixed PMCsLike Xu2-0/+6
Limit the set of MSRs for fixed PMU counters based on the number of fixed counters actually supported by the host so that userspace doesn't waste time saving and restoring dummy values. Signed-off-by: Like Xu <likexu@tencent.com> [sean: split for !enable_pmu logic, drop min(), write changelog] Link: https://lore.kernel.org/r/20230124234905.3774678-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-27KVM: x86/pmu: Don't tell userspace to save PMU MSRs if PMU is disabledSean Christopherson1-2/+4
Omit all PMU MSRs from the "MSRs to save" list if the PMU is disabled so that userspace doesn't waste time saving and restoring dummy values. KVM provides "error" semantics (read zeros, drop writes) for such known-but-unsupported MSRs, i.e. has fudged around this issue for quite some time. Keep the "error" semantics as-is for now, the logic will be cleaned up in a separate patch. Cc: Aaron Lewis <aaronlewis@google.com> Cc: Weijiang Yang <weijiang.yang@intel.com> Link: https://lore.kernel.org/r/20230124234905.3774678-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-27KVM: x86/pmu: Use separate array for defining "PMU MSRs to save"Sean Christopherson1-71/+82
Move all potential to-be-saved PMU MSRs into a separate array so that a future patch can easily omit all PMU MSRs from the list when the PMU is disabled. No functional change intended. Link: https://lore.kernel.org/r/20230124234905.3774678-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-27KVM: x86/pmu: Gate all "unimplemented MSR" prints on report_ignored_msrsSean Christopherson5-25/+24
Add helpers to print unimplemented MSR accesses and condition all such prints on report_ignored_msrs, i.e. honor userspace's request to not print unimplemented MSRs. Even though vcpu_unimpl() is ratelimited, printing can still be problematic, e.g. if a print gets stalled when host userspace is writing MSRs during live migration, an effective stall can result in very noticeable disruption in the guest. E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU counters while the PMU was disabled in KVM.

  - 99.75% 0.00% [.] __ioctl
     - __ioctl
        - 99.74% entry_SYSCALL_64_after_hwframe
             do_syscall_64
             sys_ioctl
           - do_vfs_ioctl
              - 92.48% kvm_vcpu_ioctl
                 - kvm_arch_vcpu_ioctl
                    - 85.12% kvm_set_msr_ignored_check
                         svm_set_msr
                         kvm_set_msr_common
                         printk
                         vprintk_func
                         vprintk_default
                         vprintk_emit
                         console_unlock
                         call_console_drivers
                         univ8250_console_write
                         serial8250_console_write
                         uart_console_write

Reported-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230124234905.3774678-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
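The helpers are thin wrappers that make the report_ignored_msrs check impossible to forget; a representative sketch (the signature is approximated from the changelog):

  static inline void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr,
                                         u64 data)
  {
      /* Honor userspace's request to stay silent about unknown MSRs. */
      if (report_ignored_msrs)
          vcpu_unimpl(vcpu, "Unhandled WRMSR(0x%x) = 0x%llx\n", msr, data);
  }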
2023-01-27KVM: x86/pmu: Cap kvm_pmu_cap.num_counters_gp at KVM's internal maxSean Christopherson4-4/+9
Limit kvm_pmu_cap.num_counters_gp during kvm_init_pmu_capability() based on the vendor PMU capabilities so that consuming num_counters_gp naturally does the right thing. This fixes a mostly theoretical bug where KVM could over-report its PMU support in KVM_GET_SUPPORTED_CPUID for leaf 0xA, e.g. if the number of counters reported by perf is greater than KVM's hardcoded internal limit. Incorporating input from the AMD PMU also avoids over-reporting MSRs to save when running on AMD. Link: https://lore.kernel.org/r/20230124234905.3774678-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
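The clamp itself is a one-liner in kvm_init_pmu_capability(); sketched here with MAX_NR_GP_COUNTERS standing in for the vendor-provided limit the changelog describes:

  /*
   * Cap perf's reported GP counter count at KVM's internal per-vendor
   * limit so CPUID leaf 0xA and the MSRs-to-save list never over-report.
   */
  kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
                                    pmu_ops->MAX_NR_GP_COUNTERS);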
2023-01-27KVM: x86/pmu: Drop event_type and rename "struct kvm_event_hw_type_mapping"Like Xu2-15/+12
After commit ("02791a5c362b KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,fixed}counter()"), vPMU starts to directly use the hardware event eventsel and unit_mask to reprogram perf_event, and the event_type field in the "struct kvm_event_hw_type_mapping" is simply no longer being used. Convert the struct into an anonymous struct as the current name is obsolete as the structure no longer has any mapping semantics, and placing the struct definition directly above its sole user makes its easier to understand what the array is filling in. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20221205122048.16023-1-likexu@tencent.com [sean: drop new comment, use anonymous struct] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86: Replace IS_ERR() with IS_ERR_VALUE()ye xingchen1-1/+1
Avoid type casts that are needed for IS_ERR() and use IS_ERR_VALUE() instead. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Link: https://lore.kernel.org/r/202211161718436948912@zte.com.cn Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: Stop assuming stats are contiguous in kvm_binary_stats_testJing Zhang1-9/+1
Remove the assumption from kvm_binary_stats_test that all stats are laid out contiguously in memory. The current stats in KVM are contiguously laid out in memory, but that may change in the future and the ABI specifically allows holes in the stats data (since each stat exposes its own offset). While here, drop the check that each stat's offset is less than size_data, as that is now always true by construction. Link: https://lore.kernel.org/kvm/20221208193857.4090582-9-dmatlack@google.com/ Fixes: 0b45d58738cd ("KVM: selftests: Add selftest for KVM statistics data binary interface") Signed-off-by: Jing Zhang <jingzhangos@google.com> [dmatlack: Re-worded the commit message.] Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20230117222707.3949974-1-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
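With the assumption removed, each stat is read at the offset its own descriptor advertises rather than at a running cursor; a sketch of the test's read loop using the binary-stats UAPI structs (the get_stats_descriptor() helper name is illustrative):

  for (i = 0; i < header.num_desc; i++) {
      struct kvm_stats_desc *desc = get_stats_descriptor(stats_desc,
                                                         &header, i);
      size_t size = desc->size * sizeof(uint64_t);

      /* Each descriptor carries its own offset into the data blob. */
      ret = pread(stats_fd, stats_data, size,
                  header.data_offset + desc->offset);
      TEST_ASSERT(ret == size, "pread() of stat '%s' failed", desc->name);
  }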
2023-01-24KVM: x86/xen: Remove unneeded semicolonzhang songyi1-1/+1
The semicolon after the "}" is unneeded. Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn> Link: https://lore.kernel.org/r/202212191432274558936@zte.com.cn Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: x86: Use host's native hypercall instruction in kvm_hypercall()Vishal Annapurve1-2/+8
Use the host CPU's native hypercall instruction, i.e. VMCALL vs. VMMCALL, in kvm_hypercall(), as relying on KVM to patch in the native hypercall on a #UD for the "wrong" hypercall requires KVM_X86_QUIRK_FIX_HYPERCALL_INSN to be enabled and flat out doesn't work if guest memory is encrypted with a private key, e.g. for SEV VMs. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Vishal Annapurve <vannapurve@google.com> Link: https://lore.kernel.org/r/20230111004445.416840-4-vannapurve@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
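A sketch of the resulting kvm_hypercall(), close to but not guaranteed identical to the committed code; host_cpu_is_amd is the cached vendor flag from the companion patch below (the log is newest-first):

  uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1,
                         uint64_t a2, uint64_t a3)
  {
      uint64_t r;

      /* Emit VMMCALL on AMD and VMCALL on Intel instead of relying on
       * KVM to patch over the "wrong" instruction at #UD time. */
      asm volatile("test %[use_vmmcall], %[use_vmmcall]\n\t"
                   "jnz 1f\n\t"
                   "vmcall\n\t"
                   "jmp 2f\n\t"
                   "1: vmmcall\n\t"
                   "2:"
                   : "=a"(r)
                   : "a"(nr), "b"(a0), "c"(a1), "d"(a2), "S"(a3),
                     [use_vmmcall] "r" (host_cpu_is_amd));
      return r;
  }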
2023-01-24KVM: selftests: x86: Cache host CPU vendor (AMD vs. Intel)Vishal Annapurve6-8/+21
Cache the host CPU vendor for userspace and share it with guest code. All current callers of the this_cpu_* helpers actually care about the host CPU, so update them to check host_cpu_is_* instead. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Vishal Annapurve <vannapurve@google.com> Link: https://lore.kernel.org/r/20230111004445.416840-3-vannapurve@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: x86: Use "this_cpu" prefix for cpu vendor queriesVishal Annapurve6-33/+30
Replace the is_intel/amd_cpu helpers with this_cpu_* helpers to better convey the intent of querying the vendor of the current CPU. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Vishal Annapurve <vannapurve@google.com> Link: https://lore.kernel.org/r/20230111004445.416840-2-vannapurve@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: Fix a typo in the vcpu_msrs_set assertAaron Lewis1-1/+1
The assert incorrectly identifies the ioctl being called. Switch it from KVM_GET_MSRS to KVM_SET_MSRS. Fixes: 6ebfef83f03f ("KVM: selftest: Add proper helpers for x86-specific save/restore ioctls") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221209201326.2781950-1-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: kvm_vm_elf_load() and elfhdr_get() should close fdReiji Watanabe1-0/+2
kvm_vm_elf_load() and elfhdr_get() open one file each, but they never close the opened file descriptor. If a test repeatedly creates and destroys a VM with __vm_create(), which (directly or indirectly) calls those two functions, the test might end up hitting an open failure with EMFILE. Fix those two functions to close the file descriptor. Signed-off-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221220170921.2499209-2-reijiw@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: Test masked events in PMU filterAaron Lewis1-2/+336
Add testing to show that a pmu event can be filtered with a generalized match on its unit mask. These tests set up test cases to demonstrate various ways of filtering a pmu event that has multiple unit mask values. They do this by setting up the filter in KVM with the masked events provided, then enabling three pmu counters in the guest. The test then verifies that the pmu counters agree with expectations, i.e. which counters should be counting and which should be filtered, for both a sparse filter list and a dense filter list. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-8-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: Add testing for KVM_SET_PMU_EVENT_FILTERAaron Lewis1-0/+36
Test that masked events are not using invalid bits, and if they are, ensure the pmu event filter is not accepted by KVM_SET_PMU_EVENT_FILTER. The only valid bits that can be used for masked events are set when using KVM_PMU_ENCODE_MASKED_ENTRY() with one exception: If any of the high bits (35:32) of the event select are set when using Intel, the pmu event filter will fail. Also, because validation was not being done prior to the introduction of masked events, only expect validation to fail when masked events are used. E.g. in the first test a filter event with all its bits set is accepted by KVM_SET_PMU_EVENT_FILTER when flags = 0. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-7-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: selftests: Add flags when creating a pmu event filterAaron Lewis1-4/+5
Now that the flags field can be non-zero, pass it in when creating a pmu event filter. This is needed in preparation for testing masked events. No functional change intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-6-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/pmu: Introduce masked events to the pmu event filterAaron Lewis6-38/+282
When building a list of filter events, it can sometimes be a challenge to fit all the events needed to adequately restrict the guest into the limited space available in the pmu event filter. This stems from the fact that the pmu event filter requires each event (i.e. event select + unit mask) to be listed, when the intention might be to restrict the event select altogether, regardless of its unit mask. Instead of increasing the number of filter events in the pmu event filter, add a new encoding that is able to do a more generalized match on the unit mask. Introduce masked events as another encoding the pmu event filter understands. Masked events have the fields: mask, match, and exclude. When filtering based on these events, the mask is applied to the guest's unit mask to see if it matches the match value (i.e. umask & mask == match). The exclude bit can then be used to exclude events from that match. E.g. for a given event select, if it's easier to say which unit mask values shouldn't be filtered, a masked event can be set up to match all possible unit mask values, then another masked event can be set up to match the unit mask values that shouldn't be filtered. Userspace can query whether this feature exists by looking for the capability KVM_CAP_PMU_EVENT_MASKED_EVENTS. This feature is enabled by setting the flags field in the pmu event filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS. Events can be encoded by using KVM_PMU_ENCODE_MASKED_ENTRY(). It is an error to have a bit set outside the valid bits for a masked event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in such cases, including for the high bits of the event select (35:32) if called on Intel. With these updates the filter matching code has been updated to match on a common event. Masked events were flexible enough to handle both event types, so they were used as the common event. This changes how guest events get filtered: regardless of the type of event used in the uAPI, they will be converted to masked events. Because of this there could be a slight performance hit, because instead of matching the filter event with a lookup on event select + unit mask, it does a lookup on event select then walks the unit masks to find the match. This shouldn't be a big problem because I would expect the set of common event selects to be small, and if it isn't, the set can likely be reduced by using masked events to generalize the unit mask. Using one type of event when filtering guest events allows for a common code path to be used. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-5-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
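To make the match rule concrete, a self-contained sketch of the semantics described above (the struct is illustrative, not the uAPI layout):

  struct masked_event {
      u8 mask;      /* which unit mask bits to compare */
      u8 match;     /* required value of the masked bits */
      bool exclude; /* carve matching unit masks back out */
  };

  /*
   * For a single event select: any non-exclude entry that matches wins,
   * unless an exclude entry also matches the guest's unit mask.
   */
  static bool masked_events_match(const struct masked_event *ev, int nr,
                                  u8 guest_umask)
  {
      bool match = false;
      int i;

      for (i = 0; i < nr; i++) {
          if ((guest_umask & ev[i].mask) != ev[i].match)
              continue;
          if (ev[i].exclude)
              return false;
          match = true;
      }
      return match;
  }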
2023-01-24KVM: x86/pmu: prepare the pmu event filter for masked eventsAaron Lewis1-23/+33
Refactor check_pmu_event_filter() in preparation for masked events. No functional changes intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-4-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/pmu: Remove impossible events from the pmu event filterAaron Lewis1-1/+18
If it's not possible for an event in the pmu event filter to match a pmu event being programmed by the guest, it's pointless to have it in the list. Opt for a shorter list by removing those events. Because this is established uAPI, the pmu event filter can't outright reject these events as garbage and return an error. Instead, play nice and remove them from the list. Also, opportunistically rewrite the comment when the filter is set to clarify that it guards against *all* TOCTOU attacks on the verified data. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-3-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24KVM: x86/pmu: Correct the mask used in a pmu event filter lookupAaron Lewis4-1/+6
When checking if a pmu event the guest is attempting to program should be filtered, only consider the event select + unit mask in that decision. Use an architecture specific mask to mask out all other bits, including bits 35:32 on Intel. Those bits are not part of the event select and should not be considered in that decision. Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-2-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
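The lookup key is now built with vendor-aware masks; a sketch, where EVENTSEL_EVENT stands for the per-vendor event select mask implied by the changelog (bits 7:0 on Intel, additionally bits 35:32 on AMD):

  /*
   * Compare only event select + unit mask; control bits (USR, OS, etc.)
   * and, on Intel, bits 35:32 must not influence the filter decision.
   */
  key = pmc->eventsel & (kvm_pmu_ops.EVENTSEL_EVENT |
                         ARCH_PERFMON_EVENTSEL_UMASK);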
2023-01-24KVM: x86/mmu: Use kstrtobool() instead of strtobool()Christophe JAILLET1-1/+2
strtobool() is the same as kstrtobool(). However, the latter is more used within the kernel. In order to remove strtobool() and slightly simplify kstrtox.h, switch to the other function name. While at it, include the corresponding header file (<linux/kstrtox.h>). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/670882aa04dbdd171b46d3b20ffab87158454616.1673689135.git.christophe.jaillet@wanadoo.fr Signed-off-by: Sean Christopherson <seanjc@google.com>