path: root/arch/x86/kvm/vmx/pmu_intel.c
2024-04-11  KVM: VMX: Snapshot LBR capabilities during module initialization  (Sean Christopherson, 1 file, -1/+1)
Snapshot VMX's LBR capabilities once during module initialization instead of calling into perf every time a vCPU reconfigures its vPMU. This will allow massaging the LBR capabilities, e.g. if the CPU doesn't support callstacks, without having to remember to update multiple locations. Opportunistically tag vmx_get_perf_capabilities() with __init, as it's only called from vmx_set_cpu_caps(). Reviewed-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20240307011344.835640-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-03-11  Merge tag 'kvm-x86-pmu-6.9' of https://github.com/kvm-x86/linux into HEAD  (Paolo Bonzini, 1 file, -135/+85)
KVM x86 PMU changes for 6.9:
 - Fix several bugs where KVM speciously prevents the guest from utilizing fixed counters and architectural event encodings based on whether or not guest CPUID reports support for the _architectural_ encoding.
 - Fix a variety of bugs in KVM's emulation of RDPMC, e.g. for "fast" reads, priority of VMX interception vs #GP, PMC types in architectural PMUs, etc.
 - Add a selftest to verify KVM correctly emulates RDPMC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID, i.e. are difficult to validate via KVM-Unit-Tests.
 - Zero out PMU metadata on AMD if the virtual PMU is disabled to avoid wasting cycles, e.g. when checking if a PMC event needs to be synthesized when skipping an instruction.
 - Optimize triggering of emulated events, e.g. for "count instructions" events when skipping an instruction, which yields a ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest.
 - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit.
2024-02-03  KVM: x86/pmu: Fix type length error when reading pmu->fixed_ctr_ctrl  (Mingwei Zhang, 1 file, -1/+1)
Use a u64 instead of a u8 when taking a snapshot of pmu->fixed_ctr_ctrl when reprogramming fixed counters, as truncating the value results in KVM thinking fixed counter 2 is already disabled (the bug also affects fixed counters 3+, but KVM doesn't yet support those). As a result, if the guest disables fixed counter 2, KVM will get a false negative and fail to reprogram/disable emulation of the counter, which can lead to incorrect counts and spurious PMIs in the guest. Fixes: 76d287b2342e ("KVM: x86/pmu: Drop "u8 ctrl, int idx" for reprogram_fixed_counter()") Cc: stable@vger.kernel.org Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20240123221220.3911317-1-mizhang@google.com [sean: rewrite changelog to call out the effects of the bug] Signed-off-by: Sean Christopherson <seanjc@google.com>
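For reference, a minimal standalone sketch (not the KVM code itself) of why the u8 snapshot misbehaves: each fixed counter owns four control bits in IA32_FIXED_CTR_CTRL, with counter N occupying bits [4N+3:4N], so an 8-bit snapshot silently drops fixed counter 2's control field (bits 11:8).

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
  	uint64_t fixed_ctr_ctrl = 0xb00;          /* fixed counter 2 enabled, bits 11:8 */
  	uint8_t  old_snapshot   = fixed_ctr_ctrl; /* u8 truncation drops counter 2's bits */
  	uint64_t new_snapshot   = fixed_ctr_ctrl;

  	/* Extract counter 2's 4-bit control field (bits 11:8) from each snapshot. */
  	printf("u8 snapshot : %#x (looks disabled)\n", (unsigned int)((old_snapshot >> 8) & 0xf));
  	printf("u64 snapshot: %#x (actually enabled)\n", (unsigned int)((new_snapshot >> 8) & 0xf));
  	return 0;
  }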
2024-02-01  KVM: x86/pmu: Add macros to iterate over all PMCs given a bitmap  (Sean Christopherson, 1 file, -5/+2)
Add and use kvm_for_each_pmc() to dedup a variety of open coded for-loops that iterate over valid PMCs given a bitmap (and because seeing checkpatch whine about bad macro style is always amusing). No functional change intended. Link: https://lore.kernel.org/r/20231110022857.1273836-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
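The exact macro lives in the patch; purely as an illustration of the pattern (the name and details below are made up, this is not the real kvm_for_each_pmc()), a bitmap-driven iterator can use the for/if/else trick so the caller's body reads like a plain loop:

  #include <stdint.h>
  #include <stdio.h>

  /* Hypothetical stand-in, not the actual KVM macro. */
  #define for_each_valid_pmc(bitmap, idx)				\
  	for ((idx) = 0; (idx) < 64; (idx)++)			\
  		if (!((bitmap) & (1ULL << (idx))))		\
  			continue;				\
  		else

  int main(void)
  {
  	uint64_t all_valid_pmc_idx = 0x300000003ULL; /* GP counters 0-1 and fixed counters 0-1 */
  	unsigned int idx;

  	for_each_valid_pmc(all_valid_pmc_idx, idx)
  		printf("valid pmc idx %u\n", idx);
  	return 0;
  }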
2024-02-01  KVM: x86/pmu: Move pmc_idx => pmc translation helper to common code  (Sean Christopherson, 1 file, -14/+1)
Add a common helper for *internal* PMC lookups, and delete the ops hook and Intel's implementation. Keep AMD's implementation, but rename it to amd_pmu_get_pmc() to make it somewhat more obvious that it's suited for both KVM-internal and guest-initiated lookups. Because KVM tracks all counters in a single bitmap, getting a counter when iterating over a bitmap, e.g. of all valid PMCs, requires a small amount of math that, while simple, isn't super obvious and doesn't use the same semantics as PMC lookups from RDPMC! Although AMD doesn't support fixed counters, the common PMU code still behaves as if there is a split, the high half of which just happens to always be empty. Opportunistically add a comment to explain both what is going on, and why KVM uses a single bitmap, e.g. the boilerplate for iterating over separate bitmaps could be done via macros, so it's not (just) about deduplicating code. Link: https://lore.kernel.org/r/20231110022857.1273836-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-01  KVM: x86/pmu: Add common define to capture fixed counters offset  (Sean Christopherson, 1 file, -6/+6)
Add a common define to "officially" solidify KVM's split of counters, i.e. to commit to using bits 31:0 to track general purpose counters and bits 63:32 to track fixed counters (which only Intel supports). KVM already bleeds this behavior all over common PMU code, and adding a KVM-defined macro allows clarifying that the value is a _base_, as opposed to the _flag_ that is used to access fixed PMCs via RDPMC (which perf confusingly calls INTEL_PMC_FIXED_RDPMC_BASE). No functional change intended. Link: https://lore.kernel.org/r/20231110022857.1273836-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
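A standalone sketch of the split described in the last two entries (the define name here is illustrative, not necessarily KVM's): indices below the fixed-counter base map to general purpose counters, indices at or above it map to fixed counters.

  #include <stdio.h>

  #define FIXED_PMC_BASE_IDX 32	/* illustrative: GP counters in bits 31:0, fixed in 63:32 */

  static void describe_pmc(unsigned int pmc_idx)
  {
  	if (pmc_idx < FIXED_PMC_BASE_IDX)
  		printf("idx %2u -> gp_counters[%u]\n", pmc_idx, pmc_idx);
  	else
  		printf("idx %2u -> fixed_counters[%u]\n", pmc_idx, pmc_idx - FIXED_PMC_BASE_IDX);
  }

  int main(void)
  {
  	describe_pmc(0);	/* IA32_PMC0 */
  	describe_pmc(7);	/* IA32_PMC7 */
  	describe_pmc(32);	/* IA32_FIXED_CTR0 */
  	describe_pmc(34);	/* IA32_FIXED_CTR2 */
  	return 0;
  }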
2024-02-01  KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled  (Sean Christopherson, 1 file, -14/+2)
Move the purging of common PMU metadata from intel_pmu_refresh() to kvm_pmu_refresh(), and invoke the vendor refresh() hook if and only if the VM is supposed to have a vPMU. KVM already denies access to the PMU based on kvm->arch.enable_pmu, as get_gp_pmc_amd() returns NULL for all PMCs in that case, i.e. KVM already violates AMD's architecture by not virtualizing a PMU (kernels have long since learned to not panic when the PMU is unavailable). But configuring the PMU as if it were enabled causes unwanted side effects, e.g. calls to kvm_pmu_trigger_event() waste an absurd number of cycles due to the all_valid_pmc_idx bitmap being non-zero. Fixes: b1d66dad65dc ("KVM: x86/svm: Add module param to control PMU virtualization") Reported-by: Konstantin Khorenko <khorenko@virtuozzo.com> Closes: https://lore.kernel.org/all/20231109180646.2963718-2-khorenko@virtuozzo.com Link: https://lore.kernel.org/r/20231110022857.1273836-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-31  KVM: x86/pmu: Explicitly check for RDPMC of unsupported Intel PMC types  (Sean Christopherson, 1 file, -6/+15)
Explicitly check for attempts to read unsupported PMC types instead of letting the bounds check fail. Functionally, letting the check fail is ok, but it's unnecessarily subtle and does a poor job of documenting the architectural behavior that KVM is emulating. Reviewed-by: Dapeng Mi  <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-31  KVM: x86/pmu: Treat "fixed" PMU type in RDPMC as index as a value, not flag  (Sean Christopherson, 1 file, -3/+11)
Refactor KVM's handling of ECX for RDPMC to treat the FIXED modifier as an explicit value, not a flag (minus one wart). While non-architectural PMUs do use bit 31 as a flag (for "fast" reads), architectural PMUs use the upper half of ECX to encode the type. From the SDM: ECX[31:16] specifies type of PMC while ECX[15:0] specifies the index of the PMC to be read within that type. Note, the fact that the known supported types are 4000H and 2000H, i.e. look a lot like flags, doesn't contradict the above statement that ECX[31:16] holds the type, at least not by any sane reading of the SDM. Keep the explicit clearing of the FIXED "flag", as KVM subtly relies on that behavior to disallow unsupported types while allowing the correct indices for fixed counters. This wart will be cleaned up in short order. Opportunistically grab the per-type bitmask in the if-else blocks to eliminate the one-off usage of the local "fixed" bool. Reported-by: Jim Mattson <jmattson@google.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-31  KVM: x86/pmu: Disallow "fast" RDPMC for architectural Intel PMUs  (Sean Christopherson, 1 file, -4/+18)
Inject #GP on RDPMC if the "fast" flag is set for architectural Intel PMUs, i.e. if the PMU version is non-zero. Per Intel's SDM, and confirmed on bare metal, the "fast" flag is supported only for non-architectural PMUs, and is reserved for architectural PMUs. If the processor does not support architectural performance monitoring (CPUID.0AH:EAX[7:0]=0), ECX[30:0] specifies the index of the PMC to be read. Setting ECX[31] selects “fast” read mode if supported. In this mode, RDPMC returns bits 31:0 of the PMC in EAX while clearing EDX to zero. If the processor does support architectural performance monitoring (CPUID.0AH:EAX[7:0] ≠ 0), ECX[31:16] specifies type of PMC while ECX[15:0] specifies the index of the PMC to be read within that type. The following PMC types are currently defined:
 — General-purpose counters use type 0. The index x (to read IA32_PMCx) must be less than the value enumerated by CPUID.0AH.EAX[15:8] (thus ECX[15:8] must be zero).
 — Fixed-function counters use type 4000H. The index x (to read IA32_FIXED_CTRx) can be used if either CPUID.0AH.EDX[4:0] > x or CPUID.0AH.ECX[x] = 1 (thus ECX[15:5] must be 0).
 — Performance metrics use type 2000H. This type can be used only if IA32_PERF_CAPABILITIES.PERF_METRICS_AVAILABLE[bit 15]=1. For this type, the index in ECX[15:0] is implementation specific.
Opportunistically WARN if KVM ever actually tries to complete RDPMC for a non-architectural PMU, and drop the non-existent "support" for fast RDPMC, as KVM doesn't support such PMUs, i.e. kvm_pmu_rdpmc() should reject the RDPMC before getting to the Intel code. Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") Fixes: 67f4d4288c35 ("KVM: x86: rdpmc emulation checks the counter incorrectly") Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
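To make the encoding concrete, here is a small standalone decoder for the architectural-PMU case quoted above (the macro names are made up for readability; only the numeric type values come from the SDM text):

  #include <stdint.h>
  #include <stdio.h>

  #define RDPMC_TYPE_GP      0x0000u
  #define RDPMC_TYPE_FIXED   0x4000u
  #define RDPMC_TYPE_METRICS 0x2000u

  static void decode_rdpmc_ecx(uint32_t ecx)
  {
  	uint16_t type = ecx >> 16;     /* ECX[31:16]: PMC type */
  	uint16_t idx  = ecx & 0xffff;  /* ECX[15:0]: index within that type */

  	switch (type) {
  	case RDPMC_TYPE_GP:
  		printf("ECX=%#010x: general-purpose counter IA32_PMC%u\n", ecx, idx);
  		break;
  	case RDPMC_TYPE_FIXED:
  		printf("ECX=%#010x: fixed counter IA32_FIXED_CTR%u\n", ecx, idx);
  		break;
  	case RDPMC_TYPE_METRICS:
  		printf("ECX=%#010x: performance metrics, index %u\n", ecx, idx);
  		break;
  	default:
  		printf("ECX=%#010x: unsupported type %#x -> #GP\n", ecx, type);
  		break;
  	}
  }

  int main(void)
  {
  	decode_rdpmc_ecx(0x00000001); /* IA32_PMC1 */
  	decode_rdpmc_ecx(0x40000002); /* IA32_FIXED_CTR2 */
  	decode_rdpmc_ecx(0x80000000); /* "fast" flag: reserved on architectural PMUs */
  	return 0;
  }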
2024-01-31  KVM: x86/pmu: Apply "fast" RDPMC only to Intel PMUs  (Sean Christopherson, 1 file, -2/+14)
Move the handling of "fast" RDPMC instructions, which drop bits 63:32 of the count, to Intel. The "fast" flag, and all modifiers for that matter, are Intel-only and aren't supported by AMD. Opportunistically replace open coded bit crud with proper #defines, and add comments to try and disentangle the flags vs. values mess for non-architectural vs. architectural PMUs. Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-31  KVM: x86/pmu: Prioritize VMX interception over #GP on RDPMC due to bad index  (Sean Christopherson, 1 file, -12/+0)
Apply the pre-intercepts RDPMC validity check only to AMD, and rename all relevant functions to make it as clear as possible that the check is not a standard PMC index check. On Intel, the basic rule is that only invalid opcodes and privilege/permission/mode checks have priority over VM-Exit, i.e. RDPMC with an invalid index should VM-Exit, not #GP. While the SDM doesn't explicitly call out RDPMC, it _does_ explicitly use RDMSR of a non-existent MSR as an example where VM-Exit has priority over #GP, and RDPMC is effectively just a variation of RDMSR. Manually testing on various Intel CPUs confirms this behavior, and the inverted priority was introduced for SVM compatibility, i.e. was not an intentional change for Intel PMUs. On AMD, *all* exceptions on RDPMC have priority over VM-Exit. Check for a NULL kvm_pmu_ops.check_rdpmc_early instead of using a RET0 static call so as to provide a convenient location to document the difference between Intel and AMD, and to again try to make it as obvious as possible that the early check is a one-off thing, not a generic "is this PMC valid?" helper. Fixes: 8061252ee0d2 ("KVM: SVM: Add intercept checks for remaining twobyte instructions") Cc: Jim Mattson <jmattson@google.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-31  KVM: x86/pmu: Get eventsel for fixed counters from perf  (Sean Christopherson, 1 file, -13/+17)
Get the event selectors used to effectively request fixed counters for perf events from perf itself instead of hardcoding them in KVM and hoping that they match the underlying hardware. While fixed counters 0 and 1 use architectural events, as of ffbe4ab0beda ("perf/x86/intel: Extend the ref-cycles event to GP counters") fixed counter 2 (reference TSC cycles) may use a software-defined pseudo-encoding or a real hardware-defined encoding. Reported-by: Kan Liang <kan.liang@linux.intel.com> Closes: https://lkml.kernel.org/r/4281eee7-6423-4ec8-bb18-c6aeee1faf2c%40linux.intel.com Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-31  KVM: x86/pmu: Setup fixed counters' eventsel during PMU initialization  (Sean Christopherson, 1 file, -11/+5)
Set the eventsel for all fixed counters during PMU initialization; the eventsel is hardcoded and consumed if and only if the counter is supported, i.e. there is no reason to redo the setup every time the PMU is refreshed. Configuring all KVM-supported fixed counters also eliminates a potential pitfall if/when KVM supports discontiguous fixed counters, in which case configuring only nr_arch_fixed_counters will be insufficient (ignoring the fact that KVM will need many other changes to support discontiguous fixed counters). Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-31  KVM: x86/pmu: Remove KVM's enumeration of Intel's architectural encodings  (Sean Christopherson, 1 file, -49/+23)
Drop KVM's enumeration of Intel's architectural event encodings, and instead open code the three encodings (of which only two are real) that KVM uses to emulate fixed counters. Now that KVM doesn't incorrectly enforce the availability of architectural encodings, there is no reason for KVM to ever care about the encodings themselves, at least not in the current format of an array indexed by the encoding's position in CPUID. Opportunistically add a comment to explain why KVM cares about eventsel values for fixed counters. Suggested-by: Jim Mattson <jmattson@google.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-31  KVM: x86/pmu: Allow programming events that match unsupported arch events  (Sean Christopherson, 1 file, -38/+0)
Remove KVM's bogus restriction that the guest can't program an event whose encoding matches an unsupported architectural event. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed using the architectural encoding. The enumeration does NOT say anything about the encoding when the CPU doesn't report support for the architectural event. Preventing the guest from counting events whose encoding happens to match an architectural event breaks existing functionality whenever Intel adds an architectural encoding that was *ever* used for a CPU that doesn't enumerate support for the architectural event, even if the encoding is for the exact same event! E.g. the architectural encoding for Top-Down Slots is 0x01a4. On Broadwell CPUs, which do not support the Top-Down Slots architectural event, 0x01a4 is a valid, model-specific event. Denying guest usage of 0x01a4 if/when KVM adds support for Top-Down slots would break any Broadwell-based guest. Reported-by: Kan Liang <kan.liang@linux.intel.com> Closes: https://lore.kernel.org/all/2004baa6-b494-462c-a11f-8104ea152c6a@linux.intel.com Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event") Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-01-31  KVM: x86/pmu: Always treat Fixed counters as available when supported  (Sean Christopherson, 1 file, -1/+14)
Treat fixed counters as available when they are supported, i.e. don't silently ignore an enabled fixed counter just because guest CPUID says the associated general purpose architectural event is unavailable. KVM originally treated fixed counters as always available, but that got changed as part of a fix to avoid confusing REF_CPU_CYCLES, which does NOT map to an architectural event, with the actual architectural event associated with bit 7, TOPDOWN_SLOTS. The commit justified the change with: If the event is marked as unavailable in the Intel guest CPUID 0AH.EBX leaf, we need to avoid any perf_event creation, whether it's a gp or fixed counter. but that justification doesn't mesh with reality. The Intel SDM uses "architectural events" to refer to both general purpose events (the ones with the reverse polarity mask in CPUID.0xA.EBX) and the events for fixed counters, e.g. the SDM makes statements like: Each of the fixed-function PMC can count only one architectural performance event. but the fact that fixed counter 2 (TSC reference cycles) doesn't have an associated general purpose architectural event makes trying to apply the mask from CPUID.0xA.EBX impossible. Furthermore, the lack of enumeration for an architectural event in CPUID only means the CPU doesn't officially support the architectural encoding, i.e. it doesn't mean using the architectural encoding _won't_ work, it simply means there are no guarantees that it will work as expected. E.g. if KVM is running in a VM that advertises a fixed counter but not the corresponding architectural event encoding, and perf decides to use a general purpose counter instead of a fixed counter, odds are very good that the underlying hardware actually does support the architectural encoding, and that programming the encoding will count the right thing. In other words, asking perf to count the event will probably work, whereas intentionally doing nothing is obviously guaranteed to fail. Note, at the time of the change, KVM didn't enforce hardware support, i.e. didn't prevent userspace from enumerating support in guest CPUID.0xA.EBX for architectural events that aren't supported in hardware. I.e. silently dropping the fixed counter didn't somehow protect against counting the wrong event, it just enforced guest CPUID. And practically speaking, this issue is almost certainly limited to running KVM on a funky virtual CPU model. No known real hardware has an asymmetric PMU where a fixed counter is supported but the associated architectural event is not. Fixes: a21864486f7e ("KVM: x86/pmu: Fix available_event_types check for REF_CPU_CYCLES event") Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240109230250.424295-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30  KVM: x86/pmu: Update sample period in pmc_write_counter()  (Sean Christopherson, 1 file, -2/+0)
Update a PMC's sample period in pmc_write_counter() to deduplicate code across all callers of pmc_write_counter(). Opportunistically move pmc_write_counter() into pmc.c now that it's doing more work. WRMSR isn't such a hot path that an extra CALL+RET pair will be problematic, and the order of function definitions needs to be changed anyways, i.e. now is a convenient time to eat the churn. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20231103230541.352265-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30  KVM: x86/pmu: Move PMU reset logic to common x86 code  (Sean Christopherson, 1 file, -20/+0)
Move the common (or at least "ignored") aspects of resetting the vPMU to common x86 code, along with the stop/release helpers that are now used only by common pmu.c. There is no need to manually handle fixed counters as all_valid_pmc_idx tracks both fixed and general purpose counters, and resetting the vPMU is far from a hot path, i.e. the extra bit of overhead to get the PMC from the index is a non-issue. Zero fixed_ctr_ctrl in common code even though it's Intel specific. Ensuring it's zero doesn't harm AMD/SVM in any way, and stopping the fixed counters via all_valid_pmc_idx, but not clearing the associated control bits, would be odd/confusing. Make the .reset() hook optional as SVM no longer needs vendor specific handling. Cc: stable@vger.kernel.org Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20231103230541.352265-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-09-26  KVM: x86/pmu: Truncate counter value to allowed width on write  (Roman Kagan, 1 file, -2/+2)
Performance counters are defined to have width less than 64 bits. The vPMU code maintains the counters in u64 variables but assumes the value to fit within the defined width. However, for Intel non-full-width counters (MSR_IA32_PERFCTRx) the value received from the guest is truncated to 32 bits and then sign-extended to full 64 bits. If a negative value is set, it's sign-extended to 64 bits, but then in kvm_pmu_incr_counter() it's incremented, truncated, and compared to the previous value for overflow detection. That previous value is not truncated, so it always evaluates bigger than the truncated new one, and a PMI is injected. If the PMI handler writes a negative counter value itself, the vCPU never quits the PMI loop. Turns out that the Linux PMI handler actually does write the counter with the value just read with RDPMC, so when no full-width support is exposed via MSR_IA32_PERF_CAPABILITIES, and the guest initializes the counter to a negative value, it locks up. This has been observed in the field, for example, when the guest configures atop to use perfevents and runs two instances of it simultaneously. To address the problem, maintain the invariant that the counter value always fits in the defined bit width, by truncating the received value in the respective set_msr methods. For better readability, factor the truncation out into a helper function, pmc_write_counter(), shared by vmx and svm parts. Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") Cc: stable@vger.kernel.org Signed-off-by: Roman Kagan <rkagan@amazon.de> Link: https://lore.kernel.org/all/20230504120042.785651-1-rkagan@amazon.de Tested-by: Like Xu <likexu@tencent.com> [sean: tweak changelog, s/set/write in the helper] Signed-off-by: Sean Christopherson <seanjc@google.com>
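A minimal sketch of the invariant, assuming a typical 48-bit counter width (this is not the KVM helper itself): mask the written value to the counter's width so later increment/overflow comparisons operate on values of the same width.

  #include <stdint.h>
  #include <stdio.h>

  static uint64_t pmc_bitmask(unsigned int width)	/* e.g. 48 for typical GP counters */
  {
  	return (width >= 64) ? ~0ULL : (1ULL << width) - 1;
  }

  int main(void)
  {
  	unsigned int width = 48;
  	/* Guest writes -100 via a non-full-width MSR: 32-bit value sign-extended to 64 bits. */
  	uint64_t written = (uint64_t)(int64_t)(int32_t)-100;

  	printf("raw write:      %#018llx\n", (unsigned long long)written);
  	printf("stored counter: %#018llx\n", (unsigned long long)(written & pmc_bitmask(width)));
  	return 0;
  }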
2023-08-03  KVM: x86/pmu: Require nr fixed_pmc_events to match nr max fixed counters  (Sean Christopherson, 1 file, -10/+9)
Assert that the number of known fixed_pmc_events matches the max number of fixed counters supported by KVM, and clean up related code. Opportunistically extend setup_fixed_pmc_eventsel()'s use of array_index_nospec() to cover fixed_counters, as nr_arch_fixed_counters is set based on userspace input (but capped using KVM-controlled values). Link: https://lore.kernel.org/r/20230607010206.1425277-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03  KVM: x86/pmu: Simplify intel_hw_event_available()  (Sean Christopherson, 1 file, -7/+6)
Walk only the "real", i.e. non-pseudo, architectural events when checking if a hardware event is available, i.e. isn't disabled by guest CPUID. Skipping pseudo-arch events in the loop body is unnecessarily convoluted, especially now that KVM has enums that delineate between real and pseudo events. Link: https://lore.kernel.org/r/20230607010206.1425277-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-08-03  KVM: x86/pmu: Use enums instead of hardcoded magic for arch event indices  (Sean Christopherson, 1 file, -12/+43)
Add "enum intel_pmu_architectural_events" to replace the magic numbers for the (pseudo-)architectural events, and to give a meaningful name to each event so that new readers don't need psychic powers to understand what the code is doing. Cc: Aaron Lewis <aaronlewis@google.com> Cc: Like Xu <like.xu.linux@gmail.com> Reviewed-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230607010206.1425277-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-07-01  Merge tag 'kvm-x86-vmx-6.5' of https://github.com/kvm-x86/linux into HEAD  (Paolo Bonzini, 1 file, -2/+0)
KVM VMX changes for 6.5:
 - Fix missing/incorrect #GP checks on ENCLS
 - Use standard mmu_notifier hooks for handling APIC access page
 - Misc cleanups
2023-06-07  KVM: x86/pmu: Disable vPMU if the minimum num of counters isn't met  (Like Xu, 1 file, -0/+1)
Disable PMU support when running on AMD and perf reports fewer than four general purpose counters. All AMD PMUs must define at least four counters due to AMD's legacy architecture hardcoding the number of counters without providing a way to enumerate the number of counters to software, e.g. from AMD's APM: The legacy architecture defines four performance counters (PerfCtrn) and corresponding event-select registers (PerfEvtSeln). Virtualizing fewer than four counters can lead to guest instability as software expects four counters to be available. Rather than bleed AMD details into the common code, just define a const unsigned int and provide a convenient location to document why Intel and AMD have different mins (in particular, AMD's lack of any way to enumerate less than four counters to the guest). Keep the minimum number of counters at Intel at one, even though old P6 and Core Solo/Duo processors effectively require a minimum of two counters. KVM can, and more importantly has up until this point, supported a vPMU so long as the CPU has at least one counter. Perf's support for P6/Core CPUs does require two counters, but perf will happily chug along with a single counter when running on a modern CPU. Cc: Jim Mattson <jmattson@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <likexu@tencent.com> [sean: set Intel min to '1', not '2'] Link: https://lore.kernel.org/r/20230603011058.1038821-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-07  KVM: x86/pmu: Provide Intel PMU's pmc_is_enabled() as generic x86 code  (Like Xu, 1 file, -13/+1)
Move the Intel PMU implementation of pmc_is_enabled() to common x86 code as pmc_is_globally_enabled(), and drop AMD's implementation. AMD PMU currently supports only v1, and thus not PERF_GLOBAL_CONTROL, thus the semantics for AMD are unchanged. And when support for AMD PMU v2 comes along, the common behavior will also Just Work. Signed-off-by: Like Xu <likexu@tencent.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230603011058.1038821-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-07  KVM: x86/pmu: Move handling PERF_GLOBAL_CTRL and friends to common x86  (Like Xu, 1 file, -45/+2)
Move the handling of GLOBAL_CTRL, GLOBAL_STATUS, and GLOBAL_OVF_CTRL, a.k.a. GLOBAL_STATUS_RESET, from Intel PMU code to generic x86 PMU code. AMD PerfMonV2 defines three registers that have the same semantics as Intel's variants, just with different names and indices. Conveniently, since KVM virtualizes GLOBAL_CTRL on Intel only for PMU v2 and above, and AMD's version shows up in v2, KVM can use common code for the existence check as well. Signed-off-by: Like Xu <likexu@tencent.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230603011058.1038821-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-07  KVM: x86/pmu: Reject userspace attempts to set reserved GLOBAL_STATUS bits  (Like Xu, 1 file, -0/+3)
Reject userspace writes to MSR_CORE_PERF_GLOBAL_STATUS that attempt to set reserved bits. Allowing userspace to stuff reserved bits doesn't harm KVM itself, but it's architecturally wrong and the guest can't clear the unsupported bits, e.g. makes the guest's PMI handler very confused. Signed-off-by: Like Xu <likexu@tencent.com> [sean: rewrite changelog to avoid use of #GP, rebase on name change] Link: https://lore.kernel.org/r/20230603011058.1038821-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-07  KVM: x86/pmu: Move reprogram_counters() to pmu.h  (Like Xu, 1 file, -12/+0)
Move reprogram_counters() out of Intel specific PMU code and into pmu.h so that it can be used to implement AMD PMU v2 support. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <likexu@tencent.com> [sean: rewrite changelog] Link: https://lore.kernel.org/r/20230603011058.1038821-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-07  KVM: x86/pmu: Rename global_ovf_ctrl_mask to global_status_mask  (Sean Christopherson, 1 file, -4/+14)
Rename global_ovf_ctrl_mask to global_status_mask to avoid confusion now that Intel has renamed GLOBAL_OVF_CTRL to GLOBAL_STATUS_RESET in PMU v4. GLOBAL_OVF_CTRL and GLOBAL_STATUS_RESET are the same MSR index, i.e. are just different names for the same thing, but the SDM provides different entries in the IA-32 Architectural MSRs table, which gets really confusing when looking at PMU v4 definitions since it *looks* like GLOBAL_STATUS has bits that don't exist in GLOBAL_OVF_CTRL, but in reality the bits are simply defined in the GLOBAL_STATUS_RESET entry. No functional change intended. Cc: Like Xu <like.xu.linux@gmail.com> Link: https://lore.kernel.org/r/20230603011058.1038821-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02  KVM: x86/pmu: Remove redundant check for MSR_IA32_DS_AREA set handler  (Jinrong Liang, 1 file, -2/+0)
After commit 2de154f541fc ("KVM: x86/pmu: Provide "error" semantics for unsupported-but-known PMU MSRs"), the guest_cpuid_has(DS) check is not necessary any more since if the guest supports X86_FEATURE_DS, it never returns 1. And if the guest does not support this feature, the set_msr handler will get false from kvm_pmu_is_valid_msr() before reaching this point. Therefore, the check will not be true in all cases and can be safely removed, which also simplifies the code and improves its readability. Signed-off-by: Jinrong Liang <cloudliang@tencent.com> Link: https://lore.kernel.org/r/20230411130338.8592-1-cloudliang@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07  KVM: x86/pmu: Fix a typo in kvm_pmu_request_counter_reprogam()  (Like Xu, 1 file, -2/+2)
Fix a "reprogam" => "reprogram" typo in kvm_pmu_request_counter_reprogam(). Fixes: 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()") Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230310113349.31799-1-likexu@tencent.com [sean: trim the changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07  KVM: x86/pmu: Rewrite reprogram_counters() to improve performance  (Like Xu, 1 file, -6/+6)
A valid pmc is always tested before using pmu->reprogram_pmi. Eliminate this part of the redundancy by setting the counter's bitmask directly, and in addition, trigger KVM_REQ_PMU only once to save more cpu cycles. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230214050757.9623-4-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
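As a rough standalone sketch of the batched approach (not the KVM code; names here are illustrative): mark every changed counter in one pending bitmap and raise the reprogram request a single time, instead of reprogramming each counter inline.

  #include <stdint.h>
  #include <stdio.h>

  static uint64_t reprogram_pending;   /* one bit per counter that needs reprogramming */
  static int kvm_req_pmu_raised;       /* stand-in for a single KVM_REQ_PMU request */

  static void reprogram_counters(uint64_t diff)
  {
  	if (!diff)
  		return;
  	reprogram_pending |= diff;   /* record all changed counters at once */
  	kvm_req_pmu_raised = 1;      /* request raised once, handled at next entry */
  }

  int main(void)
  {
  	uint64_t old_ctrl = 0x3, new_ctrl = 0x5;

  	reprogram_counters(old_ctrl ^ new_ctrl);  /* counters 1 and 2 changed */
  	printf("pending=%#llx, req=%d\n",
  	       (unsigned long long)reprogram_pending, kvm_req_pmu_raised);
  	return 0;
  }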
2023-04-07  KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpers  (Sean Christopherson, 1 file, -52/+57)
Invert the flows in intel_pmu_{g,s}et_msr()'s case statements so that they follow the kernel's preferred style of: if (<not valid>) return <error> <commit change> return <success> which is also the style used by every other {g,s}et_msr() helper (except AMD's PMU variant, which doesn't use a switch statement). Modify the "set" paths with costly side effects, i.e. that reprogram counters, to skip only the side effects, i.e. to perform reserved bits checks even if the value is unchanged. None of the reserved bits checks are expensive, so there's no strong justification for skipping them, and guarding only the side effect makes it slightly more obvious what is being skipped and why. No functional change intended (assuming no reserved bit bugs). Link: https://lkml.kernel.org/r/Y%2B6cfen%2FCpO3%2FdLO%40google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-07  KVM: x86/pmu: Zero out LBR capabilities during PMU refresh  (Sean Christopherson, 1 file, -0/+10)
Zero out the LBR capabilities during PMU refresh to avoid exposing LBRs to the guest against userspace's wishes. If userspace modifies the guest's CPUID model or invokes KVM_CAP_PMU_CAPABILITY to disable vPMU after an initial KVM_SET_CPUID2, but before the first KVM_RUN, KVM will retain the previous LBR info due to bailing before refreshing the LBR descriptor. Note, this is a very theoretical bug, there is no known use case where a VMM would deliberately enable the vPMU via KVM_SET_CPUID2, and then later disable the vPMU. Link: https://lore.kernel.org/r/20230311004618.920745-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-27  KVM: x86/pmu: Cap kvm_pmu_cap.num_counters_gp at KVM's internal max  (Sean Christopherson, 1 file, -0/+1)
Limit kvm_pmu_cap.num_counters_gp during kvm_init_pmu_capability() based on the vendor PMU capabilities so that consuming num_counters_gp naturally does the right thing. This fixes a mostly theoretical bug where KVM could over-report its PMU support in KVM_GET_SUPPORTED_CPUID for leaf 0xA, e.g. if the number of counters reported by perf is greater than KVM's hardcoded internal limit. Incorporating input from the AMD PMU also avoids over-reporting MSRs to save when running on AMD. Link: https://lore.kernel.org/r/20230124234905.3774678-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-27  KVM: x86/pmu: Drop event_type and rename "struct kvm_event_hw_type_mapping"  (Like Xu, 1 file, -9/+12)
After commit 02791a5c362b ("KVM: x86/pmu: Use PERF_TYPE_RAW to merge reprogram_{gp,fixed}counter()"), vPMU starts to directly use the hardware event eventsel and unit_mask to reprogram perf_event, and the event_type field in the "struct kvm_event_hw_type_mapping" is simply no longer being used. Convert the struct into an anonymous struct, as the current name is obsolete now that the structure no longer has any mapping semantics, and placing the struct definition directly above its sole user makes it easier to understand what the array is filling in. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20221205122048.16023-1-likexu@tencent.com [sean: drop new comment, use anonymous struct] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24  KVM: x86/pmu: Correct the mask used in a pmu event filter lookup  (Aaron Lewis, 1 file, -0/+1)
When checking if a pmu event the guest is attempting to program should be filtered, only consider the event select + unit mask in that decision. Use an architecture specific mask to mask out all other bits, including bits 35:32 on Intel. Those bits are not part of the event select and should not be considered in that decision. Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Link: https://lore.kernel.org/r/20221220161236.555143-2-aaronlewis@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
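For illustration, the Intel-side lookup key boils down to the low 16 bits of the eventsel (event select in bits 7:0, unit mask in bits 15:8); a standalone sketch with illustrative macro names follows.

  #include <stdint.h>
  #include <stdio.h>

  /* Illustrative masks matching the IA32_PERFEVTSELx layout; not KVM's actual defines. */
  #define EVENTSEL_EVENT  0x00000000000000ffULL
  #define EVENTSEL_UMASK  0x000000000000ff00ULL
  #define INTEL_FILTER_MASK (EVENTSEL_EVENT | EVENTSEL_UMASK)

  int main(void)
  {
  	/* Eventsel with enable (bit 22) and high bit-33 noise on top of event 0xc0, umask 0x00. */
  	uint64_t eventsel = (1ULL << 33) | (1ULL << 22) | 0x00c0;

  	printf("raw eventsel: %#llx\n", (unsigned long long)eventsel);
  	printf("filter key:   %#llx\n", (unsigned long long)(eventsel & INTEL_FILTER_MASK));
  	return 0;
  }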
2022-12-29  KVM: x86: Unify pr_fmt to use module name for all KVM modules  (Sean Christopherson, 1 file, -2/+3)
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks use consistent formatting across common x86, Intel, and AMD code. In addition to providing consistent print formatting, using KBUILD_MODNAME, e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and SGX and ...) as technologies without generating weird messages, and without causing naming conflicts with other kernel code, e.g. "SEV: ", "tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems. Opportunistically move away from printk() for prints that need to be modified anyways, e.g. to drop a manual "kvm: " prefix. Opportunistically convert a few SGX WARNs that are similarly modified to WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good that they would fire repeatedly and spam the kernel log without providing unique information in each print. Note, defining pr_fmt yields undesirable results for code that uses KVM's printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's wrappers is relatively limited in KVM x86 code. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-Id: <20221130230934.1014142-35-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09  KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter  (Like Xu, 1 file, -2/+2)
Defer reprogramming counters and handling overflow via KVM_REQ_PMU when incrementing counters. KVM skips emulated WRMSR in the VM-Exit fastpath, the fastpath runs with IRQs disabled, skipping instructions can increment and reprogram counters, reprogramming counters can sleep, and sleeping is disallowed while IRQs are disabled.
 [*] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
 [*] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2981888, name: CPU 15/KVM
 [*] preempt_count: 1, expected: 0
 [*] RCU nest depth: 0, expected: 0
 [*] INFO: lockdep is turned off.
 [*] irq event stamp: 0
 [*] hardirqs last enabled at (0): [<0000000000000000>] 0x0
 [*] hardirqs last disabled at (0): [<ffffffff8121222a>] copy_process+0x146a/0x62d0
 [*] softirqs last enabled at (0): [<ffffffff81212269>] copy_process+0x14a9/0x62d0
 [*] softirqs last disabled at (0): [<0000000000000000>] 0x0
 [*] Preemption disabled at:
 [*] [<ffffffffc2063fc1>] vcpu_enter_guest+0x1001/0x3dc0 [kvm]
 [*] CPU: 17 PID: 2981888 Comm: CPU 15/KVM Kdump: 5.19.0-rc1-g239111db364c-dirty #2
 [*] Call Trace:
 [*] <TASK>
 [*] dump_stack_lvl+0x6c/0x9b
 [*] __might_resched.cold+0x22e/0x297
 [*] __mutex_lock+0xc0/0x23b0
 [*] perf_event_ctx_lock_nested+0x18f/0x340
 [*] perf_event_pause+0x1a/0x110
 [*] reprogram_counter+0x2af/0x1490 [kvm]
 [*] kvm_pmu_trigger_event+0x429/0x950 [kvm]
 [*] kvm_skip_emulated_instruction+0x48/0x90 [kvm]
 [*] handle_fastpath_set_msr_irqoff+0x349/0x3b0 [kvm]
 [*] vmx_vcpu_run+0x268e/0x3b80 [kvm_intel]
 [*] vcpu_enter_guest+0x1d22/0x3dc0 [kvm]
Add a field to kvm_pmc to track the previous counter value in order to defer overflow detection to kvm_pmu_handle_event() (the counter must be paused before handling overflow, and that may increment the counter). Opportunistically shrink sizeof(struct kvm_pmc) a bit. Suggested-by: Wanpeng Li <wanpengli@tencent.com> Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions") Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20220831085328.45489-6-likexu@tencent.com [sean: avoid re-triggering KVM_REQ_PMU on overflow, tweak changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220923001355.3741194-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09  KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()  (Like Xu, 1 file, -3/+3)
Batch reprogramming PMU counters by setting KVM_REQ_PMU and thus deferring reprogramming kvm_pmu_handle_event() to avoid reprogramming a counter multiple times during a single VM-Exit. Deferring programming will also allow KVM to fix a bug where immediately reprogramming a counter can result in sleeping (taking a mutex) while interrupts are disabled in the VM-Exit fastpath. Introduce kvm_pmu_request_counter_reprogam() to make it obvious that KVM is _requesting_ a reprogram and not actually doing the reprogram. Opportunistically refine related comments to avoid misunderstandings. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20220831085328.45489-5-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220923001355.3741194-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09  KVM: x86: Init vcpu->arch.perf_capabilities in common x86 code  (Sean Christopherson, 1 file, -1/+0)
Initialize vcpu->arch.perf_capabilities in x86's kvm_arch_vcpu_create() instead of deferring initialization to vendor code. For better or worse, common x86 handles reads and writes to the MSR, and so common x86 should also handle initializing the MSR. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09  KVM: x86: Track supported PERF_CAPABILITIES in kvm_caps  (Sean Christopherson, 1 file, -1/+1)
Track KVM's supported PERF_CAPABILITIES in kvm_caps instead of computing the supported capabilities on the fly every time. Using kvm_caps will also allow for future cleanups as the kvm_caps values can be used directly in common x86 code. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Like Xu <likexu@tencent.com> Message-Id: <20221006000314.73240-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09  KVM: x86/pmu: Limit the maximum number of supported Intel GP counters  (Like Xu, 1 file, -2/+2)
The Intel Architectural IA32_PMCx MSRs addresses range allows for a maximum of 8 GP counters, and KVM cannot address any more. Introduce a local macro (named KVM_INTEL_PMC_MAX_GENERIC) and use it consistently to refer to the number of counters supported by KVM, thus avoiding possible out-of-bound accesses. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20220919091008.60695-2-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-28  KVM: x86/pmu: Avoid using PEBS perf_events for normal counters  (Like Xu, 1 file, -0/+2)
The check logic in pmc_resume_counter() to determine whether a perf_event is reusable is partial and flawed, especially when it comes to a pseudocode sequence (contrived, but valid) like:
 - enabling a counter and its PEBS bit
 - enable global_ctrl
 - run workload
 - disable only the PEBS bit, leaving the global_ctrl bit enabled
In this corner case, a perf_event created for PEBS can be reused by a normal counter before it has been released and recreated, and when this normal counter overflows, it triggers a PEBS interrupt (precise_ip != 0). To address this issue, reprogram all affected counters when PEBS_ENABLE changes and reuse a counter if and only if PEBS exactly matches precise. Fixes: 79f3e3b58386 ("KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter") Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20220831085328.45489-4-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-09-28  KVM: x86/pmu: Refactor PERF_GLOBAL_CTRL update helper for reuse by PEBS  (Like Xu, 1 file, -7/+5)
Extract the "global ctrl" specific bits out of global_ctrl_changed() so that the helper only deals with reprogramming general purpose counters, and rename the helper accordingly. PEBS needs the same logic, i.e. needs to reprogram the associated counters when PEBS_ENABLE bits are toggled, and will use the helper in a future fix. No functional change intended. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20220831085328.45489-4-likexu@tencent.com [sean: split to separate patch, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-09-28  KVM: x86/pmu: Avoid setting BIT_ULL(-1) to pmu->host_cross_mapped_mask  (Like Xu, 1 file, -6/+9)
In the extreme case of host counters multiplexing and contention, the perf_event requested by the guest's pebs counter is not allocated to any actual physical counter, in which case hw.idx is bookkept as -1, resulting in an out-of-bounds access to host_cross_mapped_mask. Fixes: 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20220831085328.45489-2-likexu@tencent.com [sean: expand comment to explain how a negative idx can be encountered] Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-08-10  KVM: VMX: Adjust number of LBR records for PERF_CAPABILITIES at refresh  (Sean Christopherson, 1 file, -9/+3)
Now that the PMU is refreshed when MSR_IA32_PERF_CAPABILITIES is written by host userspace, zero out the number of LBR records for a vCPU during PMU refresh if PMU_CAP_LBR_FMT is not set in PERF_CAPABILITIES instead of handling the check at run-time. guest_cpuid_has() is expensive due to the linear search of guest CPUID entries, intel_pmu_lbr_is_enabled() is checked on every VM-Enter, _and_ simply enumerating the same "Model" as the host causes KVM to set the number of LBR records to a non-zero value. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220727233424.2968356-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-28  Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control"  (Paolo Bonzini, 1 file, -3/+0)
This reverts commit 03a8871add95213827e2bea84c12133ae5df952e. Since commit 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control"), KVM has taken ownership of the "load IA32_PERF_GLOBAL_CTRL" VMX entry/exit control bits, trying to set these bits in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID supports the architectural PMU (CPUID[EAX=0Ah].EAX[7:0]=1), and clear otherwise. This was a misguided attempt at mimicking what commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled", 2018-10-01) did for MPX. However, that commit was a workaround for another KVM bug and not something that should be imitated. Mucking with the VMX MSRs creates a subtle, difficult to maintain ABI as KVM must ensure that any internal changes, e.g. to how KVM handles _any_ guest CPUID changes, yield the same functional result. Therefore, KVM's policy is to let userspace have full control of the guest vCPU model so long as the host kernel is not at risk. Now that KVM really truly ensures kvm_set_msr() will succeed by loading PERF_GLOBAL_CTRL if and only if it exists, revert KVM's misguided and roundabout behavior. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [sean: make it a pure revert] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220722224409.1336532-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-28  KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL  (Sean Christopherson, 1 file, -2/+2)
Add a helper to check if the guest PMU has PERF_GLOBAL_CTRL, which is unintuitive _and_ diverges from Intel's architecturally defined behavior. Even worse, KVM currently implements the check using two different (but equivalent) checks, _and_ there has been at least one attempt to add a _third_ flavor. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220722224409.1336532-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>