2022-10-03  Merge tag 'kvmarm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  [Paolo Bonzini, 51 files, -604/+1294]

KVM/arm64 updates for v6.1:

- Fixes for single-stepping in the presence of an async exception as well as the preservation of PSTATE.SS
- Better handling of AArch32 ID registers on AArch64-only systems
- Fixes for the dirty-ring API, allowing it to work on architectures with relaxed memory ordering
- Advertise the new kvmarm mailing list
- Various minor cleanups and spelling fixes
2022-10-01  Merge branch kvm-arm64/misc-6.1 into kvmarm-master/next  [Marc Zyngier, 5 files, -12/+20]

* kvm-arm64/misc-6.1:

Misc KVM/arm64 fixes and improvements for v6.1:

- Simplify the affinity check when moving a GICv3 collection
- Tone down the shouting when kvm-arm.mode=protected is passed to a guest
- Fix various comments
- Advertise the new kvmarm@lists.linux.dev and deprecate the old Columbia list

  KVM: arm64: Advertise new kvmarm mailing list
  KVM: arm64: Fix comment typo in nvhe/switch.c
  KVM: selftests: Update top-of-file comment in psci_test
  KVM: arm64: Ignore kvm-arm.mode if !is_hyp_mode_available()
  KVM: arm64: vgic: Remove duplicate check in update_affinity_collection()

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-10-01  Merge branch kvm-arm64/dirty-log-ordered into kvmarm-master/next  [Marc Zyngier, 8 files, -10/+51]

* kvm-arm64/dirty-log-ordered:

Retrofit some ordering into the existing dirty-ring API by:

- relying on acquire/release semantics, which are the default on x86 but need to be explicit on arm64
- adding a new capability that indicates which flavor is supported, either explicit ordering only (arm64) or both implicit and explicit (x86), as suggested by Paolo at KVM Forum
- documenting the requirements for this new capability on weakly ordered architectures
- updating the selftests to do the right thing

  KVM: selftests: dirty-log: Use KVM_CAP_DIRTY_LOG_RING_ACQ_REL if available
  KVM: selftests: dirty-log: Upgrade flag accesses to acquire/release semantics
  KVM: Document weakly ordered architecture requirements for dirty ring
  KVM: x86: Select CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL
  KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option
  KVM: Use acquire/release semantics when accessing dirty ring GFN state

Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-10-01  KVM: arm64: Advertise new kvmarm mailing list  [Marc Zyngier, 1 file, -1/+2]

As announced on the kvmarm list, we're moving the mailing list over to kvmarm@lists.linux.dev:

<quote>
As you probably all know, the kvmarm mailing list has been hosted on Columbia's machines for as long as the project has existed (over 13 years). After all this time, the university has decided to retire the list infrastructure and asked us to find a new hosting.

A new mailing list has been created on lists.linux.dev[1], and I'm kindly asking everyone interested in following the KVM/arm64 developments to start subscribing to it (and start posting your patches there). I hope that people will move over to it quickly enough that we can soon give Columbia the green light to turn their systems off.

Note that the new list will only get archived automatically once we fully switch over, but I'll make sure we fill any gap and not lose any message. In the meantime, please Cc both lists.

[...]

[1] https://subspace.kernel.org/lists.linux.dev.html
</quote>

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221001091245.3900668-1-maz@kernel.org
2022-09-30  kvm: vmx: keep constant definition format consistent  [Peng Hao, 1 file, -1/+1]
Keep all constants using lowercase "x". Signed-off-by: Peng Hao <flyingpeng@tencent.com> Message-Id: <CAPm50aKnctFL_7fZ-eqrz-QGnjW2+DTyDDrhxi7UZVO3HjD8UA@mail.gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-30  kvm: mmu: fix typos in struct kvm_arch  [Peng Hao, 1 file, -6/+6]

There is no 'kvmp_mmu_pages'; it should be 'kvm_mmu_page'. Also, struct kvm_mmu_pages and struct kvm_mmu_page are different structures, and kvm_mmu_page is the one meant here; kvm_mmu_pages is defined in arch/x86/kvm/mmu/mmu.c. Suggested-by: David Matlack <dmatlack@google.com> Signed-off-by: Peng Hao <flyingpeng@tencent.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <CAPm50aL=0smbohhjAcK=ciUwcQJ=uAQP1xNQi52YsE7U8NFpEw@mail.gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-30  Merge tag 'kvm-x86-6.1-2' of https://github.com/sean-jc/linux into HEAD  [Paolo Bonzini, 61 files, -1199/+1928]

KVM x86 updates for 6.1, batch #2:

- Misc PMU fixes and cleanups.
- Fixes for Hyper-V hypercall selftest
2022-09-30  KVM: selftests: Fix nx_huge_pages_test on TDP-disabled hosts  [David Matlack, 3 files, -2/+48]
Map the test's huge page region with 2MiB virtual mappings when TDP is disabled so that KVM can shadow the region with huge pages. This fixes nx_huge_pages_test on hosts where TDP hardware support is disabled. Purposely do not skip this test on TDP-disabled hosts. While we don't care about NX Huge Pages on TDP-disabled hosts from a security perspective, KVM does support it, and so we should test it. For TDP-enabled hosts, continue mapping the region with 4KiB pages to ensure that KVM can map it with huge pages irrespective of the guest mappings. Fixes: 8448ec5993be ("KVM: selftests: Add NX huge pages test") Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220929181207.2281449-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-30  KVM: selftests: Add helpers to read kvm_{intel,amd} boolean module parameters  [David Matlack, 3 files, -12/+44]
Add helper functions for reading the value of kvm_intel and kvm_amd boolean module parameters. Use the kvm_intel variant in vm_is_unrestricted_guest() to simplify the check for kvm_intel.unrestricted_guest. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220929181207.2281449-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
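For illustration, a minimal sketch of such a helper, assuming it parses /sys/module/<module>/parameters/<param>; the function name and error handling here are illustrative, not the exact selftest API:

	#include <limits.h>
	#include <stdbool.h>
	#include <stdio.h>
	#include "test_util.h"	/* TEST_ASSERT() */

	/* Boolean module params read back from sysfs as 'Y' or 'N'. */
	static bool get_module_param_bool(const char *module, const char *param)
	{
		char path[PATH_MAX];
		char value;
		FILE *f;

		snprintf(path, sizeof(path), "/sys/module/%s/parameters/%s",
			 module, param);
		f = fopen(path, "r");
		TEST_ASSERT(f, "Failed to open %s", path);
		TEST_ASSERT(fread(&value, 1, 1, f) == 1, "Failed to read %s", path);
		fclose(f);

		return value == 'Y';
	}

A caller would then check, e.g., get_module_param_bool("kvm_intel", "unrestricted_guest").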
2022-09-30  KVM: selftests: Tell the compiler that code after TEST_FAIL() is unreachable  [David Matlack, 1 file, -2/+4]
Add __builtin_unreachable() to TEST_FAIL() so that the compiler knows that any code after a TEST_FAIL() is unreachable. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220929181207.2281449-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
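The shape of the change is small; a sketch (the real macro lives in the selftests' test_util.h and may differ in detail):

	/* A failing assert that the compiler now knows never returns. */
	#define TEST_FAIL(fmt, ...)				\
	do {							\
		TEST_ASSERT(false, fmt, ##__VA_ARGS__);		\
		__builtin_unreachable();			\
	} while (0)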
2022-09-30  Revert "KVM: selftests: Fix nested SVM tests when built with clang"  [Sean Christopherson, 1 file, -13/+1]
Revert back to using memset() in generic_svm_setup() now that KVM selftests override memset() and friends specifically to prevent the compiler from generating fancy code and/or linking to the libc implementation. This reverts commit ed290e1c20da19fa100a3e0f421aa31b65984960. Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220928233652.783504-8-seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-30  KVM: selftests: Dedup subtests of fix_hypercall_test  [Sean Christopherson, 1 file, -32/+13]
Combine fix_hypercall_test's two subtests into a common routine, the only difference between the two is whether or not the quirk is disabled. Passing a boolean is a little gross, but using an enum to make it super obvious that the callers are enabling/disabling the quirk seems like overkill. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Message-Id: <20220928233652.783504-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-30  KVM: selftests: Explicitly verify KVM doesn't patch hypercall if quirk==off  [Sean Christopherson, 1 file, -8/+18]
Explicitly verify that KVM doesn't patch in the native hypercall if the FIX_HYPERCALL_INSN quirk is disabled. The test currently verifies that a #UD occurred, but doesn't actually verify that no patching occurred. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220928233652.783504-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-30  KVM: selftests: Hardcode VMCALL/VMMCALL opcodes in "fix hypercall" test  [Sean Christopherson, 1 file, -27/+16]
Hardcode the VMCALL/VMMCALL opcodes in dedicated arrays instead of extracting the opcodes from inline asm, and patch in the "other" opcode so as to preserve the original opcode, i.e. the opcode that the test executes in the guest. Preserving the original opcode (by not patching the source), will make it easier to implement a check that KVM doesn't modify the opcode (the test currently only verifies that a #UD occurred). Use INT3 (0xcc) as the placeholder so that the guest will likely die a horrible death if the test's patching goes awry. As a bonus, patching from within the test dedups a decent chunk of code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220928233652.783504-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
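The encodings themselves are architectural, so the arrays are easy to reconstruct; a sketch with illustrative names:

	/* VMCALL (Intel) and VMMCALL (AMD) are both 3-byte instructions. */
	static const uint8_t vmx_vmcall[3]  = { 0x0f, 0x01, 0xc1 };
	static const uint8_t svm_vmmcall[3] = { 0x0f, 0x01, 0xd9 };

	/* INT3 placeholder: if patching goes awry, the guest dies loudly. */
	static uint8_t other_hypercall_insn[3] = { 0xcc, 0xcc, 0xcc };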
2022-09-30  KVM: selftests: Remove unnecessary register shuffling in fix_hypercall_test  [Sean Christopherson, 1 file, -14/+8]
Use input constraints to load RAX and RBX when testing that KVM correctly does/doesn't patch the "wrong" hypercall. There's no need to manually load RAX and RBX, and no reason to clobber them either (KVM is not supposed to modify anything other than RAX). Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Message-Id: <20220928233652.783504-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
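A sketch of the constraint-based version, with illustrative parameter names (not the verbatim test code):

	static unsigned long do_hypercall(unsigned long nr, unsigned long a0)
	{
		unsigned long ret;

		/* Constraints load RAX/RBX; "=a" is the only register KVM
		 * is expected to modify. */
		asm volatile("vmcall"
			     : "=a"(ret)
			     : "a"(nr), "b"(a0));
		return ret;
	}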
2022-09-30  KVM: selftests: Compare insn opcodes directly in fix_hypercall_test  [Sean Christopherson, 1 file, -18/+16]

Directly compare the expected versus observed hypercall instructions when verifying that KVM patched in the native hypercall (FIX_HYPERCALL_INSN quirk enabled). gcc rightly complains that doing a 4-byte memcpy() with an "unsigned char" as the source generates an out-of-bounds access. Alternatively, "exp" and "obs" could be declared as 3-byte arrays, but there's no known reason to copy locally instead of comparing directly.

  In function ‘assert_hypercall_insn’,
      inlined from ‘guest_main’ at x86_64/fix_hypercall_test.c:91:2:
  x86_64/fix_hypercall_test.c:63:9: error: array subscript ‘unsigned int[0]’ is partly outside array bounds of ‘unsigned char[1]’ [-Werror=array-bounds]
     63 |         memcpy(&exp, exp_insn, sizeof(exp));
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  x86_64/fix_hypercall_test.c: In function ‘guest_main’:
  x86_64/fix_hypercall_test.c:42:22: note: object ‘vmx_hypercall_insn’ of size 1
     42 | extern unsigned char vmx_hypercall_insn;
        |                      ^~~~~~~~~~~~~~~~~~
  x86_64/fix_hypercall_test.c:25:22: note: object ‘svm_hypercall_insn’ of size 1
     25 | extern unsigned char svm_hypercall_insn;
        |                      ^~~~~~~~~~~~~~~~~~
  In function ‘assert_hypercall_insn’,
      inlined from ‘guest_main’ at x86_64/fix_hypercall_test.c:91:2:
  x86_64/fix_hypercall_test.c:64:9: error: array subscript ‘unsigned int[0]’ is partly outside array bounds of ‘unsigned char[1]’ [-Werror=array-bounds]
     64 |         memcpy(&obs, obs_insn, sizeof(obs));
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  x86_64/fix_hypercall_test.c: In function ‘guest_main’:
  x86_64/fix_hypercall_test.c:25:22: note: object ‘svm_hypercall_insn’ of size 1
     25 | extern unsigned char svm_hypercall_insn;
        |                      ^~~~~~~~~~~~~~~~~~
  x86_64/fix_hypercall_test.c:42:22: note: object ‘vmx_hypercall_insn’ of size 1
     42 | extern unsigned char vmx_hypercall_insn;
        |                      ^~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors
  make: *** [../lib.mk:135: tools/testing/selftests/kvm/x86_64/fix_hypercall_test] Error 1

Fixes: 6c2fa8b20d0c ("selftests: KVM: Test KVM_X86_QUIRK_FIX_HYPERCALL_INSN")
Cc: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20220928233652.783504-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-30  KVM: selftests: Implement memcmp(), memcpy(), and memset() for guest use  [Sean Christopherson, 2 files, -1/+49]
Implement memcmp(), memcpy(), and memset() to override the compiler's built-in versions in order to guarantee that the compiler won't generate out-of-line calls to external functions via the PLT. This allows the helpers to be safely used in guest code, as KVM selftests don't support dynamic loading of guest code. Steal the implementations from the kernel's generic versions, sans the optimizations in memcmp() for unaligned accesses. Put the utilities in a separate compilation unit and build with -ffreestanding to fudge around a gcc "feature" where it will optimize memset(), memcpy(), etc... by generating a recursive call. I.e. the compiler optimizes itself into infinite recursion. Alternatively, the individual functions could be tagged with optimize("no-tree-loop-distribute-patterns"), but using "optimize" for anything but debug is discouraged, and Linus NAK'd the use of the flag in the kernel proper[*]. https://lore.kernel.org/lkml/CAHk-=wik-oXnUpfZ6Hw37uLykc-_P0Apyn2XuX-odh-3Nzop8w@mail.gmail.com Cc: Andrew Jones <andrew.jones@linux.dev> Cc: Anup Patel <anup@brainfault.org> Cc: Atish Patra <atishp@atishpatra.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220928233652.783504-2-seanjc@google.com> Reviewed-by: Andrew Jones <andrew.jones@linux.dev> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
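The stolen generic implementations are byte-at-a-time loops; memset(), for instance, is essentially:

	#include <stddef.h>

	void *memset(void *s, int c, size_t count)
	{
		char *xs = s;

		/* Byte-at-a-time on purpose; -ffreestanding keeps gcc from
		 * "optimizing" this loop back into a call to memset(). */
		while (count--)
			*xs++ = c;

		return s;
	}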
2022-09-30  KVM: x86: Hide IA32_PLATFORM_DCA_CAP[31:0] from the guest  [Jim Mattson, 1 file, -2/+0]
The only thing reported by CPUID.9 is the value of IA32_PLATFORM_DCA_CAP[31:0] in EAX. This MSR doesn't even exist in the guest, since CPUID.1:ECX.DCA[bit 18] is clear in the guest. Clear CPUID.9 in KVM_GET_SUPPORTED_CPUID. Fixes: 24c82e576b78 ("KVM: Sanitize cpuid") Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220922231854.249383-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-30  KVM: selftests: Gracefully handle empty stack traces  [David Matlack, 1 file, -7/+13]
Bail out of test_dump_stack() if the stack trace is empty rather than invoking addr2line with zero addresses. The problem with the latter is that addr2line will block waiting for addresses to be passed in via stdin, e.g. if running a selftest from an interactive terminal. Opportunistically fix up the comment that mentions skipping 3 frames since only 2 are skipped in the code. Cc: Vipin Sharma <vipinsh@google.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220922231724.3560211-1-dmatlack@google.com> [Small tweak to keep backtrace() call close to if(). - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-30  KVM: selftests: replace assertion with warning in access_tracking_perf_test  [Emanuele Giuseppe Esposito, 1 file, -9/+16]

Page_idle uses {ptep/pmdp}_clear_young_notify, which in turn calls the mmu notifier callback ->clear_young(), which purposefully does not flush the TLB. When running the test in a nested guest, point 1. of the test doc header is violated, because the KVM TLB is unbounded in size and, since no flush is forced, KVM does not update the sptes' accessed/idle bits, resulting in a guest assertion failure.

More precisely, only the first ACCESS_WRITE in run_test() actually makes visible changes, because sptes are created and the accessed bit is set to 1 (or the idle bit to 0). The first mark_memory_idle() then passes, since the accessed bit is still 1, and sets all pages as idle (not accessed). When the next write is performed, the update is not flushed, so idle is still 1 and the next mark_memory_idle() fails.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220926082923.299554-1-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-29  KVM: selftests: dirty-log: Use KVM_CAP_DIRTY_LOG_RING_ACQ_REL if available  [Marc Zyngier, 2 files, -2/+6]
Pick KVM_CAP_DIRTY_LOG_RING_ACQ_REL if exposed by the kernel. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20220926145120.27974-7-maz@kernel.org
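A sketch of the selection logic, assuming the selftest helper names kvm_check_cap() and vm_enable_cap() (exact signatures and the ring-size argument may differ):

	int cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;

	/* Fall back to the original capability on kernels without ACQ_REL. */
	if (!kvm_check_cap(cap))
		cap = KVM_CAP_DIRTY_LOG_RING;

	vm_enable_cap(vm, cap, ring_size_bytes);	/* ring size name is hypothetical */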
2022-09-29  KVM: selftests: dirty-log: Upgrade flag accesses to acquire/release semantics  [Marc Zyngier, 1 file, -2/+3]
In order to preserve ordering, make sure that the flag accesses in the dirty log are done using acquire/release accessors. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20220926145120.27974-6-maz@kernel.org
2022-09-29  KVM: Document weakly ordered architecture requirements for dirty ring  [Marc Zyngier, 1 file, -2/+15]
Now that the kernel can expose to userspace that its dirty ring management relies on explicit ordering, document these new requirements for VMMs to do the right thing. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20220926145120.27974-5-maz@kernel.org
2022-09-29  KVM: x86: Select CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL  [Marc Zyngier, 1 file, -0/+1]
Since x86 is TSO (give or take), allow it to advertise the new ACQ_REL version of the dirty ring capability. No other change is required for it. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20220926145120.27974-4-maz@kernel.org
2022-09-29  KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option  [Marc Zyngier, 4 files, -2/+24]

In order to differentiate between architectures that require no extra synchronisation when accessing the dirty ring and those that do, add a new capability (KVM_CAP_DIRTY_LOG_RING_ACQ_REL) that identifies the latter sort. TSO architectures can obviously advertise both, while relaxed architectures must only advertise the ACQ_REL version.

This requires some configuration symbol rejigging, with HAVE_KVM_DIRTY_RING being only indirectly selected by two top-level config symbols:

- HAVE_KVM_DIRTY_RING_TSO for strongly ordered architectures (x86)
- HAVE_KVM_DIRTY_RING_ACQ_REL for weakly ordered architectures (arm64)

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20220926145120.27974-3-maz@kernel.org
2022-09-29  KVM: Use acquire/release semantics when accessing dirty ring GFN state  [Marc Zyngier, 1 file, -2/+2]

The current implementation of the dirty ring has an implicit requirement that stores to the dirty ring from userspace must be:

- ordered with one another
- visible from another CPU executing a ring reset

While these implicit requirements work well for x86 (and any other TSO-like architecture), they do not work for more relaxed architectures such as arm64, where stores to different addresses can be freely reordered and loads from these addresses may not observe writes from another CPU unless the required barriers (or acquire/release semantics) are used.

In order to start fixing this, upgrade the ring reset accesses:

- the kvm_dirty_gfn_harvested() helper now uses acquire semantics so it is ordered after all previous writes, including those from userspace
- the kvm_dirty_gfn_set_invalid() helper now uses release semantics so that the next_slot and next_offset reads don't drift past the entry invalidation

This is only a partial fix, as the userspace side also needs upgrading.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20220926145120.27974-2-maz@kernel.org
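In sketch form, the two upgraded helpers look like this (flag name per the dirty-ring UAPI; not necessarily the verbatim kernel code):

	static inline void kvm_dirty_gfn_set_invalid(struct kvm_dirty_gfn *gfn)
	{
		/* Release: prior next_slot/next_offset reads can't be
		 * reordered past the invalidation. */
		smp_store_release(&gfn->flags, 0);
	}

	static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
	{
		/* Acquire: ordered after all previous writes, including
		 * userspace's. */
		return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
	}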
2022-09-29  KVM: arm64: Fix comment typo in nvhe/switch.c  [Wei-Lin Chang, 1 file, -1/+1]

Fix the comment of __hyp_vgic_restore_state() to say VHE instead of VEH, and change the underscore to a dash to match the comment above __hyp_vgic_save_state(). Signed-off-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220929042839.24277-1-r09922117@csie.ntu.edu.tw
2022-09-28  KVM: x86/svm/pmu: Rewrite get_gp_pmc_amd() for more counters scalability  [Like Xu, 1 file, -68/+20]

If the number of AMD gp counters continues to grow, the code will become very clumsy and the switch-case design of inline get_gp_pmc_amd() will also bloat the kernel text size.

The target code is taught to manage two groups of MSRs, each representing a different version of the AMD PMU counter MSRs. The MSR addresses of each group are contiguous, with no holes, and there is no intersection between the two sets of addresses, but they are deliberately discrete in functionality, like this:

  [Group A: all counter MSRs are tightly bound to all event select MSRs]

  MSR_K7_EVNTSEL0      0xc0010000
  MSR_K7_EVNTSELi      0xc0010000 + i
  ...
  MSR_K7_EVNTSEL3      0xc0010003
  MSR_K7_PERFCTR0      0xc0010004
  MSR_K7_PERFCTRi      0xc0010004 + i
  ...
  MSR_K7_PERFCTR3      0xc0010007

  [Group B: the counter MSRs are interleaved with the event select MSRs]

  MSR_F15H_PERF_CTL0   0xc0010200
  MSR_F15H_PERF_CTR0   0xc0010200 + 1
  ...
  MSR_F15H_PERF_CTLi   0xc0010200 + 2 * i
  MSR_F15H_PERF_CTRi   0xc0010200 + 2 * i + 1
  ...
  MSR_F15H_PERF_CTL5   0xc0010200 + 2 * 5
  MSR_F15H_PERF_CTR5   0xc0010200 + 2 * 5 + 1

Rewrite get_gp_pmc_amd() in this way: first determine which group of registers is accessed, then determine whether it matches its requested type, applying different scaling ratios respectively, and finally derive the pmc_idx to pass into amd_pmc_idx_to_pmc().

Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-8-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
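A sketch of the resulting decode; the helper name is illustrative, and the real function also validates that the MSR matches the requested type:

	/* Assumes @msr is already known to be in one of the two groups. */
	static unsigned int msr_to_counter_idx(u32 msr)
	{
		if (msr >= MSR_F15H_PERF_CTL0)		/* Group B: stride 2 */
			return (msr - MSR_F15H_PERF_CTL0) / 2;
		if (msr >= MSR_K7_PERFCTR0)		/* Group A counters */
			return msr - MSR_K7_PERFCTR0;
		return msr - MSR_K7_EVNTSEL0;		/* Group A event selects */
	}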
2022-09-28  KVM: x86/svm/pmu: Direct access pmu->gp_counter[] to implement amd_*_to_pmc()  [Like Xu, 1 file, -36/+5]
Access PMU counters on AMD by directly indexing the array of general purpose counters instead of translating the PMC index to an MSR index. AMD only supports gp counters, there's no need to translate a PMC index to an MSR index and back to a PMC index. Opportunistically apply array_index_nospec() to reduce the attack surface for speculative execution and remove the dead code. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20220831085328.45489-7-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
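In sketch form, the helper reduces to a bounds check plus a speculation-safe index (close to, but not necessarily verbatim, the kernel code):

	static struct kvm_pmc *amd_pmc_idx_to_pmc(struct kvm_pmu *pmu,
						  unsigned int pmc_idx)
	{
		unsigned int num_counters = pmu->nr_arch_gp_counters;

		if (pmc_idx >= num_counters)
			return NULL;

		/* Clamp the index under speculation before using it. */
		return &pmu->gp_counters[array_index_nospec(pmc_idx, num_counters)];
	}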
2022-09-28  KVM: x86/pmu: Avoid using PEBS perf_events for normal counters  [Like Xu, 2 files, -2/+4]

The check logic in pmc_resume_counter() to determine whether a perf_event is reusable is partial and flawed, especially when it comes to a pseudocode sequence (contrived, but valid) like:

- enable a counter and its PEBS bit
- enable global_ctrl
- run workload
- disable only the PEBS bit, leaving the global_ctrl bit enabled

In this corner case, a perf_event created for PEBS can be reused by a normal counter before it has been released and recreated, and when this normal counter overflows, it triggers a PEBS interrupt (precise_ip != 0). To address this issue, reprogram all affected counters when PEBS_ENABLE changes, and reuse a counter if and only if PEBS exactly matches precise.

Fixes: 79f3e3b58386 ("KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter")
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-4-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-09-28  KVM: x86/pmu: Refactor PERF_GLOBAL_CTRL update helper for reuse by PEBS  [Like Xu, 1 file, -7/+5]

Extract the "global ctrl" specific bits out of global_ctrl_changed() so that the helper only deals with reprogramming general purpose counters, and rename the helper accordingly. PEBS needs the same logic, i.e. it needs to reprogram the associated counters when PEBS_ENABLE bits are toggled, and will use the helper in a future fix. No functional change intended. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20220831085328.45489-4-likexu@tencent.com [sean: split to separate patch, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-09-28  KVM: x86/pmu: Don't generate PEBS records for emulated instructions  [Like Xu, 1 file, -3/+13]

KVM will accumulate an enabled counter for at least the INSTRUCTIONS or BRANCH_INSTRUCTION hw event from any KVM-emulated instructions, generating an emulated overflow interrupt on counter overflow. In theory this should also happen when the PEBS counter overflows, but KVM currently lacks this part of the underlying support (e.g. through software injection of records in the irq context or a lazy approach). In this case, KVM skips the injection of this BUFFER_OVF PMI (effectively dropping one PEBS record) and lets the overflow counter move on. The loss of a single sample does not introduce a loss of accuracy, but it is easily noticeable for certain specific instructions.

This issue is expected to be addressed along with the issue of PEBS cross-mapped counters, via a slow-path proposal.

Fixes: 79f3e3b58386 ("KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter")
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-3-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-09-28  KVM: x86/pmu: Avoid setting BIT_ULL(-1) to pmu->host_cross_mapped_mask  [Like Xu, 1 file, -6/+9]
In the extreme case of host counters multiplexing and contention, the perf_event requested by the guest's pebs counter is not allocated to any actual physical counter, in which case hw.idx is bookkept as -1, resulting in an out-of-bounds access to host_cross_mapped_mask. Fixes: 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20220831085328.45489-2-likexu@tencent.com [sean: expand comment to explain how a negative idx can be encountered] Signed-off-by: Sean Christopherson <seanjc@google.com>
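The fix boils down to skipping events that perf hasn't scheduled on a hardware counter; a sketch with a hypothetical helper name:

	/* Hypothetical helper; the real check sits in the cross-map scan. */
	static void mark_cross_mapped(struct kvm_pmu *pmu, struct kvm_pmc *pmc)
	{
		int hw_idx = pmc->perf_event->hw.idx;

		/* hw.idx is -1 when the event lost its physical counter to
		 * contention; BIT_ULL(-1) is undefined, so skip such events. */
		if (hw_idx < 0)
			return;

		pmu->host_cross_mapped_mask |= BIT_ULL(hw_idx);
	}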
2022-09-28  KVM: selftests: Don't set reserved bits for invalid Hyper-V hypercall number  [Vitaly Kuznetsov, 1 file, -1/+1]
Bits 27 through 31 in Hyper-V hypercall 'control' are reserved (see HV_HYPERCALL_RSVD0_MASK) but '0xdeadbeef' includes them. This causes KVM to return HV_STATUS_INVALID_HYPERCALL_INPUT instead of the expected HV_STATUS_INVALID_HYPERCALL_CODE. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/all/87fsgjol20.fsf@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-09-28  KVM: selftests: Load RAX with -EFAULT before Hyper-V hypercall  [Vipin Sharma, 1 file, -1/+2]
Load RAX with -EFAULT prior to making a Hyper-V hypercall so that tests can't get false negatives due to the compiler coincidentally loading the "right" value into RAX, i.e. to ensure that _KVM_ and not the compiler is correctly clearing RAX on a successful hypercall. Note, initializing *hv_status (in C code) to -EFAULT is not sufficient to avoid false negatives, as the compiler can still "clobber" RAX and thus load garbage into *hv_status if the hypercall faults (or if KVM doesn't set RAX). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20220922062451.2927010-1-vipinsh@google.com [sean: move to separate patch, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
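A sketch of the resulting hypercall, following the Hyper-V TLFS register layout (RCX = control, RDX = input GPA, R8 = output GPA); this is the idea, not the verbatim selftest code:

	uint64_t hv_status = (uint64_t)-EFAULT;	/* poison: KVM must overwrite RAX */

	asm volatile("mov %[output], %%r8\n\t"
		     "vmcall"
		     : "+a" (hv_status), "+c" (control), "+d" (input_address)
		     : [output] "r" (output_address)
		     : "cc", "memory", "r8", "r9", "r10", "r11");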
2022-09-28  KVM: selftests: Check result in hyperv_features for successful hypercalls  [Vipin Sharma, 1 file, -4/+4]

Commit cc5851c6be86 ("KVM: selftests: Use exception fixup for #UD/#GP Hyper-V MSR/hcall tests") introduced a wrong guest assert in guest_hcall(). It does not check the results of successful hypercalls, only the result when a fault happens:

  GUEST_ASSERT_2(!hcall->ud_expected || res == hcall->expect, hcall->expect, res);

Correct the assertion by only checking the results of successful hypercalls.

This issue was observed when the test started failing after being built with Clang: the guest assert above fails because "res" is not equal to "hcall->expect" when "hcall->ud_expected" is true, as "res" picks up a garbage value from the RAX register. Under GCC, RAX happens to be 0 because GCC uses RAX for @output_address in the asm statement and resets it to 0 before using it as an output operand in the same asm statement; Clang does not use RAX for @output_address.

Fixes: cc5851c6be86 ("KVM: selftests: Use exception fixup for #UD/#GP Hyper-V MSR/hcall tests")
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20220922062451.2927010-1-vipinsh@google.com
[sean: wrap changelog at ~75 chars, move -EFAULT change to separate patch]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-09-28  KVM: selftests: Update top-of-file comment in psci_test  [Oliver Upton, 1 file, -4/+6]
Fix the comment to accurately describe the test and recently added SYSTEM_SUSPEND test case. What was once psci_cpu_on_test was renamed and extended to squeeze in a test case for PSCI SYSTEM_SUSPEND. Nonetheless, the author of those changes (whoever they may be...) failed to update the file comment to reflect what had changed. Reported-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220819162100.213854-1-oliver.upton@linux.dev
2022-09-27  KVM: selftests: Skip tests that require EPT when it is not available  [David Matlack, 2 files, -0/+21]
Skip selftests that require EPT support in the VM when it is not available. For example, if running on a machine where kvm_intel.ept=N since KVM does not offer EPT support to guests if EPT is not supported on the host. This commit causes vmx_dirty_log_test to be skipped instead of failing on hosts where kvm_intel.ept=N. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20220926171457.532542-1-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: remove KVM_REQ_UNHALT  [Paolo Bonzini, 13 files, -45/+3]
KVM_REQ_UNHALT is now unnecessary because it is replaced by the return value of kvm_vcpu_block/kvm_vcpu_halt. Remove it. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20220921003201.1441511-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: mips, x86: do not rely on KVM_REQ_UNHALT  [Paolo Bonzini, 2 files, -5/+11]

KVM_REQ_UNHALT is a weird request that simply reports the value of kvm_arch_vcpu_runnable() on exit from kvm_vcpu_halt(). Only MIPS and x86 look at it; the others just clear it. Check the state of the vCPU directly so that the request is handled as a nop on all architectures.

No functional change intended, except for corner cases where an event arrives immediately after a signal becomes pending or after another similar host-side event.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220921003201.1441511-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: x86: never write to memory from kvm_vcpu_check_block()  [Paolo Bonzini, 1 file, -3/+11]
kvm_vcpu_check_block() is called while not in TASK_RUNNING, and therefore it cannot sleep. Writing to guest memory is therefore forbidden, but it can happen on AMD processors if kvm_check_nested_events() causes a vmexit. Fortunately, all events that are caught by kvm_check_nested_events() are also recognized by kvm_vcpu_has_events() through vendor callbacks such as kvm_x86_interrupt_allowed() or kvm_x86_ops.nested_ops->has_events(), so remove the call and postpone the actual processing to vcpu_block(). Opportunistically honor the return of kvm_check_nested_events(). KVM punted on the check in kvm_vcpu_running() because the only error path is if vmx_complete_nested_posted_interrupt() fails, in which case KVM exits to userspace with "internal error" i.e. the VM is likely dead anyways so it wasn't worth overloading the return of kvm_vcpu_running(). Add the check mostly so that KVM is consistent with itself; the return of the call via kvm_apic_accept_events()=>kvm_check_nested_events() that immediately follows _is_ checked. Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [sean: check and handle return of kvm_check_nested_events()] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: x86: Don't snapshot pending INIT/SIPI prior to checking nested events  [Sean Christopherson, 1 file, -26/+10]

Don't snapshot pending INIT/SIPI events prior to checking nested events; architecturally there's nothing wrong with KVM processing (dropping) a SIPI that is received immediately after synthesizing a VM-Exit. Taking and consuming the snapshot makes the flow way more subtle than it needs to be, e.g. nVMX consumes/clears events that trigger VM-Exit (INIT/SIPI), and so at first glance it appears that KVM is double-dipping on pending INITs and SIPIs. But that's not the case, because INIT is blocked unconditionally in VMX root mode and the CPU cannot be in wait-for-SIPI after VM-Exit, i.e. the paths that truly consume the snapshot are unreachable if apic->pending_events is modified by kvm_check_nested_events().

nSVM is a similar story, as GIF is cleared by the CPU on VM-Exit; INIT is blocked regardless of whether or not it was pending prior to VM-Exit.

Drop the snapshot logic so that a future fix doesn't create weirdness when kvm_vcpu_running()'s call to kvm_check_nested_events() is moved to vcpu_block(). In that case, kvm_check_nested_events() will be called immediately before kvm_apic_accept_events(), which raises the obvious question of why that change doesn't break the snapshot logic.

Note, there is a subtle functional change. Previously, KVM would clear pending SIPIs if and only if a SIPI was pending prior to VM-Exit, whereas now KVM clears a pending SIPI unconditionally if INIT+SIPI are blocked. The latter is architecturally allowed, as SIPI is ignored if the CPU is not in wait-for-SIPI mode (arguably, KVM should be even more aggressive in dropping SIPIs). It is software's responsibility to ensure the SIPI is delivered, i.e. software shouldn't be firing INIT-SIPI at a CPU until it knows with 100% certainty that the target CPU isn't in VMX root mode. Furthermore, the existing code is extra weird, as SIPIs that arrive after VM-Exit _are_ dropped if there also happened to be a pending SIPI before VM-Exit.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220921003201.1441511-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: nVMX: Make event request on VMXOFF iff INIT/SIPI is pending  [Sean Christopherson, 1 file, -2/+2]
Explicitly check for a pending INIT/SIPI event when emulating VMXOFF instead of blindly making an event request. There's obviously no need to evaluate events if none are pending. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: nVMX: Make an event request if INIT or SIPI is pending on VM-Enter  [Sean Christopherson, 1 file, -12/+6]
Evaluate interrupts, i.e. set KVM_REQ_EVENT, if INIT or SIPI is pending when emulating nested VM-Enter. INIT is blocked while the CPU is in VMX root mode, but not in VMX non-root, i.e. becomes unblocked on VM-Enter. This bug has been masked by KVM calling ->check_nested_events() in the core run loop, but that hack will be fixed in the near future. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: SVM: Make an event request if INIT or SIPI is pending when GIF is set  [Sean Christopherson, 1 file, -1/+2]
Set KVM_REQ_EVENT if INIT or SIPI is pending when the guest enables GIF. INIT in particular is blocked when GIF=0 and needs to be processed when GIF is toggled to '1'. This bug has been masked by (a) KVM calling ->check_nested_events() in the core run loop and (b) hypervisors toggling GIF from 0=>1 only when entering guest mode (L1 entering L2). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: x86: lapic does not have to process INIT if it is blocked  [Paolo Bonzini, 1 file, -1/+2]

Do not return true from kvm_vcpu_has_events() if the vCPU isn't going to immediately process a pending INIT/SIPI. INIT/SIPI shouldn't be treated as wake events if they are blocked. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [sean: rebase onto refactored INIT/SIPI helpers, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: x86: Rename kvm_apic_has_events() to make it INIT/SIPI specific  [Sean Christopherson, 2 files, -4/+4]
Rename kvm_apic_has_events() to kvm_apic_has_pending_init_or_sipi() so that it's more obvious that "events" really just means "INIT or SIPI". Opportunistically clean up a weirdly worded comment that referenced kvm_apic_has_events() instead of kvm_apic_accept_events(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: x86: Rename and expose helper to detect if INIT/SIPI are allowed  [Sean Christopherson, 4 files, -11/+14]

Rename and invert kvm_vcpu_latch_init() to kvm_apic_init_sipi_allowed() so as to match the behavior of {interrupt,nmi,smi}_allowed(), and expose the helper so that it can be used by kvm_vcpu_has_events() to determine whether or not an INIT or SIPI is pending _and_ can be taken immediately. Opportunistically replace usage of the "latch" terminology with "blocked" and/or "allowed", again to align with KVM's terminology used for all other event types. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
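In sketch form, the inverted helper (shape inferred from the commit message; the exact code may differ):

	static inline bool kvm_apic_init_sipi_allowed(struct kvm_vcpu *vcpu)
	{
		/* INIT/SIPI are blocked in SMM and whenever vendor code
		 * (e.g. nVMX root mode) says they are. */
		return !is_smm(vcpu) &&
		       !static_call(kvm_x86_apic_init_signal_blocked)(vcpu);
	}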
2022-09-26  KVM: nVMX: Make an event request when pending an MTF nested VM-Exit  [Sean Christopherson, 2 files, -2/+7]
Set KVM_REQ_EVENT when MTF becomes pending to ensure that KVM will run through inject_pending_event() and thus vmx_check_nested_events() prior to re-entering the guest. MTF currently works by virtue of KVM's hack that calls kvm_check_nested_events() from kvm_vcpu_running(), but that hack will be removed in the near future. Until that call is removed, the patch introduces no real functional change. Fixes: 5ef8acbdd687 ("KVM: nVMX: Emulate MTF when performing instruction emulation") Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26  KVM: x86: make vendor code check for all nested events  [Paolo Bonzini, 3 files, -6/+12]
Interrupts, NMIs etc. sent while in guest mode are already handled properly by the *_interrupt_allowed callbacks, but other events can cause a vCPU to be runnable that are specific to guest mode. In the case of VMX there are two, the preemption timer and the monitor trap. The VMX preemption timer is already special cased via the hv_timer_pending callback, but the purpose of the callback can be easily extended to MTF or in fact any other event that can occur only in guest mode. Rename the callback and add an MTF check; kvm_arch_vcpu_runnable() now can return true if an MTF is pending, without relying on kvm_vcpu_running()'s call to kvm_check_nested_events(). Until that call is removed, however, the patch introduces no functional change. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220921003201.1441511-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>