|
KVM selftests changes for 6.3:
- Cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct
hypercall instruction instead of relying on KVM to patch in VMMCALL
- A variety of one-off cleanups and fixes
|
|
KVM x86 PMU changes for 6.3:
- Add support for creating masked events for the PMU filter to allow
userspace to heavily restrict what events the guest can use without
needing to create an absurd number of events
- Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
support is disabled
- Add PEBS support for Intel SPR
|
|
KVM x86 MMU changes for 6.3:
- Fix and cleanup the range-based TLB flushing code, used when KVM is
running on Hyper-V
- A few one-off cleanups
|
|
KVM x86 changes for 6.3:
- Advertise support for Intel's fancy new fast REP string features
- Fix a double-shootdown issue in the emergency reboot code
- Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
similar treatment to VMX
- Update Xen's TSC info CPUID sub-leaves as appropriate
- Add support for Hyper-V's extended hypercalls, where "support" at this
point is just forwarding the hypercalls to userspace
- Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
MSR filters
- One-off fixes and cleanups
|
|
Common KVM changes for 6.3:
- Account allocations in generic kvm_arch_alloc_vm()
- Fix a typo and a stale comment
- Fix a memory leak if coalesced MMIO unregistration fails
|
|
The KVM_GUEST_PAGE_TABLE_MIN_PADDR macro is already defined in
include/kvm_util_base.h, so remove the duplicate definition in
lib/kvm_util.c.
Fixes: cce0c23dd944 ("KVM: selftests: Add wrapper to allocate page table page")
Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
Link: https://lore.kernel.org/r/20230208071801.68620-1-shahuang@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
As discussed[*], relabel the poorly named structs to align with the
current KVM nomenclature.
The old names are a leftover from before commit 52491a38b2c2 ("KVM:
Initialize gfn_to_pfn_cache locks in dedicated helper"), which, among
other things, introduced kvm_gpc_init() and renamed
kvm_gfn_to_pfn_cache_init()/_destroy() to kvm_gpc_activate()/_deactivate(),
partly in an effort to avoid implying that the cache really is
destroyed/freed.
While at it, get rid of #define GPA_INVALID, which, being used as a GFN,
is not only misnamed but also unnecessarily reinvents a UAPI constant.
No functional change intended.
[*] https://lore.kernel.org/r/Y5yZ6CFkEMBqyJ6v@google.com
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230206202430.1898057-1-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The parameter "arg" in guest_modes_cmdline() is currently unused; the
function parses optarg instead. Replace optarg with arg.
While at it, switch from strtoul() to atoi_non_negative(), since a guest
mode ID can never be negative.
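A rough sketch of the intended result (assuming atoi_non_negative() takes a
descriptive name plus the string to parse):

  void guest_modes_cmdline(const char *arg)
  {
          static bool mode_selected;
          unsigned int mode;

          if (!mode_selected) {
                  for (mode = 0; mode < NUM_VM_MODES; ++mode)
                          guest_modes[mode].enabled = false;
                  mode_selected = true;
          }

          /* Parse "arg" (not optarg); a guest mode ID can never be negative. */
          mode = atoi_non_negative("Guest mode ID", arg);
          TEST_ASSERT(mode < NUM_VM_MODES, "Guest mode ID %d too big", mode);
          guest_modes[mode].enabled = true;
  }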
Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
Fixes: e42ac777d661 ("KVM: selftests: Factor out guest mode code")
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Vipin Sharma <vipinsh@google.com>
Link: https://lore.kernel.org/r/20230202025716.216323-1-shahuang@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Commit c5b077549136 ("KVM: Convert the kvm->vcpus array to a xarray")
converted the kvm->vcpus array to an xarray, so update the code comment
for kvm_vcpu->vcpu_idx accordingly.
Signed-off-by: Wang Yong <yongw.kernel@gmail.com>
Link: https://lore.kernel.org/r/20230202081342.856687-1-yongw.kernel@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The guest page size in the synchronization area is needed by all test
cases. So it's reasonable to set it in the unified preparation function
(prepare_vm()).
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/20230118092133.320003-3-gshan@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Remove a spurious call to __vm_create_with_one_vcpu() that was introduced
by a merge gone sideways.
Fixes: eb5618911af0 ("Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/20230118092133.320003-2-gshan@redhat.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
As of commit bccf2150fe62 ("KVM: Per-vcpu inodes"), __msr_io() doesn't
return a negative value. Remove unnecessary checks.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230107001256.2365304-7-mhal@rbox.co
[sean: call out commit which left behind the unnecessary check]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Do not initialize the value of `r`, as it will be overwritten.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230107001256.2365304-6-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Replace `1` with the actual mutex_is_locked() check.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230107001256.2365304-5-mhal@rbox.co
[sean: delete the comment that explained the hardcoded '1']
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Replace srcu_dereference()+rcu_assign_pointer() sequence with
a single rcu_replace_pointer().
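Roughly, the transformation looks like this (the "filter" field name is
illustrative):

  /* Before: dereference and reassign in two steps. */
  old = srcu_dereference(kvm->filter, &kvm->srcu);
  rcu_assign_pointer(kvm->filter, new);

  /* After: a single call that also returns the old pointer. */
  old = rcu_replace_pointer(kvm->filter, new, mutex_is_locked(&kvm->lock));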
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230107001256.2365304-4-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Reduce time spent holding kvm->lock: unlock mutex before calling
synchronize_srcu(). There is no need to hold kvm->lock until all vCPUs
have been kicked; KVM only needs to guarantee that all vCPUs will switch
to the new filter before exiting to userspace.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230107001256.2365304-3-mhal@rbox.co
[sean: expand changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Reduce time spent holding kvm->lock: unlock mutex before calling
synchronize_srcu_expedited(). There is no need to hold kvm->lock until
all vCPUs have been kicked; KVM only needs to guarantee that all vCPUs
will switch to the new filter before exiting to userspace. Protecting
the write to __reprogram_pmi is also unnecessary as a vCPU may process
a set bit before receiving the final KVM_REQ_PMU, but the per-vCPU writes
are guaranteed to occur after all vCPUs have switched to the new filter.
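A rough sketch of the resulting ordering (not the literal KVM code; locals
are elided):

  mutex_lock(&kvm->lock);
  old = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
                            mutex_is_locked(&kvm->lock));
  mutex_unlock(&kvm->lock);

  /* Wait for in-flight readers of the old filter to finish... */
  synchronize_srcu_expedited(&kvm->srcu);

  /*
   * ...then make every vCPU reprogram its counters against the new filter
   * before re-entering the guest.  A vCPU may consume a set bit before the
   * final KVM_REQ_PMU, but the per-vCPU writes below happen after the
   * pointer switch above.
   */
  kvm_for_each_vcpu(i, vcpu, kvm)
          atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi, -1ull);
  kvm_make_all_cpus_request(kvm, KVM_REQ_PMU);

  kfree(old);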
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20230107001256.2365304-2-mhal@rbox.co
[sean: expand changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The comment refers to the same condition twice. Make it reflect what the
code actually does.
No functional change intended.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20230126013405.2967156-3-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The Intel SDM describes the steps taken by the CPU to verify whether a
memory segment can actually be used at a given privilege level. Loading
DS/ES/FS/GS involves checking the segment's type as well as making sure
that neither the selector's RPL nor the caller's CPL is greater than the
segment's DPL.
The emulator implements Intel's pseudocode in __load_segment_descriptor(),
even quoting the pseudocode in the comments. Although the pseudocode is
correctly translated, the implementation is incorrect. This is most
likely because the SDM itself was wrong at the time.
Fix the emulator's logic and update the pseudocode in the comment. Below
are historical notes.
Emulator code for handling segment descriptors appears to have been
introduced in March 2010 in commit 38ba30ba51a0 ("KVM: x86 emulator:
Emulate task switch in emulator.c"). Intel SDM Vol 2A: Instruction Set
Reference, A-M (Order Number: 253666-034US, _March 2010_) lists the
steps for loading segment registers in section related to MOV
instruction:
IF DS, ES, FS, or GS is loaded with non-NULL selector
THEN
IF segment selector index is outside descriptor table limits
or segment is not a data or readable code segment
or ((segment is a data or nonconforming code segment)
and (both RPL and CPL > DPL)) <---
THEN #GP(selector); FI;
This is precisely what __load_segment_descriptor() quotes and
implements. But there's a twist; a few SDM revisions later
(253667-044US), in August 2012, the snippet above becomes:
IF DS, ES, FS, or GS is loaded with non-NULL selector
THEN
IF segment selector index is outside descriptor table limits
or segment is not a data or readable code segment
or ((segment is a data or nonconforming code segment)
[note: missing or superfluous parenthesis?]
or ((RPL > DPL) and (CPL > DPL)) <---
THEN #GP(selector); FI;
Many SDMs later (253667-065US), in December 2017, pseudocode reaches
what seems to be its final form:
IF DS, ES, FS, or GS is loaded with non-NULL selector
THEN
IF segment selector index is outside descriptor table limits
OR segment is not a data or readable code segment
OR ((segment is a data or nonconforming code segment)
AND ((RPL > DPL) or (CPL > DPL))) <---
THEN #GP(selector); FI;
which also matches the behavior described in AMD's APM, which states that
a #GP occurs if:
The DS, ES, FS, or GS register was loaded and the segment pointed to
was a data or non-conforming code segment, but the RPL or CPL was
greater than the DPL.
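In C terms, the corrected check boils down to roughly the following (a
sketch; the helper name is illustrative, not the emulator's actual code):

  /*
   * Data segments and non-conforming code segments: #GP if *either* the
   * selector's RPL or the caller's CPL exceeds the descriptor's DPL.
   * The old code required both to exceed DPL.
   */
  if (is_data_or_nonconforming_code(&seg_desc) && (rpl > dpl || cpl > dpl))
          goto exception;         /* #GP(selector) */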
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20230126013405.2967156-2-mhal@rbox.co
[sean: add blurb to changelog calling out AMD agrees]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
According to the Intel SDM, EPT-friendly PEBS is supported by all
platforms after ICX and ADL, and by future platforms with PEBS format 5.
Currently the only in-kernel user of this capability is KVM, which has
very limited support for hybrid core PMUs, so ADL and its successors do
not currently expose this capability. When both hybrid cores and PEBS
format 5 are present, KVM will decide on its own merits.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.org
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221109082802.27543-4-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The PEBS capability on SPR is basically the same as on Ice Lake Server,
with the exception of two special facilities that have been enhanced and
require special handling.
Upon triggering a PEBS assist, there will be a finite delay between the
time the counter overflows and when the microcode starts to carry out
its data collection obligations. Even if the delay is constant in core
clock space, it invariably manifest as variable "skids" in instruction
address space.
On the Ice Lake Server, the Precise Distribution of Instructions Retired
(PDIR) facility mitigates the "skid" problem by providing an early
indication of when the counter is about to overflow. On SPR, the PDIR
counter available (Fixed 0) is unchanged, but the capability is enhanced
to Instruction-Accurate PDIR (PDIR++), where PEBS is taken on the
next instruction after the one that caused the overflow.
SPR also introduces a new Precise Distribution (PDist) facility only on
general programmable counter 0. Per Intel SDM, PDist eliminates any
skid or shadowing effects from PEBS. With PDist, the PEBS record will
be generated precisely upon completion of the instruction or operation
that causes the counter to overflow (there is no "wait for next occurrence"
by default).
In terms of KVM handling, when the guest accesses those special counters,
KVM needs to request the same-index counters via the perf_event kernel
subsystem to ensure that the guest uses the correct PEBS hardware counter
(PDIR++ or PDist). This is mainly achieved by raising the event's precise
level to the maximum; the semantics of this magic number are defined
internally by perf_event, and it is backwards compatible as part of the
userspace interface.
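On the host side, that amounts to requesting the highest precise level when
creating the perf event, e.g. roughly (a sketch; the raw config encoding
shown is illustrative):

  struct perf_event_attr attr = {
          .type         = PERF_TYPE_RAW,
          .config       = eventsel | (unit_mask << 8),  /* illustrative encoding */
          .exclude_host = 1,
          /*
           * Maximum precise level: have perf place the event on the
           * PDIR/PDist-capable counter (Fixed 0 / GP 0), so the counter index
           * the guest programs matches the hardware counter actually used.
           */
          .precise_ip   = 3,
  };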
Opportunistically, refine the confusing comments on TNT+, as the only
platforms that currently support pebs_ept are Ice Lake Server and SPR (GLC+).
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20221109082802.27543-3-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Hyper-V extended hypercalls exit to userspace by default. Verify that
userspace gets the call, update the result, and then verify in the guest
that the correct result is received.
Add KVM_EXIT_HYPERV to the list of "known" exit reasons so that errors
generate pretty strings.
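The userspace half of the test is roughly (a sketch using selftests helpers;
field usage per the documented kvm_run layout):

  vcpu_run(vcpu);
  TEST_ASSERT(vcpu->run->exit_reason == KVM_EXIT_HYPERV,
              "Expected KVM_EXIT_HYPERV, got %u", vcpu->run->exit_reason);

  hv = &vcpu->run->hyperv;
  TEST_ASSERT(hv->type == KVM_EXIT_HYPERV_HCALL, "Unexpected Hyper-V exit type");

  /* Fill in the result, then resume so the guest can verify it. */
  hv->u.hcall.result = HV_STATUS_SUCCESS;
  vcpu_run(vcpu);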
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221212183720.4062037-14-vipinsh@google.com
[sean: add KVM_EXIT_HYPERV to exit_reasons_known]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use the HYPERV_LINUX_OS_ID macro instead of the hardcoded 0x8100 << 48 value.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20221212183720.4062037-12-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Test the Hyper-V extended hypercall HV_EXT_CALL_QUERY_CAPABILITIES
(0x8001) for the access-denied and invalid-parameter cases.
Access is denied if CPUID.0x40000003.EBX BIT(20) is not set.
The parameter is invalid if the call has the fast bit set.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Link: https://lore.kernel.org/r/20221212183720.4062037-11-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add support for extended hypercalls in Hyper-V. The Hyper-V TLFS 6.0b
describes hypercalls above call code 0x8000 as extended hypercalls.
A guest VM running on a Hyper-V hypervisor discovers the availability of
extended hypercalls via CPUID.0x40000003.EBX BIT(20). If the bit is set,
the guest can issue extended hypercalls.
All extended hypercalls exit to userspace by default. This allows for
easy support of future hypercalls without being dependent on KVM
releases.
If there is ever a need to process a hypercall in KVM instead of
userspace, KVM can add a capability which userspace can query to know
which hypercalls KVM can handle and to enable handling of those
hypercalls.
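Conceptually, the gating in KVM looks something like this (a sketch;
identifiers are illustrative):

  if (hc.code >= HV_EXT_CALL_QUERY_CAPABILITIES) {
          /* The guest must be shown CPUID.0x40000003.EBX BIT(20). */
          if (!(guest_hv_features_ebx & HV_ENABLE_EXTENDED_HYPERCALLS))
                  return HV_STATUS_ACCESS_DENIED;

          /* Forward all extended hypercalls to userspace by default. */
          goto hypercall_userspace_exit;
  }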
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20221212183720.4062037-10-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Remove the duplicate code for exiting to userspace on Hyper-V hypercalls
and use a common exit path.
No functional change intended.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20221212183720.4062037-9-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Destroy and free the target coalesced MMIO device if unregistering said
device fails. As clearly noted in the code, kvm_io_bus_unregister_dev()
does not destroy the target device.
BUG: memory leak
unreferenced object 0xffff888112a54880 (size 64):
comm "syz-executor.2", pid 5258, jiffies 4297861402 (age 14.129s)
hex dump (first 32 bytes):
38 c7 67 15 00 c9 ff ff 38 c7 67 15 00 c9 ff ff 8.g.....8.g.....
e0 c7 e1 83 ff ff ff ff 00 30 67 15 00 c9 ff ff .........0g.....
backtrace:
[<0000000006995a8a>] kmalloc include/linux/slab.h:556 [inline]
[<0000000006995a8a>] kzalloc include/linux/slab.h:690 [inline]
[<0000000006995a8a>] kvm_vm_ioctl_register_coalesced_mmio+0x8e/0x3d0 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:150
[<00000000022550c2>] kvm_vm_ioctl+0x47d/0x1600 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3323
[<000000008a75102f>] vfs_ioctl fs/ioctl.c:46 [inline]
[<000000008a75102f>] file_ioctl fs/ioctl.c:509 [inline]
[<000000008a75102f>] do_vfs_ioctl+0xbab/0x1160 fs/ioctl.c:696
[<0000000080e3f669>] ksys_ioctl+0x76/0xa0 fs/ioctl.c:713
[<0000000059ef4888>] __do_sys_ioctl fs/ioctl.c:720 [inline]
[<0000000059ef4888>] __se_sys_ioctl fs/ioctl.c:718 [inline]
[<0000000059ef4888>] __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:718
[<000000006444fa05>] do_syscall_64+0x9f/0x4e0 arch/x86/entry/common.c:290
[<000000009a4ed50b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
BUG: leak checking failed
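The fix amounts to destroying the target device unconditionally after the
unregister attempt, roughly (a sketch of the idea, not the literal diff):

  r = kvm_io_bus_unregister_dev(kvm, zone->pio ? KVM_PIO_BUS : KVM_MMIO_BUS,
                                &dev->dev);
  /*
   * kvm_io_bus_unregister_dev() never destroys the target device, so
   * destroy (and thus free) it here even if unregistration failed,
   * otherwise it is leaked.
   */
  kvm_iodevice_destructor(&dev->dev);
  if (r)
          break;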
Fixes: 5d3c4c79384a ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed")
Cc: stable@vger.kernel.org
Reported-by: 柳菁峰 <liujingfeng@qianxin.com>
Reported-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20221219171924.67989-1-seanjc@google.com
Link: https://lore.kernel.org/all/20230118220003.1239032-1-mhal@rbox.co
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Provide "error" semantics (read zeros, drop writes) for userspace accesses
to MSRs that are ultimately unsupported for whatever reason, but for which
KVM told userspace to save and restore the MSR, i.e. for MSRs that KVM
included in KVM_GET_MSR_INDEX_LIST.
Previously, KVM special cased a few PMU MSRs that were problematic at one
point or another. Extend the treatment to all PMU MSRs, e.g. to avoid
spurious unsupported accesses.
Note, the logic can also be used for non-PMU MSRs, but as of today only
PMU MSRs can end up being unsupported after KVM told userspace to save and
restore them.
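In effect, a host-initiated access to such an MSR behaves as follows (a
sketch with a hypothetical helper name):

  if (host_initiated && !kvm_msr_is_supported(vcpu, msr)) {  /* hypothetical helper */
          if (!write)
                  *data = 0;      /* reads return zero */
          return 0;               /* writes are silently dropped */
  }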
Link: https://lore.kernel.org/r/20230124234905.3774678-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Limit the set of MSRs for fixed PMU counters based on the number of fixed
counters actually supported by the host so that userspace doesn't waste
time saving and restoring dummy values.
Signed-off-by: Like Xu <likexu@tencent.com>
[sean: split for !enable_pmu logic, drop min(), write changelog]
Link: https://lore.kernel.org/r/20230124234905.3774678-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Omit all PMU MSRs from the "MSRs to save" list if the PMU is disabled so
that userspace doesn't waste time saving and restoring dummy values. KVM
provides "error" semantics (read zeros, drop writes) for such known-but-
unsupported MSRs, i.e. has fudged around this issue for quite some time.
Keep the "error" semantics as-is for now, the logic will be cleaned up in
a separate patch.
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Weijiang Yang <weijiang.yang@intel.com>
Link: https://lore.kernel.org/r/20230124234905.3774678-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Move all potential to-be-saved PMU MSRs into a separate array so that a
future patch can easily omit all PMU MSRs from the list when the PMU is
disabled.
No functional change intended.
Link: https://lore.kernel.org/r/20230124234905.3774678-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add helpers to print unimplemented MSR accesses and condition all such
prints on report_ignored_msrs, i.e. honor userspace's request to not
print unimplemented MSRs. Even though vcpu_unimpl() is ratelimited,
printing can still be problematic, e.g. if a print gets stalled when host
userspace is writing MSRs during live migration, an effective stall can
result in very noticeable disruption in the guest.
E.g. the profile below was taken while calling KVM_SET_MSRS on the PMU
counters while the PMU was disabled in KVM.
- 99.75% 0.00% [.] __ioctl
- __ioctl
- 99.74% entry_SYSCALL_64_after_hwframe
do_syscall_64
sys_ioctl
- do_vfs_ioctl
- 92.48% kvm_vcpu_ioctl
- kvm_arch_vcpu_ioctl
- 85.12% kvm_set_msr_ignored_check
svm_set_msr
kvm_set_msr_common
printk
vprintk_func
vprintk_default
vprintk_emit
console_unlock
call_console_drivers
univ8250_console_write
serial8250_console_write
uart_console_write
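The helpers are essentially thin wrappers that honor report_ignored_msrs,
e.g. (a sketch; the helper name here is assumed):

  static void kvm_pr_unimpl_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
  {
          if (report_ignored_msrs)
                  vcpu_unimpl(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
                              msr, data);
  }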
Reported-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230124234905.3774678-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Limit kvm_pmu_cap.num_counters_gp during kvm_init_pmu_capability() based
on the vendor PMU capabilities so that consuming num_counters_gp naturally
does the right thing. This fixes a mostly theoretical bug where KVM could
over-report its PMU support in KVM_GET_SUPPORTED_CPUID for leaf 0xA, e.g.
if the number of counters reported by perf is greater than KVM's
hardcoded internal limit. Incorporating input from the AMD PMU also
avoids over-reporting MSRs to save when running on AMD.
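The clamp itself is roughly a one-liner in kvm_init_pmu_capability():

  /* Cap perf's reported GP counters at the vendor module's hard limit. */
  kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
                                    pmu_ops->MAX_NR_GP_COUNTERS);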
Link: https://lore.kernel.org/r/20230124234905.3774678-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
After commit 02791a5c362b ("KVM: x86/pmu: Use PERF_TYPE_RAW to merge
reprogram_{gp,fixed}counter()"), the vPMU directly uses the hardware
event's eventsel and unit_mask to reprogram the perf_event, and the
event_type field in "struct kvm_event_hw_type_mapping" is simply no
longer used.
Convert the struct into an anonymous struct: the current name is obsolete
now that the structure no longer has any mapping semantics, and placing
the struct definition directly above its sole user makes it easier to
understand what the array is filling in.
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20221205122048.16023-1-likexu@tencent.com
[sean: drop new comment, use anonymous struct]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Avoid type casts that are needed for IS_ERR() and use
IS_ERR_VALUE() instead.
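For example, for an unsigned long that may carry an encoded error value
(variable name illustrative):

  /* Before: if (IS_ERR((void *)hva)) ... */
  if (IS_ERR_VALUE(hva))
          return (long)hva;       /* propagate the encoded error */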
Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Link: https://lore.kernel.org/r/202211161718436948912@zte.com.cn
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Remove the assumption from kvm_binary_stats_test that all stats are
laid out contiguously in memory. The current stats in KVM are
contiguously laid out in memory, but that may change in the future and
the ABI specifically allows holes in the stats data (since each stat
exposes its own offset).
While here, drop the check that each stat's offset is less than
size_data, as that is now always true by construction.
Link: https://lore.kernel.org/kvm/20221208193857.4090582-9-dmatlack@google.com/
Fixes: 0b45d58738cd ("KVM: selftests: Add selftest for KVM statistics data binary interface")
Signed-off-by: Jing Zhang <jingzhangos@google.com>
[dmatlack: Re-worded the commit message.]
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20230117222707.3949974-1-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The semicolon after the "}" is unneeded.
Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn>
Link: https://lore.kernel.org/r/202212191432274558936@zte.com.cn
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use the host CPU's native hypercall instruction, i.e. VMCALL vs. VMMCALL,
in kvm_hypercall(), as relying on KVM to patch in the native hypercall on
a #UD for the "wrong" hypercall requires KVM_X86_QUIRK_FIX_HYPERCALL_INSN
to be enabled and flat out doesn't work if guest memory is encrypted with
a private key, e.g. for SEV VMs.
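A minimal sketch of the idea (not the selftest's exact asm), using the cached
host vendor (host_cpu_is_amd):

  static inline uint64_t hypercall(uint64_t nr, uint64_t a0)
  {
          uint64_t r;

          /* Emit the native instruction so KVM never has to #UD-patch it. */
          if (host_cpu_is_amd)
                  asm volatile("vmmcall" : "=a"(r) : "a"(nr), "b"(a0) : "memory");
          else
                  asm volatile("vmcall"  : "=a"(r) : "a"(nr), "b"(a0) : "memory");
          return r;
  }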
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20230111004445.416840-4-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Cache the host CPU vendor for userspace and share it with guest code.
All the current callers of this_cpu* actually care about the host CPU,
so they are updated to check host_cpu_is*.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20230111004445.416840-3-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Replace is_intel/amd_cpu helpers with this_cpu_* helpers to better
convey the intent of querying the vendor of the current CPU.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20230111004445.416840-2-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The assert incorrectly identifies the ioctl being called. Switch it
from KVM_GET_MSRS to KVM_SET_MSRS.
Fixes: 6ebfef83f03f ("KVM: selftest: Add proper helpers for x86-specific save/restore ioctls")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221209201326.2781950-1-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
kvm_vm_elf_load() and elfhdr_get() open one file each, but they
never close the opened file descriptor. If a test repeatedly
creates and destroys a VM with __vm_create(), which
(directly or indirectly) calls those two functions, the test
might end up getting an open failure with EMFILE.
Fix those two functions to close the file descriptor.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221220170921.2499209-2-reijiw@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add testing to show that a pmu event can be filtered with a generalized
match on its unit mask.
These tests set up test cases to demonstrate various ways of filtering
a pmu event that has multiple unit mask values. It does this by
setting up the filter in KVM with the masked events provided, then
enabling three pmu counters in the guest. The test then verifies that
the pmu counters agree with which counters should be counting and which
counters should be filtered for both a sparse filter list and a dense
filter list.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-8-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Test that masked events are not using invalid bits, and if they are,
ensure the pmu event filter is not accepted by KVM_SET_PMU_EVENT_FILTER.
The only valid bits that can be used for masked events are set when
using KVM_PMU_ENCODE_MASKED_ENTRY() with one exception: If any of the
high bits (35:32) of the event select are set when using Intel, the pmu
event filter will fail.
Also, because validation was not being done prior to the introduction
of masked events, only expect validation to fail when masked events
are used. E.g. in the first test a filter event with all its bits set
is accepted by KVM_SET_PMU_EVENT_FILTER when flags = 0.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-7-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Now that the flags field can be non-zero, pass it in when creating a
pmu event filter.
This is needed in preparation for testing masked events.
No functional change intended.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-6-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When building a list of filter events, it can sometimes be a challenge
to fit all the events needed to adequately restrict the guest into the
limited space available in the pmu event filter. This stems from the
fact that the pmu event filter requires each event (i.e. event select +
unit mask) be listed, when the intention might be to restrict the
event select altogether, regardless of its unit mask. Instead of
increasing the number of filter events in the pmu event filter, add a
new encoding that is able to do a more generalized match on the unit mask.
Introduce masked events as another encoding the pmu event filter
understands. Masked events have three fields: mask, match, and exclude.
When filtering based on these events, the mask is applied to the guest's
unit mask to see if it matches the match value (i.e. umask & mask ==
match). The exclude bit can then be used to exclude events from that
match. E.g. for a given event select, if it's easier to say which unit
mask values shouldn't be filtered, a masked event can be set up to match
all possible unit mask values, then another masked event can be set up to
match the unit mask values that shouldn't be filtered.
Userspace can query to see if this feature exists by looking for the
capability, KVM_CAP_PMU_EVENT_MASKED_EVENTS.
This feature is enabled by setting the flags field in the pmu event
filter to KVM_PMU_EVENT_FLAG_MASKED_EVENTS.
Events can be encoded by using KVM_PMU_ENCODE_MASKED_ENTRY().
It is an error to have a bit set outside the valid bits for a masked
event, and calls to KVM_SET_PMU_EVENT_FILTER will return -EINVAL in
such cases, including the high bits of the event select (35:32) if
called on Intel.
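For example, an allow-list that permits event select 0xC0 with any unit mask
except 0x01 could be encoded roughly as follows (a sketch; "f" is a
pre-allocated struct kvm_pmu_event_filter and the event values are arbitrary):

  f->action  = KVM_PMU_EVENT_ALLOW;
  f->flags   = KVM_PMU_EVENT_FLAG_MASKED_EVENTS;
  f->nevents = 2;
  /* Match every unit mask of event select 0xC0 (umask & 0x00 == 0x00)... */
  f->events[0] = KVM_PMU_ENCODE_MASKED_ENTRY(0xC0, 0x00, 0x00, false);
  /* ...but exclude unit mask 0x01 (umask & 0xFF == 0x01). */
  f->events[1] = KVM_PMU_ENCODE_MASKED_ENTRY(0xC0, 0xFF, 0x01, true);
  vm_ioctl(vm, KVM_SET_PMU_EVENT_FILTER, f);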
With these updates the filter matching code has been updated to match on
a common event. Masked events were flexible enough to handle both event
types, so they were used as the common event. This changes how guest
events get filtered because regardless of the type of event used in the
uAPI, they will be converted to masked events. Because of this there
could be a slight performance hit because instead of matching the filter
event with a lookup on event select + unit mask, it does a lookup on event
select then walks the unit masks to find the match. This shouldn't be a
big problem because I would expect the set of common event selects to be
small, and if they aren't the set can likely be reduced by using masked
events to generalize the unit mask. Using one type of event when
filtering guest events allows for a common code path to be used.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-5-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Refactor check_pmu_event_filter() in preparation for masked events.
No functional changes intended
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-4-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
If it's not possible for an event in the pmu event filter to match a
pmu event being programmed by the guest, it's pointless to have it in
the list. Opt for a shorter list by removing those events.
Because this is established uAPI, the pmu event filter can't outright
reject these events as garbage and return an error. Instead, play
nice and remove them from the list.
Also, opportunistically rewrite the comment when the filter is set to
clarify that it guards against *all* TOCTOU attacks on the verified
data.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-3-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When checking if a pmu event the guest is attempting to program should
be filtered, only consider the event select + unit mask in that
decision. Use an architecture specific mask to mask out all other bits,
including bits 35:32 on Intel. Those bits are not part of the event
select and should not be considered in that decision.
Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20221220161236.555143-2-aaronlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.
In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.
While at it, include the corresponding header file (<linux/kstrtox.h>).
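Usage is unchanged apart from the name and the explicit include, e.g.:

  #include <linux/kstrtox.h>

  bool enable;

  /* kstrtobool() accepts the same inputs; returns 0 on success, -EINVAL otherwise. */
  if (kstrtobool(val, &enable))
          return -EINVAL;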
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/670882aa04dbdd171b46d3b20ffab87158454616.1673689135.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sean Christopherson <seanjc@google.com>
|