Age | Commit message (Collapse) | Author | Files | Lines |
|
The parameter passed to HFENCE.GVMA instruction in rs1 register
is guest physical address right shifted by 2 (i.e. divided by 4).
Unfortunately, we overlooked the semantics of rs1 registers for
HFENCE.GVMA instruction and never right shifted guest physical
address by 2. This issue did not manifest for hypervisors till
now because:
1) Currently, only __kvm_riscv_hfence_gvma_all() and SBI
HFENCE calls are used to invalidate TLB.
2) All H-extension implementations (such as QEMU, Spike,
Rocket Core FPGA, etc) that we tried till now were
conservatively flushing everything upon any HFENCE.GVMA
instruction.
This patch fixes GPA passed to __kvm_riscv_hfence_gvma_vmid_gpa()
and __kvm_riscv_hfence_gvma_gpa() functions.
Fixes: fd7bb4a251df ("RISC-V: KVM: Implement VMID allocator")
Reported-by: Ian Huang <ihuang@ventanamicro.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Message-Id: <20211026170136.2147619-4-anup.patel@wdc.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The timer and SBI virtualization is already in separate sources.
In future, we will have vector and AIA virtualization also added
as separate sources.
To align with above described modularity, we factor-out FP
virtualization into separate sources.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Message-Id: <20211026170136.2147619-3-anup.patel@wdc.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.16
- More progress on the protected VM front, now with the full
fixed feature set as well as the limitation of some hypercalls
after initialisation.
- Cleanup of the RAZ/WI sysreg handling, which was pointlessly
complicated
- Fixes for the vgic placement in the IPA space, together with a
bunch of selftests
- More memcg accounting of the memory allocated on behalf of a guest
- Timer and vgic selftests
- Workarounds for the Apple M1 broken vgic implementation
- KConfig cleanups
- New kvmarm.mode=none option, for those who really dislike us
|
|
When passing the failing address and size out to user space, SGX must
ensure not to trample on the earlier fields of the emulation_failure
sub-union of struct kvm_run.
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-5-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Should instruction emulation fail, include the VM exit reason, etc. in
the emulation_failure data passed to userspace, in order that the VMM
can report it as a debugging aid when describing the failure.
Suggested-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Extend the get_exit_info static call to provide the reason for the VM
exit. Modify relevant trace points to use this rather than extracting
the reason in the caller.
Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This variable was renamed to kvm_has_noapic_vcpu in commit
6e4e3b4df4e3 ("KVM: Stop using deprecated jump label APIs").
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20211021185449.3471763-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Unregister KVM's posted interrupt wakeup handler during unsetup so that a
spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't
call into freed memory.
Fixes: bf9f6ac8d749 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009001107.3936588-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a synchronize_rcu() after clearing the posted interrupt wakeup handler
to ensure all readers, i.e. in-flight IRQ handlers, see the new handler
before returning to the caller. If the caller is an exiting module and
is unregistering its handler, failure to wait could result in the IRQ
handler jumping into an unloaded module.
The registration path doesn't require synchronization, as it's the
caller's responsibility to not generate interrupts it cares about until
after its handler is registered.
Fixes: f6b3c72c2366 ("x86/irq: Define a global vector for VT-d Posted-Interrupts")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009001107.3936588-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use a rw_semaphore instead of a mutex to coordinate APICv updates so that
vCPUs responding to requests can take the lock for read and run in
parallel. Using a mutex forces serialization of vCPUs even though
kvm_vcpu_update_apicv() only touches data local to that vCPU or is
protected by a different lock, e.g. SVM's ir_list_lock.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022004927.1448382-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move SVM's assertion that vCPU's APICv state is consistent with its VM's
state out of svm_vcpu_run() and into x86's common inner run loop. The
assertion and underlying logic is not unique to SVM, it's just that SVM
has more inhibiting conditions and thus is more likely to run headfirst
into any KVM bugs.
Add relevant comments to document exactly why the update path has unusual
ordering between the update the kick, why said ordering is safe, and also
the basic rules behind the assertion in the run loop.
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022004927.1448382-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 99cdc6c18c2d ("RISC-V: Add initial skeletal KVM support") selects
the config ANON_INODES in config KVM, but the config ANON_INODES is removed
since commit 5dd50aaeb185 ("Make anon_inodes unconditional") in 2018.
Hence, ./scripts/checkkconfigsymbols.py warns on non-existing symbols:
ANON_INODES
Referencing files: arch/riscv/kvm/Kconfig
Remove selecting the non-existing config ANON_INODES.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Message-Id: <20211022061514.25946-1-lukas.bulwahn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Extract the zapping of rmaps, a.k.a. legacy MMU, for a gfn range to a
separate helper to clean up the unholy mess that kvm_zap_gfn_range() has
become. In addition to deep nesting, the rmaps zapping spreads out the
declaration of several variables and is generally a mess. Clean up the
mess now so that future work to improve the memslots implementation
doesn't need to deal with it.
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022010005.1454978-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Remove an unnecessary remote TLB flush in kvm_zap_gfn_range() now that
said function holds mmu_lock for write for its entire duration. The
flush was added by the now-reverted commit to allow TDP MMU to flush while
holding mmu_lock for read, as the transition from write=>read required
dropping the lock and thus a pending flush needed to be serviced.
Fixes: 5a324c24b638 ("Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022010005.1454978-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A recent commit to fix the calls to kvm_flush_remote_tlbs_with_address()
in kvm_zap_gfn_range() inadvertantly added yet another flush instead of
fixing the existing flush. Drop the redundant flush, and fix the params
for the existing flush.
Cc: stable@vger.kernel.org
Fixes: 2822da446640 ("KVM: x86/mmu: fix parameters to kvm_flush_remote_tlbs_with_address")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211022010005.1454978-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm_mmu_unload() destroys all the PGD caches. Use the lighter
kvm_mmu_sync_roots() and kvm_mmu_sync_prev_roots() instead.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-5-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The commit 578e1c4db2213 ("kvm: x86: Avoid taking MMU lock
in kvm_mmu_sync_roots if no sync is needed") added smp_wmb() in
mmu_try_to_unsync_pages(), but the corresponding smp_load_acquire() isn't
used on the load of SPTE.W. smp_load_acquire() orders _subsequent_
loads after sp->is_unsync; it does not order _earlier_ loads before
the load of sp->is_unsync.
This has no functional change; smp_rmb() is a NOP on x86, and no
compiler barrier is required because there is a VMEXIT between the
load of SPTE.W and kvm_mmu_snc_roots.
Cc: Junaid Shahid <junaids@google.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-4-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The commit 21823fbda5522 ("KVM: x86: Invalidate all PGDs for the
current PCID on MOV CR3 w/ flush") invalidates all PGDs for the specific
PCID and in the case of PCID is disabled, it includes all PGDs in the
prev_roots and the commit made prev_roots totally unused in this case.
Not using prev_roots fixes a problem when CR4.PCIDE is changed 0 -> 1
before the said commit:
(CR4.PCIDE=0, CR4.PGE=1; CR3=cr3_a; the page for the guest
RIP is global; cr3_b is cached in prev_roots)
modify page tables under cr3_b
the shadow root of cr3_b is unsync in kvm
INVPCID single context
the guest expects the TLB is clean for PCID=0
change CR4.PCIDE 0 -> 1
switch to cr3_b with PCID=0,NOFLUSH=1
No sync in kvm, cr3_b is still unsync in kvm
jump to the page that was modified in step 1
shadow page tables point to the wrong page
It is a very unlikely case, but it shows that stale prev_roots can be
a problem after CR4.PCIDE changes from 0 to 1. However, to fix this
case, the commit disabled caching CR3 in prev_roots altogether when PCID
is disabled. Not all CPUs have PCID; especially the PCID support
for AMD CPUs is kind of recent. To restore the prev_roots optimization
for CR4.PCIDE=0, flush the whole MMU (including all prev_roots) when
CR4.PCIDE changes.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The KVM doesn't know whether any TLB for a specific pcid is cached in
the CPU when tdp is enabled. So it is better to flush all the guest
TLB when invalidating any single PCID context.
The case is very rare or even impossible since KVM generally doesn't
intercept CR3 write or INVPCID instructions when tdp is enabled, so the
fix is mostly for the sake of overall robustness.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
X86_CR4_PGE doesn't participate in kvm_mmu_role, so the mmu context
doesn't need to be reset. It is only required to flush all the guest
tlb.
It is also inconsistent that X86_CR4_PGE is in KVM_MMU_CR4_ROLE_BITS
while kvm_mmu_role doesn't use X86_CR4_PGE. So X86_CR4_PGE is also
removed from KVM_MMU_CR4_ROLE_BITS.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210919024246.89230-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
X86_CR4_PCIDE doesn't participate in kvm_mmu_role, so the mmu context
doesn't need to be reset. It is only required to flush all the guest
tlb.
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210919024246.89230-2-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
SDM mentioned that, RDPMC:
IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0)) and (ECX indicates a supported counter))
THEN
EAX := counter[31:0];
EDX := ZeroExtend(counter[MSCB:32]);
ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
#GP(0);
FI;
Let's add a comment why CR0.PE isn't tested since it's impossible for CPL to be >0 if
CR0.PE=0.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1634724836-73721-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Paul pointed out the error messages when KVM fails to load are unhelpful
in understanding exactly what went wrong if userspace probes the "wrong"
module.
Add a mandatory kvm_x86_ops field to track vendor module names, kvm_intel
and kvm_amd, and use the name for relevant error message when KVM fails
to load so that the user knows which module failed to load.
Opportunistically tweak the "disabled by bios" error message to clarify
that _support_ was disabled, not that the module itself was magically
disabled by BIOS.
Suggested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211018183929.897461-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, the NX huge page recovery thread wakes up every minute and
zaps 1/nx_huge_pages_recovery_ratio of the total number of split NX
huge pages at a time. This is intended to ensure that only a
relatively small number of pages get zapped at a time. But for very
large VMs (or more specifically, VMs with a large number of
executable pages), a period of 1 minute could still result in this
number being too high (unless the ratio is changed significantly,
but that can result in split pages lingering on for too long).
This change makes the period configurable instead of fixing it at
1 minute. Users of large VMs can then adjust the period and/or the
ratio to reduce the number of pages zapped at one time while still
maintaining the same overall duration for cycling through the
entire list. By default, KVM derives a period from the ratio such
that a page will remain on the list for 1 hour on average.
Signed-off-by: Junaid Shahid <junaids@google.com>
Message-Id: <20211020010627.305925-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
SDM section 18.2.3 mentioned that:
"IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of
any general-purpose or fixed-function counters via a single WRMSR."
It is R/W mentioned by SDM, we read this msr on bare-metal during perf testing,
the value is always 0 for ICX/SKX boxes on hands. Let's fill get_msr
MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0 as hardware behavior and drop
global_ovf_ctrl variable.
Tested-by: Like Xu <likexu@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1634631160-67276-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
slot_handle_leaf is a misnomer because it only operates on 4K SPTEs
whereas "leaf" is used to describe any valid terminal SPTE (4K or
large page). Rename slot_handle_leaf to slot_handle_level_4k to
avoid confusion.
Making this change makes it more obvious there is a benign discrepency
between the legacy MMU and the TDP MMU when it comes to dirty logging.
The legacy MMU only iterates through 4K SPTEs when zapping for
collapsing and when clearing D-bits. The TDP MMU, on the other hand,
iterates through SPTEs on all levels.
The TDP MMU behavior of zapping SPTEs at all levels is technically
overkill for its current dirty logging implementation, which always
demotes to 4k SPTES, but both the TDP MMU and legacy MMU zap if and only
if the SPTE can be replaced by a larger page, i.e. will not spuriously
zap 2m (or larger) SPTEs. Opportunistically add comments to explain this
discrepency in the code.
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20211019162223.3935109-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Per Intel SDM, RTIT_CTL_BRANCH_EN bit has no dependency on any CPUID
leaf 0x14.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-5-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
To better self explain the meaning of this field and match the
PT_CAP_num_address_ranges constatn.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-4-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The number of valid PT ADDR MSRs for the guest is precomputed in
vmx->pt_desc.addr_range. Use it instead of calculating again.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-3-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A minor optimization to WRMSR MSR_IA32_RTIT_CTL when necessary.
Opportunistically refine the comment to call out that KVM requires
VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-2-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
"prefetch", "prefault" and "speculative" are used throughout KVM to mean
the same thing. Use a single name, standardizing on "prefetch" which
is already used by various functions such as direct_pte_prefetch,
FNAME(prefetch_gpte), FNAME(pte_prefetch), etc.
Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Unify the flags for rmaps and page tracking data, using a
single flag in struct kvm_arch and a single loop to go
over all the address spaces and memslots. This avoids
code duplication between alloc_all_memslots_rmaps and
kvm_page_track_enable_mmu_write_tracking.
Signed-off-by: David Stevens <stevensd@chromium.org>
[This patch is the delta between David's v2 and v3, with conflicts
fixed and my own commit message. - Paolo]
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
To date, VMM-directed TSC synchronization and migration has been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.
A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.
Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-8-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-7-oupton@google.com>
[Make sure kvm->arch.cur_tsc_generation and vcpu->arch.this_tsc_generation are
equal at the end of __kvm_synchronize_tsc, if matched is false. Reported by
Maxim Levitsky. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Protect the reference point for kvmclock with a seqcount, so that
kvmclock updates for all vCPUs can proceed in parallel. Xen runstate
updates will also run in parallel and not bounce the kvmclock cacheline.
Of the variables that were protected by pvclock_gtod_sync_lock,
nr_vcpus_matched_tsc is different because it is updated outside
pvclock_update_vm_gtod_copy and read inside it. Therefore, we
need to keep it protected by a spinlock. In fact it must now
be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
write-side of a seqcount, is non-preemptible. Since we already
have tsc_write_lock which is a raw spinlock, we can just use
tsc_write_lock as the lock that protects the write-side of the
seqcount.
Co-developed-by: Oliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.
Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This is a new warning in clang top-of-tree (will be clang 14):
In file included from arch/x86/kvm/mmu/mmu.c:27:
arch/x86/kvm/mmu/spte.h:318:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
return __is_bad_mt_xwr(rsvd_check, spte) |
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
arch/x86/kvm/mmu/spte.h:318:9: note: cast one or both operands to int to silence this warning
The code is fine, but change it anyway to shut up this clever clogs
of a compiler.
Reported-by: torvic9@mailbox.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
|
|
If allocation of rmaps fails, but some of the pointers have already been written,
those pointers can be cleaned up when the memslot is freed, or even reused later
for another attempt at allocating the rmaps. Therefore there is no need to
WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
skipped lest KVM will overwrite the previous pointer and will indeed leak memory.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
* kvm-arm64/pkvm/fixed-features: (22 commits)
: .
: Add the pKVM fixed feature that allows a bunch of exceptions
: to either be forbidden or be easily handled at EL2.
: .
KVM: arm64: pkvm: Give priority to standard traps over pvm handling
KVM: arm64: pkvm: Pass vpcu instead of kvm to kvm_get_exit_handler_array()
KVM: arm64: pkvm: Move kvm_handle_pvm_restricted around
KVM: arm64: pkvm: Consolidate include files
KVM: arm64: pkvm: Preserve pending SError on exit from AArch32
KVM: arm64: pkvm: Handle GICv3 traps as required
KVM: arm64: pkvm: Drop sysregs that should never be routed to the host
KVM: arm64: pkvm: Drop AArch32-specific registers
KVM: arm64: pkvm: Make the ERR/ERX*_EL1 registers RAZ/WI
KVM: arm64: pkvm: Use a single function to expose all id-regs
KVM: arm64: Fix early exit ptrauth handling
KVM: arm64: Handle protected guests at 32 bits
KVM: arm64: Trap access to pVM restricted features
KVM: arm64: Move sanitized copies of CPU features
KVM: arm64: Initialize trap registers for protected VMs
KVM: arm64: Add handlers for protected VM System Registers
KVM: arm64: Simplify masking out MTE in feature id reg
KVM: arm64: Add missing field descriptor for MDCR_EL2
KVM: arm64: Pass struct kvm to per-EC handlers
KVM: arm64: Move early handlers to per-EC handlers
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Checking for pvm handling first means that we cannot handle ptrauth
traps or apply any of the workarounds (GICv3 or TX2 #219).
Flip the order around.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-12-maz@kernel.org
|
|
Passing a VM pointer around is odd, and results in extra work on
VHE. Follow the rest of the design that uses the vcpu instead, and
let the nVHE code look into the struct kvm as required.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-11-maz@kernel.org
|
|
Place kvm_handle_pvm_restricted() next to its little friends such
as kvm_handle_pvm_sysreg().
This allows to make inject_undef64() static.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-10-maz@kernel.org
|
|
kvm_fixed_config.h is pkvm specific, and would be better placed
near its users. At the same time, include/nvhe/sys_regs.h is now
almost empty.
Merge the two into arch/arm64/kvm/hyp/include/nvhe/fixed_config.h.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-9-maz@kernel.org
|
|
Don't drop a potential SError when a guest gets caught red-handed
running AArch32 code.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-8-maz@kernel.org
|
|
Forward accesses to the ICV_*SGI*_EL1 registers to EL1, and
emulate ICV_SRE_EL1 by returning a fixed value.
This should be enough to support GICv3 in a protected guest.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-7-maz@kernel.org
|
|
A bunch of system registers (most of them MM related) should never
trap to the host under any circumstance. Keep them close to our chest.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-6-maz@kernel.org
|
|
All the SYS_*32_EL2 registers are AArch32-specific. Since we forbid
AArch32, there is no need to handle those in any way.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-5-maz@kernel.org
|
|
The ERR*/ERX* registers should be handled as RAZ/WI, and there
should be no need to involve EL1 for that.
Add a helper that handles such registers, and repaint the sysreg
table to declare these registers as RAZ/WI.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-4-maz@kernel.org
|
|
Rather than exposing a whole set of helper functions to retrieve
individual ID registers, use the existing decoding tree and expose
a single helper instead.
This allow a number of functions to be made static, and we now
have a single entry point to maintain.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20211013120346.2926621-3-maz@kernel.org
|