author     Linus Torvalds <torvalds@linux-foundation.org>  2026-04-17 17:18:03 +0300
committer  Linus Torvalds <torvalds@linux-foundation.org>  2026-04-17 17:18:03 +0300
commit     01f492e1817e858d1712f2489d0afbaa552f417b (patch)
tree       9ba6df223570acd45ccb2ba647407f75f4393eab /include
parent     e55d98e7756135f32150b9b8f75d580d0d4b2dd3 (diff)
parent     6b802031877a995456c528095c41d1948546bf45 (diff)
download   linux-01f492e1817e858d1712f2489d0afbaa552f417b.tar.xz
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"Arm:
- Add support for tracing in the standalone EL2 hypervisor code,
which should help both debugging and performance analysis. This
uses the new infrastructure for 'remote' trace buffers that can be
exposed by non-kernel entities such as firmware, and which came
through the tracing tree
- Add support for GICv5 Per Processor Interrupts (PPIs), as the
starting point for supporting the new GIC architecture in KVM
- Finally add support for pKVM protected guests, where pages are
unmapped from the host as they are faulted into the guest and can
be shared back from the guest using pKVM hypercalls. Protected
guests are created using a new machine type identifier. As the
elusive guestmem has not yet delivered on its promises, anonymous
memory is also supported
This is only a first step towards full isolation from the host; for
example, the CPU register state and DMA accesses are not yet
isolated. Because it does not yet provide full protection, it is
hidden behind CONFIG_ARM_PKVM_GUEST + 'kvm-arm.mode=protected', and
also triggers TAINT_USER when a VM is created. Caveat emptor (a
VM-creation sketch follows this section's list)
- Rework the dreaded user_mem_abort() function to make it more
maintainable, reducing the amount of state being exposed to the
various helpers and rendering a substantial amount of state
immutable
- Expand the Stage-2 page table dumper to support NV shadow page
tables on a per-VM basis
- Tidy up the pKVM PSCI proxy code to be slightly less hard to
follow
- Fix both SPE and TRBE in non-VHE configurations so that they do not
generate spurious, out of context table walks that ultimately lead
to very bad HW lockups
- A small set of patches fixing the Stage-2 MMU freeing in error
cases
- Tighten up the accepted SMC immediate value to be only #0 for host
SMCCC calls
- The usual cleanups and other selftest churn
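For the protected-guest item above, here is a minimal sketch of how userspace might ask for a pKVM protected VM. It only assumes the KVM_VM_TYPE_ARM_PROTECTED machine-type bit that appears in the uapi/linux/kvm.h diff below together with the standard KVM_CREATE_VM ioctl; the 40-bit IPA size and the error handling are purely illustrative, and the host must have been booted with kvm-arm.mode=protected (with CONFIG_ARM_PKVM_GUEST enabled) for the request to succeed:

/*
 * Sketch only: create a protected (pKVM) VM. Requires uapi headers that
 * already carry KVM_VM_TYPE_ARM_PROTECTED from this series.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        if (kvm < 0)
                return 1;

        /* Illustrative 40-bit IPA space, with the new "protected" bit set. */
        unsigned long type = KVM_VM_TYPE_ARM_IPA_SIZE(40) |
                             KVM_VM_TYPE_ARM_PROTECTED;

        int vm = ioctl(kvm, KVM_CREATE_VM, type);
        if (vm < 0)
                perror("KVM_CREATE_VM");

        /* Creating such a VM also taints the kernel with TAINT_USER. */
        return vm < 0;
}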
LoongArch:
- Use CSR_CRMD_PLV for kvm_arch_vcpu_in_kernel()
- Add in-kernel DMSINTC irqchip support
RISC-V:
- Fix steal time shared memory alignment checks
- Fix vector context allocation leak
- Fix array out-of-bounds in pmu_ctr_read() and pmu_fw_ctr_read_hi()
- Fix double-free of sdata in kvm_pmu_clear_snapshot_area()
- Fix integer overflow in kvm_pmu_validate_counter_mask()
- Fix shift-out-of-bounds in make_xfence_request()
- Fix lost write protection on huge pages during dirty logging
- Split huge pages during fault handling for dirty logging
- Skip CSR restore if VCPU is reloaded on the same core
- Implement kvm_arch_has_default_irqchip() for KVM selftests
- Factor out ISA checks into separate sources
- Add hideleg to struct kvm_vcpu_config
- Factor out VCPU config into separate sources
- Support configuration of per-VM HGATP mode from KVM user space
s390:
- Support for ESA (31-bit) guests inside nested hypervisors
- Remove restriction on memslot alignment, which is not needed
anymore with the new gmap code
- Fix LPSW/E to update the bear (which of course is the breaking
event address register)
x86:
- Shut up various UBSAN warnings about reading module parameters before
they are initialized
- Don't zero-allocate page tables that are used for splitting
hugepages in the TDP MMU, as KVM is guaranteed to set all SPTEs in
the page table and thus write all bytes
- As an optimization, bail early when trying to unsync 4KiB mappings
if the target gfn can just be mapped with a 2MiB hugepage
x86 generic:
- Copy single-chunk MMIO write values into struct kvm_vcpu (more
precisely struct kvm_mmio_fragment) to fix use-after-free stack
bugs where KVM would dereference a stack pointer after an exit to
userspace (see the sketch after this list)
- Clean up and comment the emulated MMIO code to try to make it
easier to maintain (not necessarily "easy", but "easier")
- Move VMXON+VMXOFF and EFER.SVME toggling out of KVM (not *all* of
VMX and SVM enabling) as it is needed for trusted I/O
- Advertise support for AVX512 Bit Matrix Multiply (BMM) instructions
- Immediately fail the build if a required #define is missing in one
of KVM's headers that is included multiple times (the pattern is
sketched after this list)
- Reject SET_GUEST_DEBUG with -EBUSY if there's an already injected
exception, mostly to prevent syzkaller from abusing the uAPI to
trigger WARNs, but also because it can help prevent userspace from
unintentionally crashing the VM
- Exempt SMM from CPUID faulting on Intel, as per the spec
- Misc hardening and cleanup changes
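The single-chunk MMIO fix above boils down to giving the fragment its own storage, so that nothing points into the emulator's stack across the exit to userspace. The new 'val' member of struct kvm_mmio_fragment is visible in the kvm_host.h diff below; everything else in this sketch (the type and helper names) is an illustrative model rather than KVM's actual code:

#include <stdint.h>
#include <string.h>

/* Simplified model of the fix: the fragment owns a copy of the value. */
struct mmio_fragment {
        uint64_t gpa;
        void *data;
        uint64_t val;           /* owned copy of a single-chunk write */
        unsigned int len;       /* at most sizeof(val) for a single chunk */
};

void queue_mmio_write(struct mmio_fragment *frag, uint64_t gpa,
                      const void *buf, unsigned int len)
{
        frag->gpa = gpa;
        frag->len = len;
        /*
         * Copy the value into the fragment. Without the copy, 'data' could
         * point at a buffer on the caller's stack, which is long gone by
         * the time the exit completes and the write is replayed.
         */
        memcpy(&frag->val, buf, len);
        frag->data = &frag->val;
}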
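The "required #define" item above refers to the usual multiple-inclusion trick, where a header expects its includer to define a macro before pulling it in; failing the build immediately looks roughly like the following (the KVM_X86_OP name and the two ops are only illustrative here, not a claim about which KVM header was changed):

/*
 * some-ops.h - included several times, each time with KVM_X86_OP() defined
 * by the includer to emit whatever it needs per op. Fail the build right
 * away if the includer forgot to define it.
 */
#ifndef KVM_X86_OP
#error "KVM_X86_OP must be defined before including this header"
#endif

KVM_X86_OP(vcpu_create)
KVM_X86_OP(vcpu_run)

#undef KVM_X86_OP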
x86 (AMD):
- Fix and optimize IRQ window inhibit handling for AVIC; make it
per-vCPU so that KVM doesn't prematurely re-enable AVIC if multiple
vCPUs have to-be-injected IRQs
- Clean up and optimize the OSVW handling, avoiding a bug in which
KVM would overwrite state when enabling virtualization on multiple
CPUs in parallel. This should not be a problem because OSVW should
usually be the same for all CPUs
- Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains
about a "too large" size based purely on user input
- Clean up and harden the pinning code for KVM_MEMORY_ENCRYPT_REG_REGION
- Disallow synchronizing a VMSA of an already-launched/encrypted
vCPU, as doing so for an SNP guest will crash the host due to an
RMP violation page fault
- Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped
queries are required to hold kvm->lock, and enforce it by lockdep.
Fix various bugs where sev_guest() was not ensured to be stable for
the whole duration of a function or ioctl
- Convert a pile of kvm->lock SEV code to guard() (the pattern is
sketched after this list)
- Play nicer with userspace that does not enable
KVM_CAP_EXCEPTION_PAYLOAD, for which KVM needs to set CR2 and DR6
as a response to ioctls such as KVM_GET_VCPU_EVENTS (even if the
payload would end up in EXITINFO2 rather than CR2, for example).
Only set CR2 and DR6 when consumption of the payload is imminent,
but on the other hand force delivery of the payload in all paths
where userspace retrieves CR2 or DR6
- Use vcpu->arch.cr2 when updating vmcb12's CR2 on nested #VMEXIT
instead of vmcb02->save.cr2. The value is out of sync after a
save/restore or after a #PF is injected into L2
- Fix a class of nSVM bugs where some fields written by the CPU are
not synchronized from vmcb02 to cached vmcb12 after VMRUN, and so
are not up-to-date when saved by KVM_GET_NESTED_STATE
- Fix a class of bugs where the ordering between KVM_SET_NESTED_STATE
and KVM_SET_{S}REGS could cause vmcb02 to be incorrectly
initialized after save+restore
- Add a variety of missing nSVM consistency checks
- Fix several bugs where KVM failed to correctly update VMCB fields
on nested #VMEXIT
- Fix several bugs where KVM failed to correctly synthesize #UD or
#GP for SVM-related instructions
- Add support for save+restore of virtualized LBRs (on SVM)
- Refactor various helpers and macros to improve clarity and
(hopefully) make the code easier to maintain
- Aggressively sanitize fields when copying from vmcb12, to guard
against unintentionally allowing L1 to utilize yet-to-be-defined
features
- Fix several bugs where KVM botched rAX legality checks when
emulating SVM instructions. Some issues remain, e.g. KVM still
doesn't handle size prefix overrides for 64-bit guests
- Fail emulation of VMRUN/VMLOAD/VMSAVE if mapping vmcb12 fails
instead of somewhat arbitrarily synthesizing #GP (i.e. don't double
down on AMD's architectural but sketchy behavior of generating #GP
for "unsupported" addresses)
- Cache all used vmcb12 fields to further harden against TOCTOU bugs
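The guard() conversions mentioned above rely on the kernel's scope-based cleanup helpers from <linux/cleanup.h>. A hypothetical function showing the pattern (the struct and field names are made up for the sketch; in KVM it is kvm->lock and helpers such as sev_guest() that are involved):

#include <linux/cleanup.h>
#include <linux/errno.h>
#include <linux/mutex.h>
#include <linux/types.h>

struct kvm_like {
        struct mutex lock;
        bool sev_enabled;
};

static int do_sev_ioctl(struct kvm_like *kvm)
{
        /* Take the lock; it is dropped automatically on every return path. */
        guard(mutex)(&kvm->lock);

        if (!kvm->sev_enabled)
                return -EINVAL;         /* unlocked here as well */

        /* ... work done with the lock held ... */
        return 0;
}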
x86 (Intel):
- Drop obsolete branch hint prefixes from the VMX instruction macros
- Use ASM_INPUT_RM() in __vmcs_writel() to coerce clang into using a
register input when appropriate
- Code cleanups
guest_memfd:
- Don't mark guest_memfd folios as accessed, as guest_memfd doesn't
support reclaim, the memory is unevictable, and there is no storage
to write back to
LoongArch selftests:
- Add KVM PMU test cases
s390 selftests:
- Enable more memory selftests
x86 selftests:
- Add support for Hygon CPUs in KVM selftests
- Fix a bug in the MSR test where it would get false failures on
AMD/Hygon CPUs with exactly one of RDPID or RDTSCP
- Add an MADV_COLLAPSE testcase for guest_memfd as a regression test
for a bug where the kernel would attempt to collapse guest_memfd
folios against KVM's will"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (373 commits)
KVM: x86: use inlines instead of macros for is_sev_*guest
x86/virt: Treat SVM as unsupported when running as an SEV+ guest
KVM: SEV: Goto an existing error label if charging misc_cg for an ASID fails
KVM: SVM: Move lock-protected allocation of SEV ASID into a separate helper
KVM: SEV: use mutex guard in snp_handle_guest_req()
KVM: SEV: use mutex guard in sev_mem_enc_unregister_region()
KVM: SEV: use mutex guard in sev_mem_enc_ioctl()
KVM: SEV: use mutex guard in snp_launch_update()
KVM: SEV: Assert that kvm->lock is held when querying SEV+ support
KVM: SEV: Document that checking for SEV+ guests when reclaiming memory is "safe"
KVM: SEV: Hide "struct kvm_sev_info" behind CONFIG_KVM_AMD_SEV=y
KVM: SEV: WARN on unhandled VM type when initializing VM
KVM: LoongArch: selftests: Add PMU overflow interrupt test
KVM: LoongArch: selftests: Add basic PMU event counting test
KVM: LoongArch: selftests: Add cpucfg read/write helpers
LoongArch: KVM: Add DMSINTC inject msi to vCPU
LoongArch: KVM: Add DMSINTC device support
LoongArch: KVM: Make vcpu_is_preempted() as a macro rather than function
LoongArch: KVM: Move host CSR_GSTAT save and restore in context switch
LoongArch: KVM: Move host CSR_EENTRY save and restore in context switch
...
Diffstat (limited to 'include')
-rw-r--r--  include/kvm/arm_arch_timer.h        |   8
-rw-r--r--  include/kvm/arm_pmu.h               |   5
-rw-r--r--  include/kvm/arm_vgic.h              | 191
-rw-r--r--  include/linux/irqchip/arm-gic-v5.h  |  27
-rw-r--r--  include/linux/kvm_host.h            |  27
-rw-r--r--  include/uapi/linux/kvm.h            |  10
6 files changed, 250 insertions, 18 deletions
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 7310841f4512..bf8cc9589bd0 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -10,6 +10,8 @@
 #include <linux/clocksource.h>
 #include <linux/hrtimer.h>
 
+#include <linux/irqchip/arm-gic-v5.h>
+
 enum kvm_arch_timers {
         TIMER_PTIMER,
         TIMER_VTIMER,
@@ -47,7 +49,7 @@ struct arch_timer_vm_data {
         u64     poffset;
 
         /* The PPI for each timer, global to the VM */
-        u8      ppi[NR_KVM_TIMERS];
+        u32     ppi[NR_KVM_TIMERS];
 };
 
 struct arch_timer_context {
@@ -130,6 +132,10 @@ void kvm_timer_init_vhe(void);
 #define timer_vm_data(ctx)      (&(timer_context_to_vcpu(ctx)->kvm->arch.timer_data))
 #define timer_irq(ctx)          (timer_vm_data(ctx)->ppi[arch_timer_ctx_index(ctx)])
 
+#define get_vgic_ppi(k, i)      (((k)->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V5) ? \
+                                 (i) : (FIELD_PREP(GICV5_HWIRQ_ID, i) | \
+                                        FIELD_PREP(GICV5_HWIRQ_TYPE, GICV5_HWIRQ_TYPE_PPI)))
+
 u64 kvm_arm_timer_read_sysreg(struct kvm_vcpu *vcpu,
                               enum kvm_arch_timers tmr,
                               enum kvm_arch_timer_regs treg);
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 96754b51b411..0a36a3d5c894 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -12,6 +12,9 @@
 
 #define KVM_ARMV8_PMU_MAX_COUNTERS      32
 
+/* PPI #23 - architecturally specified for GICv5 */
+#define KVM_ARMV8_PMU_GICV5_IRQ         0x20000017
+
 #if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM)
 struct kvm_pmc {
         u8 idx; /* index into the pmu->pmc array */
@@ -38,7 +41,7 @@ struct arm_pmu_entry {
 };
 
 bool kvm_supports_guest_pmuv3(void);
-#define kvm_arm_pmu_irq_initialized(v)  ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS)
+#define kvm_arm_pmu_irq_initialized(v)  ((v)->arch.pmu.irq_num != 0)
 u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx);
 void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val);
 void kvm_pmu_set_counter_value_user(struct kvm_vcpu *vcpu, u64 select_idx, u64 val);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index f2eafc65bbf4..1388dc6028a9 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -19,7 +19,9 @@
 #include <linux/jump_label.h>
 
 #include <linux/irqchip/arm-gic-v4.h>
+#include <linux/irqchip/arm-gic-v5.h>
 
+#define VGIC_V5_MAX_CPUS        512
 #define VGIC_V3_MAX_CPUS        512
 #define VGIC_V2_MAX_CPUS        8
 #define VGIC_NR_IRQS_LEGACY     256
@@ -31,9 +33,96 @@
 #define VGIC_MIN_LPI            8192
 #define KVM_IRQCHIP_NUM_PINS    (1020 - 32)
 
-#define irq_is_ppi(irq) ((irq) >= VGIC_NR_SGIS && (irq) < VGIC_NR_PRIVATE_IRQS)
-#define irq_is_spi(irq) ((irq) >= VGIC_NR_PRIVATE_IRQS && \
-                         (irq) <= VGIC_MAX_SPI)
+/*
+ * GICv5 supports 128 PPIs, but only the first 64 are architected. We only
+ * support the timers and PMU in KVM, both of which are architected. Rather than
+ * handling twice the state, we instead opt to only support the architected set
+ * in KVM for now. At a future stage, this can be bumped up to 128, if required.
+ */
+#define VGIC_V5_NR_PRIVATE_IRQS 64
+
+#define is_v5_type(t, i)        (FIELD_GET(GICV5_HWIRQ_TYPE, (i)) == (t))
+
+#define __irq_is_sgi(t, i)                                              \
+        ({                                                              \
+                bool __ret;                                             \
+                                                                        \
+                switch (t) {                                            \
+                case KVM_DEV_TYPE_ARM_VGIC_V5:                          \
+                        __ret = false;                                  \
+                        break;                                          \
+                default:                                                \
+                        __ret = (i) < VGIC_NR_SGIS;                     \
+                }                                                       \
+                                                                        \
+                __ret;                                                  \
+        })
+
+#define __irq_is_ppi(t, i)                                              \
+        ({                                                              \
+                bool __ret;                                             \
+                                                                        \
+                switch (t) {                                            \
+                case KVM_DEV_TYPE_ARM_VGIC_V5:                          \
+                        __ret = is_v5_type(GICV5_HWIRQ_TYPE_PPI, (i));  \
+                        break;                                          \
+                default:                                                \
+                        __ret = (i) >= VGIC_NR_SGIS;                    \
+                        __ret &= (i) < VGIC_NR_PRIVATE_IRQS;            \
+                }                                                       \
+                                                                        \
+                __ret;                                                  \
+        })
+
+#define __irq_is_spi(t, i)                                              \
+        ({                                                              \
+                bool __ret;                                             \
+                                                                        \
+                switch (t) {                                            \
+                case KVM_DEV_TYPE_ARM_VGIC_V5:                          \
+                        __ret = is_v5_type(GICV5_HWIRQ_TYPE_SPI, (i));  \
+                        break;                                          \
+                default:                                                \
+                        __ret = (i) <= VGIC_MAX_SPI;                    \
+                        __ret &= (i) >= VGIC_NR_PRIVATE_IRQS;           \
+                }                                                       \
+                                                                        \
+                __ret;                                                  \
+        })
+
+#define __irq_is_lpi(t, i)                                              \
+        ({                                                              \
+                bool __ret;                                             \
+                                                                        \
+                switch (t) {                                            \
+                case KVM_DEV_TYPE_ARM_VGIC_V5:                          \
+                        __ret = is_v5_type(GICV5_HWIRQ_TYPE_LPI, (i));  \
+                        break;                                          \
+                default:                                                \
+                        __ret = (i) >= 8192;                            \
+                }                                                       \
+                                                                        \
+                __ret;                                                  \
+        })
+
+#define irq_is_sgi(k, i)        __irq_is_sgi((k)->arch.vgic.vgic_model, i)
+#define irq_is_ppi(k, i)        __irq_is_ppi((k)->arch.vgic.vgic_model, i)
+#define irq_is_spi(k, i)        __irq_is_spi((k)->arch.vgic.vgic_model, i)
+#define irq_is_lpi(k, i)        __irq_is_lpi((k)->arch.vgic.vgic_model, i)
+
+#define irq_is_private(k, i)    (irq_is_ppi(k, i) || irq_is_sgi(k, i))
+
+#define vgic_v5_get_hwirq_id(x) FIELD_GET(GICV5_HWIRQ_ID, (x))
+#define vgic_v5_set_hwirq_id(x) FIELD_PREP(GICV5_HWIRQ_ID, (x))
+
+#define __vgic_v5_set_type(t)   (FIELD_PREP(GICV5_HWIRQ_TYPE, GICV5_HWIRQ_TYPE_##t))
+#define vgic_v5_make_ppi(x)     (__vgic_v5_set_type(PPI) | vgic_v5_set_hwirq_id(x))
+#define vgic_v5_make_spi(x)     (__vgic_v5_set_type(SPI) | vgic_v5_set_hwirq_id(x))
+#define vgic_v5_make_lpi(x)     (__vgic_v5_set_type(LPI) | vgic_v5_set_hwirq_id(x))
+
+#define __vgic_is_v(k, v)       ((k)->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V##v)
+#define vgic_is_v3(k)           (__vgic_is_v(k, 3))
+#define vgic_is_v5(k)           (__vgic_is_v(k, 5))
 
 enum vgic_type {
         VGIC_V2,                /* Good ol' GICv2 */
@@ -101,6 +190,8 @@ enum vgic_irq_config {
         VGIC_CONFIG_LEVEL
 };
 
+struct vgic_irq;
+
 /*
  * Per-irq ops overriding some common behavious.
  *
@@ -119,6 +210,19 @@ struct irq_ops {
          * peaking into the physical GIC.
          */
         bool (*get_input_level)(int vintid);
+
+        /*
+         * Function pointer to override the queuing of an IRQ.
+         */
+        bool (*queue_irq_unlock)(struct kvm *kvm, struct vgic_irq *irq,
+                                 unsigned long flags) __releases(&irq->irq_lock);
+
+        /*
+         * Callback function pointer to either enable or disable direct
+         * injection for a mapped interrupt.
+         */
+        void (*set_direct_injection)(struct kvm_vcpu *vcpu,
+                                     struct vgic_irq *irq, bool direct);
 };
 
 struct vgic_irq {
@@ -238,6 +342,26 @@ struct vgic_redist_region {
         struct list_head list;
 };
 
+struct vgic_v5_vm {
+        /*
+         * We only expose a subset of PPIs to the guest. This subset is a
+         * combination of the PPIs that are actually implemented and what we
+         * actually choose to expose.
+         */
+        DECLARE_BITMAP(vgic_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS);
+
+        /* A mask of the PPIs that are exposed for userspace to drive. */
+        DECLARE_BITMAP(userspace_ppis, VGIC_V5_NR_PRIVATE_IRQS);
+
+        /*
+         * The HMR itself is handled by the hardware, but we still need to have
+         * a mask that we can use when merging in pending state (only the state
+         * of Edge PPIs is merged back in from the guest an the HMR provides a
+         * convenient way to do that).
+         */
+        DECLARE_BITMAP(vgic_ppi_hmr, VGIC_V5_NR_PRIVATE_IRQS);
+};
+
 struct vgic_dist {
         bool            in_kernel;
         bool            ready;
@@ -310,6 +434,11 @@ struct vgic_dist {
          * else.
          */
         struct its_vm           its_vm;
+
+        /*
+         * GICv5 per-VM data.
+         */
+        struct vgic_v5_vm       gicv5_vm;
 };
 
 struct vgic_v2_cpu_if {
@@ -340,11 +469,40 @@ struct vgic_v3_cpu_if {
         unsigned int used_lrs;
 };
 
+struct vgic_v5_cpu_if {
+        u64             vgic_apr;
+        u64             vgic_vmcr;
+
+        /* PPI register state */
+        DECLARE_BITMAP(vgic_ppi_dvir, VGIC_V5_NR_PRIVATE_IRQS);
+        DECLARE_BITMAP(vgic_ppi_activer, VGIC_V5_NR_PRIVATE_IRQS);
+        DECLARE_BITMAP(vgic_ppi_enabler, VGIC_V5_NR_PRIVATE_IRQS);
+        /* We have one byte (of which 5 bits are used) per PPI for priority */
+        u64             vgic_ppi_priorityr[VGIC_V5_NR_PRIVATE_IRQS / 8];
+
+        /*
+         * The ICSR is re-used across host and guest, and hence it needs to be
+         * saved/restored. Only one copy is required as the host should block
+         * preemption between executing GIC CDRCFG and acccessing the
+         * ICC_ICSR_EL1. A guest, of course, can never guarantee this, and hence
+         * it is the hyp's responsibility to keep the state constistent.
+         */
+        u64             vgic_icsr;
+
+        struct gicv5_vpe        gicv5_vpe;
+};
+
+/* What PPI capabilities does a GICv5 host have */
+struct vgic_v5_ppi_caps {
+        DECLARE_BITMAP(impl_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS);
+};
+
 struct vgic_cpu {
         /* CPU vif control registers for world switch */
         union {
                 struct vgic_v2_cpu_if   vgic_v2;
                 struct vgic_v3_cpu_if   vgic_v3;
+                struct vgic_v5_cpu_if   vgic_v5;
         };
 
         struct vgic_irq *private_irqs;
@@ -392,13 +550,17 @@ int kvm_vgic_create(struct kvm *kvm, u32 type);
 void kvm_vgic_destroy(struct kvm *kvm);
 void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu);
 int kvm_vgic_map_resources(struct kvm *kvm);
+void kvm_vgic_finalize_idregs(struct kvm *kvm);
 int kvm_vgic_hyp_init(void);
 void kvm_vgic_init_cpu_hardware(void);
 
 int kvm_vgic_inject_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
                         unsigned int intid, bool level, void *owner);
+void kvm_vgic_set_irq_ops(struct kvm_vcpu *vcpu, u32 vintid,
+                          struct irq_ops *ops);
+void kvm_vgic_clear_irq_ops(struct kvm_vcpu *vcpu, u32 vintid);
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
-                          u32 vintid, struct irq_ops *ops);
+                          u32 vintid);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
 int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid);
 bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
@@ -414,8 +576,20 @@ u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
 #define irqchip_in_kernel(k)    (!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)     ((k)->arch.vgic.initialized)
 
-#define vgic_valid_spi(k, i)    (((i) >= VGIC_NR_PRIVATE_IRQS) && \
-                                 ((i) < (k)->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS))
+#define vgic_valid_spi(k, i)                                            \
+        ({                                                              \
+                bool __ret = irq_is_spi(k, i);                          \
+                                                                        \
+                switch ((k)->arch.vgic.vgic_model) {                    \
+                case KVM_DEV_TYPE_ARM_VGIC_V5:                          \
+                        __ret &= FIELD_GET(GICV5_HWIRQ_ID, i) < (k)->arch.vgic.nr_spis; \
+                        break;                                          \
+                default:                                                \
+                        __ret &= (i) < ((k)->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS); \
+                }                                                       \
+                                                                        \
+                __ret;                                                  \
+        })
 
 bool kvm_vcpu_has_pending_irqs(struct kvm_vcpu *vcpu);
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu);
@@ -455,6 +629,11 @@ int vgic_v4_load(struct kvm_vcpu *vcpu);
 void vgic_v4_commit(struct kvm_vcpu *vcpu);
 int vgic_v4_put(struct kvm_vcpu *vcpu);
 
+int vgic_v5_finalize_ppi_state(struct kvm *kvm);
+bool vgic_v5_ppi_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
+                                  unsigned long flags);
+void vgic_v5_set_ppi_dvi(struct kvm_vcpu *vcpu, struct vgic_irq *irq, bool dvi);
+
 bool vgic_state_is_nested(struct kvm_vcpu *vcpu);
 
 /* CPU HP callbacks */
diff --git a/include/linux/irqchip/arm-gic-v5.h b/include/linux/irqchip/arm-gic-v5.h
index b78488df6c98..40d2fce68294 100644
--- a/include/linux/irqchip/arm-gic-v5.h
+++ b/include/linux/irqchip/arm-gic-v5.h
@@ -25,6 +25,28 @@
 #define GICV5_HWIRQ_TYPE_SPI            UL(0x3)
 
 /*
+ * Architected PPIs
+ */
+#define GICV5_ARCH_PPI_S_DB_PPI         0x0
+#define GICV5_ARCH_PPI_RL_DB_PPI        0x1
+#define GICV5_ARCH_PPI_NS_DB_PPI        0x2
+#define GICV5_ARCH_PPI_SW_PPI           0x3
+#define GICV5_ARCH_PPI_HACDBSIRQ        0xf
+#define GICV5_ARCH_PPI_CNTHVS           0x13
+#define GICV5_ARCH_PPI_CNTHPS           0x14
+#define GICV5_ARCH_PPI_PMBIRQ           0x15
+#define GICV5_ARCH_PPI_COMMIRQ          0x16
+#define GICV5_ARCH_PPI_PMUIRQ           0x17
+#define GICV5_ARCH_PPI_CTIIRQ           0x18
+#define GICV5_ARCH_PPI_GICMNT           0x19
+#define GICV5_ARCH_PPI_CNTHP            0x1a
+#define GICV5_ARCH_PPI_CNTV             0x1b
+#define GICV5_ARCH_PPI_CNTHV            0x1c
+#define GICV5_ARCH_PPI_CNTPS            0x1d
+#define GICV5_ARCH_PPI_CNTP             0x1e
+#define GICV5_ARCH_PPI_TRBIRQ           0x1f
+
+/*
  * Tables attributes
  */
 #define GICV5_NO_READ_ALLOC     0b0
@@ -365,6 +387,11 @@ int gicv5_spi_irq_set_type(struct irq_data *d, unsigned int type);
 int gicv5_irs_iste_alloc(u32 lpi);
 void gicv5_irs_syncr(void);
 
+/* Embedded in kvm.arch */
+struct gicv5_vpe {
+        bool    resident;
+};
+
 struct gicv5_its_devtab_cfg {
         union {
                 struct {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6b76e7a6f4c2..4c14aee1fb06 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -318,7 +318,8 @@ static inline bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop)
 struct kvm_mmio_fragment {
         gpa_t gpa;
         void *data;
-        unsigned len;
+        u64 val;
+        unsigned int len;
 };
 
 struct kvm_vcpu {
@@ -1029,6 +1030,13 @@ static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
         return NULL;
 }
 
+static inline bool kvm_is_vcpu_creation_in_progress(struct kvm *kvm)
+{
+        lockdep_assert_held(&kvm->lock);
+
+        return kvm->created_vcpus != atomic_read(&kvm->online_vcpus);
+}
+
 void kvm_destroy_vcpus(struct kvm *kvm);
 
 int kvm_trylock_all_vcpus(struct kvm *kvm);
@@ -1628,6 +1636,13 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
 
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
 /*
+ * kvm_arch_shutdown() is invoked immediately prior to forcefully disabling
+ * hardware virtualization on all CPUs via IPI function calls (in preparation
+ * for shutdown or reboot), e.g. to allow arch code to prepare for disabling
+ * virtualization while KVM may be actively running vCPUs.
+ */
+void kvm_arch_shutdown(void);
+/*
  * kvm_arch_{enable,disable}_virtualization() are called on one CPU, under
  * kvm_usage_lock, immediately after/before 0=>1 and 1=>0 transitions of
  * kvm_usage_count, i.e. at the beginning of the generic hardware enabling
@@ -2300,7 +2315,6 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
 
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
 extern bool enable_virt_at_load;
-extern bool kvm_rebooting;
 #endif
 
 extern unsigned int halt_poll_ns;
@@ -2366,6 +2380,7 @@ void kvm_unregister_device_ops(u32 type);
 extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
+extern struct kvm_device_ops kvm_arm_vgic_v5_ops;
 
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
@@ -2594,12 +2609,4 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
                                     struct kvm_pre_fault_memory *range);
 #endif
 
-#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
-int kvm_enable_virtualization(void);
-void kvm_disable_virtualization(void);
-#else
-static inline int kvm_enable_virtualization(void) { return 0; }
-static inline void kvm_disable_virtualization(void) { }
-#endif
-
 #endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3f0d8d3c3daf..6c8afa2047bf 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -704,6 +704,11 @@ struct kvm_enable_cap {
 #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK   0xffULL
 #define KVM_VM_TYPE_ARM_IPA_SIZE(x)             \
         ((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+
+#define KVM_VM_TYPE_ARM_PROTECTED       (1UL << 31)
+#define KVM_VM_TYPE_ARM_MASK            (KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
+                                         KVM_VM_TYPE_ARM_PROTECTED)
+
 /*
  * ioctls for /dev/kvm fds:
  */
@@ -990,6 +995,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_ARM_SEA_TO_USER 245
 #define KVM_CAP_S390_USER_OPEREXEC 246
 #define KVM_CAP_S390_KEYOP 247
+#define KVM_CAP_S390_VSIE_ESAMODE 248
 
 struct kvm_irq_routing_irqchip {
         __u32 irqchip;
@@ -1225,6 +1231,10 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_LOONGARCH_EIOINTC  KVM_DEV_TYPE_LOONGARCH_EIOINTC
         KVM_DEV_TYPE_LOONGARCH_PCHPIC,
 #define KVM_DEV_TYPE_LOONGARCH_PCHPIC   KVM_DEV_TYPE_LOONGARCH_PCHPIC
+        KVM_DEV_TYPE_LOONGARCH_DMSINTC,
+#define KVM_DEV_TYPE_LOONGARCH_DMSINTC  KVM_DEV_TYPE_LOONGARCH_DMSINTC
+        KVM_DEV_TYPE_ARM_VGIC_V5,
+#define KVM_DEV_TYPE_ARM_VGIC_V5        KVM_DEV_TYPE_ARM_VGIC_V5
 
         KVM_DEV_TYPE_MAX,
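With KVM_DEV_TYPE_ARM_VGIC_V5 exported above, userspace would presumably instantiate the in-kernel GICv5 through the generic KVM device API, just as it already does for the VGIC v2/v3 devices. A hedged sketch (the helper name is made up, it needs uapi headers carrying the new device type, and whatever address/attribute setup the GICv5 device requires is not shown):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: create an in-kernel GICv5 device on an existing VM fd. */
int create_vgic_v5(int vm_fd)
{
        struct kvm_create_device dev = {
                .type = KVM_DEV_TYPE_ARM_VGIC_V5,
                .flags = 0,
        };

        if (ioctl(vm_fd, KVM_CREATE_DEVICE, &dev) < 0)
                return -1;

        return dev.fd;  /* used afterwards with KVM_SET_DEVICE_ATTR etc. */
}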
