Age | Commit message (Collapse) | Author | Files | Lines |
|
arch_timer_edge_cases hits the following assertion in < 10% of the test runs:
==== Test Assertion Failure ====
arm64/arch_timer_edge_cases.c:490: timer_get_cntct(timer) >= DEF_CNT + (timer_get_cntfrq() * (uint64_t)(delta_2_ms) / 1000)
pid=17110 tid=17110 errno=4 - Interrupted system call
1 0x0000000000404ec7: test_run at arch_timer_edge_cases.c:945
2 0x0000000000401fa3: main at arch_timer_edge_cases.c:1074
3 0x0000ffffa774b587: ?? ??:0
4 0x0000ffffa774b65f: ?? ??:0
5 0x000000000040206f: _start at ??:?
timer_get_cntct(timer) >= DEF_CNT + msec_to_cycles(delta_2_ms)
Enabling the timer without proper xval initialization in set_tval_irq()
resulted in an early interrupt during timer reprogramming. Make sure
to set the xval before setting the enable bit.
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250605103613.14544-4-sebott@redhat.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
arch_timer_edge_cases tries to migrate itself across host cpus. Before
the first test, it migrates to cpu 0 by setting up an affinity mask with
only bit 0 set. After that it looks for the next possible cpu in the
current affinity mask which still has only bit 0 set. So there is no
migration at all.
Fix this by reading the default mask at start and use this to find
the next cpu in each iteration.
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250605103613.14544-3-sebott@redhat.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Fix the help text for arch_timer_edge_cases to show the correct
option for setting the wait time.
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250605103613.14544-2-sebott@redhat.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now that we don't have any use of __vcpu_sys_reg() as a lvalue,
remove the in-place update, and directly return the sanitised
value.
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We are about to prevent the use of __vcpu_sys_reg() as a lvalue,
and getting the address of a rvalue is not a thing.
Update the couple of places where we do this to use the __ctxt_sys_reg()
accessor, which return the address of a register.
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In a number of cases, we perform a Read-Modify-Write operation on
a system register, meaning that we would apply the RESx masks twice.
Instead, provide a new accessor that performs this RMW operation,
allowing the masks to be applied exactly once per operation.
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Assigning a value to a system register doesn't do what it is
supposed to be doing if that register is one that has RESx bits.
The main problem is that we use __vcpu_sys_reg(), which can be used
both as a lvalue and rvalue. When used as a lvalue, the bit masking
occurs *before* the new value is assigned, meaning that we (1) do
pointless work on the old cvalue, and (2) potentially assign an
invalid value as we fail to apply the masks to it.
Fix this by providing a new __vcpu_assign_sys_reg() that does
what it says on the tin, and sanitises the *new* value instead of
the old one. This comes with a significant amount of churn.
Reviewed-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250603070824.1192795-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Dan reports that iterating over a device ITEs can legitimately lead
to a NULL pointer, and that the NULL check is placed *after* the
pointer has already been dereferenced.
Hoist the pointer check as early as possible and be done with it.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 30deb51a677b ("KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables")
Link: https://lore.kernel.org/r/aDBylI1YnjPatAbr@stanley.mountain
Cc: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20250530091647.1152489-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
syzkaller has found another ugly race in the VGIC, this time dealing
with VGIC creation. Since kvm_vgic_create() doesn't sufficiently protect
against in-flight vCPU creations, it is possible to get a vCPU into the
kernel w/ an in-kernel VGIC but no allocation of private IRQs:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000d20
Mem abort info:
ESR = 0x0000000096000046
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
CM = 0, WnR = 1, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=0000000103e4f000
[0000000000000d20] pgd=0800000102e1c403, p4d=0800000102e1c403, pud=0800000101146403, pmd=0000000000000000
Internal error: Oops: 0000000096000046 [#1] PREEMPT SMP
CPU: 9 UID: 0 PID: 246 Comm: test Not tainted 6.14.0-rc6-00097-g0c90821f5db8 #16
Hardware name: linux,dummy-virt (DT)
pstate: 814020c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : _raw_spin_lock_irqsave+0x34/0x8c
lr : kvm_vgic_set_owner+0x54/0xa4
sp : ffff80008086ba20
x29: ffff80008086ba20 x28: ffff0000c19b5640 x27: 0000000000000000
x26: 0000000000000000 x25: ffff0000c4879bd0 x24: 000000000000001e
x23: 0000000000000000 x22: 0000000000000000 x21: ffff0000c487af80
x20: ffff0000c487af18 x19: 0000000000000000 x18: 0000001afadd5a8b
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000001
x14: ffff0000c19b56c0 x13: 0030c9adf9d9889e x12: ffffc263710e1908
x11: 0000001afb0d74f2 x10: e0966b840b373664 x9 : ec806bf7d6a57cd5
x8 : ffff80008086b980 x7 : 0000000000000001 x6 : 0000000000000001
x5 : 0000000080800054 x4 : 4ec4ec4ec4ec4ec5 x3 : 0000000000000000
x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000d20
Call trace:
_raw_spin_lock_irqsave+0x34/0x8c (P)
kvm_vgic_set_owner+0x54/0xa4
kvm_timer_enable+0xf4/0x274
kvm_arch_vcpu_run_pid_change+0xe0/0x380
kvm_vcpu_ioctl+0x93c/0x9e0
__arm64_sys_ioctl+0xb4/0xec
invoke_syscall+0x48/0x110
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0x10c/0x138
el0t_64_sync+0x198/0x19c
Code: b9000841 d503201f 52800001 52800022 (88e17c02)
---[ end trace 0000000000000000 ]---
Plug the race by explicitly checking for an in-progress vCPU creation
and failing kvm_vgic_create() when that's the case. Add some comments to
document all the things kvm_vgic_create() is trying to guard against
too.
Reported-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Tested-by: Alexander Potapenko <glider@google.com>
Link: https://lore.kernel.org/r/20250523194722.4066715-6-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
KVM's interrupt infrastructure is dodgy at best, allowing for some ugly
'off label' usage of the various UAPIs. In one example, userspace can
change the routing entry of a particular "GSI" after configuring
irqbypass with KVM_IRQFD. KVM/arm64 is oblivious to this, and winds up
preserving the stale translation in cases where vLPIs are configured.
Honor userspace's intentions and tear down the vLPI mapping if affected
by a "GSI" routing change. Make no attempt to reconstruct vLPIs if the
new target is an MSI and just fall back to software injection.
Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250523194722.4066715-5-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The virtual mapping and "GSI" routing of a particular vLPI is subject to
change in response to the guest / userspace. This can be pretty annoying
to deal with when KVM needs to track the physical state that's managed
for vLPI direct injection.
Make vgic_v4_unset_forwarding() resilient by using the host IRQ to
resolve the vgic IRQ. Since this uses the LPI xarray directly, finding
the ITS by doorbell address + grabbing it's its_lock is no longer
necessary. Note that matching the right ITS / ITE is already handled in
vgic_v4_set_forwarding(), and unless there's a bug in KVM's VGIC ITS
emulation the virtual mapping that should remain stable for the lifetime
of the vLPI mapping.
Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250523194722.4066715-4-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Though undocumented, KVM generally protects the translation of a vLPI
with the its_lock. While this makes perfectly good sense, as the ITS
itself contains the guest translation, an upcoming change will require
twiddling the vLPI mapping in an atomic context.
Switch to using the vIRQ's irq_lock to protect the translation. Use of
the its_lock in vgic_v4_unset_forwarding() is preserved for now as it
still needs to walk the ITS.
Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250523194722.4066715-3-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The locking dance is about to get more interesting, switch the its_lock
over to a lock guard to make it a bit easier to handle.
Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250523194722.4066715-2-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When handling a TLBI VA* instruction that potentially targets a
VNCR page mapping, we fail to mask out the top bits that contain
the ASID and TTL fields, hence potentially failing the VA check
in the TLB code.
An additional wrinkle is that we fail to sign extend the VA,
again leading to failed VA checks.
Fix both in one go by sign-extending the VA from bit 48, making
it comparable to the way we interpret VNCR_EL2.BADDR.
Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Link: https://lore.kernel.org/r/20250525175759.780891-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Broonie reports that fed55f49fad18 ("arm64: errata: Work around
AmpereOne's erratum AC04_CPU_23") breaks one of the vdso selftests
(vdso_test_chacha) as it indirectly drags asm/sysreg.h.
It is rather unfortunate (and worrying) that userspace gets built
with non-UAPI headers. In any case, paper over the issue by dragging
linux/kconfig.h in asm/sysreg.h. It is the right thing to do, at
least from the kernel perspective.
Reported-by: Mark Brown <broonie@kernel.org>
Fixes: fed55f49fad18 ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23")
Link: https://lore.kernel.org/r/aDCDGZ-G-nCP3hJI@finisterre.sirena.org.uk
Cc: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250523170208.530818-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/misc-6.16:
: .
: Misc changes and improvements for 6.16:
:
: - Add a new selftest for the SVE host state being corrupted by a guest
:
: - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2,
: ensuring that the window for interrupts is slightly bigger, and avoiding
: a pretty bad erratum on the AmpereOne HW
:
: - Replace a couple of open-coded on/off strings with str_on_off()
:
: - Get rid of the pKVM memblock sorting, which now appears to be superflous
:
: - Drop superflous clearing of ICH_LR_EOI in the LR when nesting
:
: - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from
: a pretty bad case of TLB corruption unless accesses to HCR_EL2 are
: heavily synchronised
:
: - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables
: in a human-friendly fashion
: .
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables
arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1
KVM: arm64: Drop sort_memblock_regions()
KVM: arm64: selftests: Add test for SVE host corruption
KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
KVM: arm64: Replace ternary flags with str_on_off() helper
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-nv:
: .
: Flick the switch on the NV support by adding the missing piece
: in the form of the VNCR page management. From the cover letter:
:
: "This is probably the most interesting bit of the whole NV adventure.
: So far, everything else has been a walk in the park, but this one is
: where the real fun takes place.
:
: With FEAT_NV2, most of the NV support revolves around tricking a guest
: into accessing memory while it tries to access system registers. The
: hypervisor's job is to handle the context switch of the actual
: registers with the state in memory as needed."
: .
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
KVM: arm64: Document NV caps and vcpu flags
KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
KVM: arm64: nv: Remove dead code from ERET handling
KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
KVM: arm64: nv: Handle VNCR_EL2-triggered faults
KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
KVM: arm64: nv: Move TLBI range decoding to a helper
KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
KVM: arm64: nv: Extract translation helper from the AT code
KVM: arm64: nv: Allocate VNCR page when required
arm64: sysreg: Add layout for VNCR_EL2
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/at-fixes-6.16:
: .
: Set of fixes for Address Translation (AT) instruction emulation,
: which affect the (not yet upstream) NV support.
:
: From the cover letter:
:
: "Here's a small series of fixes for KVM's implementation of address
: translation (aka the AT S1* instructions), addressing a number of
: issues in increasing levels of severity:
:
: - We misreport PAR_EL1.PTW in a number of occasions, including state
: that is not possible as per the architecture definition
:
: - We don't handle access faults at all, and that doesn't play very
: well with the rest of the VNCR stuff
:
: - AT S1E{0,1} from EL2 with HCR_EL2.{E2H,TGE}={1,1} will absolutely
: take the host down, no questions asked"
: .
KVM: arm64: Don't feed uninitialised data to HCR_EL2
KVM: arm64: Teach address translation about access faults
KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E*
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/fgt-masks: (43 commits)
: .
: Large rework of the way KVM deals with trap bits in conjunction with
: the CPU feature registers. It now draws a direct link between which
: the feature set, the system registers that need to UNDEF to match
: the configuration and bits that need to behave as RES0 or RES1 in
: the trap registers that are visible to the guest.
:
: Best of all, these definitions are mostly automatically generated
: from the JSON description published by ARM under a permissive
: license.
: .
KVM: arm64: Handle TSB CSYNC traps
KVM: arm64: Add FGT descriptors for FEAT_FGT2
KVM: arm64: Allow sysreg ranges for FGT descriptors
KVM: arm64: Add context-switch for FEAT_FGT2 registers
KVM: arm64: Add trap routing for FEAT_FGT2 registers
KVM: arm64: Add sanitisation for FEAT_FGT2 registers
KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
KVM: arm64: Allow kvm_has_feat() to take variable arguments
KVM: arm64: Use FGT feature maps to drive RES0 bits
KVM: arm64: Validate FGT register descriptions against RES0 masks
KVM: arm64: Switch to table-driven FGU configuration
KVM: arm64: Handle PSB CSYNC traps
KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
KVM: arm64: Remove hand-crafted masks for FGT registers
KVM: arm64: Use computed FGT masks to setup FGT registers
KVM: arm64: Propagate FGT masks to the nVHE hypervisor
KVM: arm64: Unconditionally configure fine-grain traps
KVM: arm64: Use computed masks as sanitisers for FGT registers
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/mte-frac:
: .
: Prevent FEAT_MTE_ASYNC from being accidently exposed to a guest,
: courtesy of Ben Horgan. From the cover letter:
:
: "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM.
: However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0
: indicates that MTE_ASYNC is supported. On a host with
: ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the
: MTE capability enabled will incorrectly see MTE_ASYNC advertised as
: supported. This series fixes that."
: .
KVM: selftests: Confirm exposing MTE_frac does not break migration
KVM: arm64: Make MTE_frac masking conditional on MTE capability
arm64/sysreg: Expose MTE_frac so that it is visible to KVM
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/ubsan-el2:
: .
: Add UBSAN support to the EL2 portion of KVM, reusing most of the
: existing logic provided by CONFIG_IBSAN_TRAP.
:
: Patches courtesy of Mostafa Saleh.
: .
KVM: arm64: Handle UBSAN faults
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
ubsan: Remove regs from report_ubsan_failure()
arm64: Introduce esr_is_ubsan_brk()
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-np-thp-6.16: (21 commits)
: .
: Large mapping support for non-protected pKVM guests, courtesy of
: Vincent Donnefort. From the cover letter:
:
: "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM
: np-guests, that is installing PMD-level mappings in the stage-2,
: whenever the stage-1 is backed by either Hugetlbfs or THPs."
: .
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
KVM: arm64: Unconditionally cross check hyp state
KVM: arm64: Defer EL2 stage-1 mapping on share
KVM: arm64: Move hyp state to hyp_vmemmap
KVM: arm64: Introduce {get,set}_host_state() helpers
KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
KVM: arm64: Fix pKVM page-tracking comments
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As reported by the build robot, the documentation for vgic_its_iter_next()
contains a typo. Fix it.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505221421.KAuWlmSr-lkp@intel.com/
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
With the introduction of stage-2 huge mappings in the pKVM hypervisor,
guest pages CMO is needed for PMD_SIZE size. Fixmap only supports
PAGE_SIZE and iterating over the huge-page is time consuming (mostly due
to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency.
Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
to improve guest page CMOs when stage-2 huge mappings are installed.
On a Pixel6, the iterative solution resulted in a latency of ~700us,
while the PMD_SIZE fixmap reduces it to ~100us.
Because of the horrendous private range allocation that would be
necessary, this is disabled for 64KiB pages systems.
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now np-guests hypercalls with range are supported, we can let the
hypervisor to install block mappings whenever the Stage-1 allows it,
that is when backed by either Hugetlbfs or THPs. The size of those block
mappings is limited to PMD_SIZE.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest, add a
nr_pages member for pkvm_mappings to allow EL1 to track the size of the
stage-2 mapping.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest, let's
convert pgt.pkvm_mappings to an interval tree.
No functional change intended.
Suggested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall.
This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is
512 on a 4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This
range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512
on a 4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_share_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Add a helper to iterate over the hypervisor vmemmap. This will be
particularly handy with the introduction of huge mapping support
for the np-guest stage-2.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
size as an argument. But they also rely on fixmap, which can only map a
single PAGE_SIZE page.
With the upcoming stage-2 huge mappings for pKVM np-guests, those
callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis
until the whole range is done.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-selftest-6.16:
: .
: pKVM selftests covering the memory ownership transitions by
: Quentin Perret. From the initial cover letter:
:
: "We have recently found a bug [1] in the pKVM memory ownership
: transitions by code inspection, but it could have been caught with a
: test.
:
: Introduce a boot-time selftest exercising all the known pKVM memory
: transitions and importantly checks the rejection of illegal transitions.
:
: The new test is hidden behind a new Kconfig option separate from
: CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
: transition checks ([1] doesn't reproduce with EL2 debug enabled).
:
: [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/"
: .
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/pkvm-6.16:
: .
: pKVM memory management cleanups, courtesy of Quentin Perret.
: From the cover letter:
:
: "This series moves the hypervisor's ownership state to the hyp_vmemmap,
: as discussed in [1]. The two main benefits are:
:
: 1. much cheaper hyp state lookups, since we can avoid the hyp stage-1
: page-table walk;
:
: 2. de-correlates the hyp state from the presence of a mapping in the
: linear map range of the hypervisor; which enables a bunch of
: clean-ups in the existing code and will simplify the introduction of
: other features in the future (hyp tracing, ...)"
: .
KVM: arm64: Unconditionally cross check hyp state
KVM: arm64: Defer EL2 stage-1 mapping on share
KVM: arm64: Move hyp state to hyp_vmemmap
KVM: arm64: Introduce {get,set}_host_state() helpers
KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
KVM: arm64: Fix pKVM page-tracking comments
KVM: arm64: Track SVE state in the hypervisor vcpu structure
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The conversion to kvm_release_faultin_page() missed the requirement
for this to be called within a critical section with mmu_lock held
for write. Move this call up to satisfy this requirement.
Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Calling invalidate_vncr_va() without the mmu_lock held for write
is a bad idea, and lockdep tells you about that.
Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When translating a VNCR translation fault, we start by marking the
current SW-managed TLB as invalid, so that we can populate it
in place. This is, however, done without the mmu_lock held.
A consequence of this is that another CPU dealing with TLBI
emulation can observe a translation still flagged as valid, but
with invalid walk results (such as pgshift being 0). Bad things
can result from this, such as a BUG() in pgshift_level_to_ttl().
Fix it by taking the mmu_lock for write to perform this local
invalidation, and use invalidate_vncr() instead of open-coding
the write to the 'valid' flag.
Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
This commit introduces a debugfs interface to display the contents of the
VGIC Interrupt Translation Service (ITS) tables.
The ITS tables map Device/Event IDs to Interrupt IDs and target processors.
Exposing this information through debugfs allows for easier inspection and
debugging of the interrupt routing configuration.
The debugfs interface presents the ITS table data in a tabular format:
Device ID: 0x0, Event ID Range: [0 - 31]
EVENT_ID INTID HWINTID TARGET COL_ID HW
-----------------------------------------------
0 8192 0 0 0 0
1 8193 0 0 0 0
2 8194 0 2 2 0
Device ID: 0x18, Event ID Range: [0 - 3]
EVENT_ID INTID HWINTID TARGET COL_ID HW
-----------------------------------------------
0 8225 0 0 0 0
1 8226 0 1 1 0
2 8227 0 3 3 0
Device ID: 0x10, Event ID Range: [0 - 7]
EVENT_ID INTID HWINTID TARGET COL_ID HW
-----------------------------------------------
0 8229 0 3 3 1
1 8230 0 0 0 1
2 8231 0 1 1 1
3 8232 0 2 2 1
4 8233 0 3 3 1
The output is generated using the seq_file interface, allowing for efficient
handling of potentially large ITS tables.
This interface is read-only and does not allow modification of the ITS
tables. It is intended for debugging and informational purposes only.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20250220224247.2017205-1-jingzhangos@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous
translations for data addresses initiated by load/store instructions.
Only instruction initiated translations are vulnerable, not translations
from prefetches for example. A DSB before the store to HCR_EL2 is
sufficient to prevent older instructions from hitting the window for
corruption, and an ISB after is sufficient to prevent younger
instructions from hitting the window for corruption.
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The architecture introduces a trap for TSB CSYNC that fits in
the same EC as LS64 and PSB CSYNC. Let's deal with it in a similar
way.
It's not that we expect this to be useful any time soon anyway.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Bulk addition of all the FGT2 traps reported with EC == 0x18,
as described in the 2025-03 JSON drop.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Just like we allow sysreg ranges for Coarse Grained Trap descriptors,
allow them for Fine Grain Traps as well.
This comes with a warning that not all ranges are suitable for this
particular definition of ranges.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Just like the rest of the FGT registers, perform a switch of the
FGT2 equivalent. This avoids the host configuration leaking into
the guest...
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Similarly to the FEAT_FGT registers, pick the correct FEAT_FGT2
register when a sysreg trap indicates they could be responsible
for the exception.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. THis is a large update, but a fairly mechanical one.
The config dependencies are extracted from the 2025-03 JSON drop.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The FEAT_FGT2 registers are part of the VNCR page. Describe the
corresponding offsets and add them to the vcpu sysreg enumeration.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.
An additional complexity stems from the status of some bits such
as E2H and RW, which do not had a RESx status, but still take
a fixed value due to implementation choices in KVM.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
In order to be able to write more compact (and easier to read) code,
let kvm_has_feat() and co take variable arguments. This enables
constructs such as:
#define FEAT_SME ID_AA64PFR1_EL1, SME, IMP
if (kvm_has_feat(kvm, FEAT_SME))
[...]
which is admitedly more readable.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|