kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2025-06-05	KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases	Sebastian Ott	1	-1/+1
	arch_timer_edge_cases hits the following assertion in < 10% of the test runs: ==== Test Assertion Failure ==== arm64/arch_timer_edge_cases.c:490: timer_get_cntct(timer) >= DEF_CNT + (timer_get_cntfrq() * (uint64_t)(delta_2_ms) / 1000) pid=17110 tid=17110 errno=4 - Interrupted system call 1 0x0000000000404ec7: test_run at arch_timer_edge_cases.c:945 2 0x0000000000401fa3: main at arch_timer_edge_cases.c:1074 3 0x0000ffffa774b587: ?? ??:0 4 0x0000ffffa774b65f: ?? ??:0 5 0x000000000040206f: _start at ??:? timer_get_cntct(timer) >= DEF_CNT + msec_to_cycles(delta_2_ms) Enabling the timer without proper xval initialization in set_tval_irq() resulted in an early interrupt during timer reprogramming. Make sure to set the xval before setting the enable bit. Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250605103613.14544-4-sebott@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05	KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases	Sebastian Ott	1	-3/+5
	arch_timer_edge_cases tries to migrate itself across host cpus. Before the first test, it migrates to cpu 0 by setting up an affinity mask with only bit 0 set. After that it looks for the next possible cpu in the current affinity mask which still has only bit 0 set. So there is no migration at all. Fix this by reading the default mask at start and use this to find the next cpu in each iteration. Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250605103613.14544-3-sebott@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05	KVM: arm64: selftests: Fix help text for arch_timer_edge_cases	Sebastian Ott	1	-1/+1
	Fix the help text for arch_timer_edge_cases to show the correct option for setting the wait time. Signed-off-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20250605103613.14544-2-sebott@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05	KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand	Marc Zyngier	1	-5/+5
	Now that we don't have any use of __vcpu_sys_reg() as a lvalue, remove the in-place update, and directly return the sanitised value. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05	KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg	Marc Zyngier	2	-3/+3
	We are about to prevent the use of __vcpu_sys_reg() as a lvalue, and getting the address of a rvalue is not a thing. Update the couple of places where we do this to use the __ctxt_sys_reg() accessor, which return the address of a register. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05	KVM: arm64: Add RMW specific sysreg accessor	Marc Zyngier	6	-21/+32
	In a number of cases, we perform a Read-Modify-Write operation on a system register, meaning that we would apply the RESx masks twice. Instead, provide a new accessor that performs this RMW operation, allowing the masks to be applied exactly once per operation. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05	KVM: arm64: Add assignment-specific sysreg accessor	Marc Zyngier	12	-73/+86
	Assigning a value to a system register doesn't do what it is supposed to be doing if that register is one that has RESx bits. The main problem is that we use __vcpu_sys_reg(), which can be used both as a lvalue and rvalue. When used as a lvalue, the bit masking occurs before the new value is assigned, meaning that we (1) do pointless work on the old cvalue, and (2) potentially assign an invalid value as we fail to apply the masks to it. Fix this by providing a new __vcpu_assign_sys_reg() that does what it says on the tin, and sanitises the new value instead of the old one. This comes with a significant amount of churn. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30	KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer	Marc Zyngier	1	-1/+4
	Dan reports that iterating over a device ITEs can legitimately lead to a NULL pointer, and that the NULL check is placed after the pointer has already been dereferenced. Hoist the pointer check as early as possible and be done with it. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: 30deb51a677b ("KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables") Link: https://lore.kernel.org/r/aDBylI1YnjPatAbr@stanley.mountain Cc: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20250530091647.1152489-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30	KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race	Oliver Upton	1	-1/+26
	syzkaller has found another ugly race in the VGIC, this time dealing with VGIC creation. Since kvm_vgic_create() doesn't sufficiently protect against in-flight vCPU creations, it is possible to get a vCPU into the kernel w/ an in-kernel VGIC but no allocation of private IRQs: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000d20 Mem abort info: ESR = 0x0000000096000046 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000103e4f000 [0000000000000d20] pgd=0800000102e1c403, p4d=0800000102e1c403, pud=0800000101146403, pmd=0000000000000000 Internal error: Oops: 0000000096000046 [#1] PREEMPT SMP CPU: 9 UID: 0 PID: 246 Comm: test Not tainted 6.14.0-rc6-00097-g0c90821f5db8 #16 Hardware name: linux,dummy-virt (DT) pstate: 814020c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : _raw_spin_lock_irqsave+0x34/0x8c lr : kvm_vgic_set_owner+0x54/0xa4 sp : ffff80008086ba20 x29: ffff80008086ba20 x28: ffff0000c19b5640 x27: 0000000000000000 x26: 0000000000000000 x25: ffff0000c4879bd0 x24: 000000000000001e x23: 0000000000000000 x22: 0000000000000000 x21: ffff0000c487af80 x20: ffff0000c487af18 x19: 0000000000000000 x18: 0000001afadd5a8b x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000001 x14: ffff0000c19b56c0 x13: 0030c9adf9d9889e x12: ffffc263710e1908 x11: 0000001afb0d74f2 x10: e0966b840b373664 x9 : ec806bf7d6a57cd5 x8 : ffff80008086b980 x7 : 0000000000000001 x6 : 0000000000000001 x5 : 0000000080800054 x4 : 4ec4ec4ec4ec4ec5 x3 : 0000000000000000 x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000d20 Call trace: _raw_spin_lock_irqsave+0x34/0x8c (P) kvm_vgic_set_owner+0x54/0xa4 kvm_timer_enable+0xf4/0x274 kvm_arch_vcpu_run_pid_change+0xe0/0x380 kvm_vcpu_ioctl+0x93c/0x9e0 __arm64_sys_ioctl+0xb4/0xec invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x30/0xd0 el0t_64_sync_handler+0x10c/0x138 el0t_64_sync+0x198/0x19c Code: b9000841 d503201f 52800001 52800022 (88e17c02) ---[ end trace 0000000000000000 ]--- Plug the race by explicitly checking for an in-progress vCPU creation and failing kvm_vgic_create() when that's the case. Add some comments to document all the things kvm_vgic_create() is trying to guard against too. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20250523194722.4066715-6-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30	KVM: arm64: Unmap vLPIs affected by changes to GSI routing information	Oliver Upton	1	-0/+23
	KVM's interrupt infrastructure is dodgy at best, allowing for some ugly 'off label' usage of the various UAPIs. In one example, userspace can change the routing entry of a particular "GSI" after configuring irqbypass with KVM_IRQFD. KVM/arm64 is oblivious to this, and winds up preserving the stale translation in cases where vLPIs are configured. Honor userspace's intentions and tear down the vLPI mapping if affected by a "GSI" routing change. Make no attempt to reconstruct vLPIs if the new target is an MSI and just fall back to software injection. Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250523194722.4066715-5-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30	KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()	Oliver Upton	3	-24/+27
	The virtual mapping and "GSI" routing of a particular vLPI is subject to change in response to the guest / userspace. This can be pretty annoying to deal with when KVM needs to track the physical state that's managed for vLPI direct injection. Make vgic_v4_unset_forwarding() resilient by using the host IRQ to resolve the vgic IRQ. Since this uses the LPI xarray directly, finding the ITS by doorbell address + grabbing it's its_lock is no longer necessary. Note that matching the right ITS / ITE is already handled in vgic_v4_set_forwarding(), and unless there's a bug in KVM's VGIC ITS emulation the virtual mapping that should remain stable for the lifetime of the vLPI mapping. Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250523194722.4066715-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30	KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock	Oliver Upton	2	-42/+47
	Though undocumented, KVM generally protects the translation of a vLPI with the its_lock. While this makes perfectly good sense, as the ITS itself contains the guest translation, an upcoming change will require twiddling the vLPI mapping in an atomic context. Switch to using the vIRQ's irq_lock to protect the translation. Use of the its_lock in vgic_v4_unset_forwarding() is preserved for now as it still needs to walk the ITS. Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250523194722.4066715-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30	KVM: arm64: Use lock guard in vgic_v4_set_forwarding()	Oliver Upton	1	-6/+4
	The locking dance is about to get more interesting, switch the its_lock over to a lock guard to make it a bit easier to handle. Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250523194722.4066715-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30	KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation	Marc Zyngier	1	-2/+4
	When handling a TLBI VA* instruction that potentially targets a VNCR page mapping, we fail to mask out the top bits that contain the ASID and TTL fields, hence potentially failing the VA check in the TLB code. An additional wrinkle is that we fail to sign extend the VA, again leading to failed VA checks. Fix both in one go by sign-extending the VA from bit 48, making it comparable to the way we interpret VNCR_EL2.BADDR. Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2") Link: https://lore.kernel.org/r/20250525175759.780891-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30	arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue	Marc Zyngier	1	-0/+1
	Broonie reports that fed55f49fad18 ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") breaks one of the vdso selftests (vdso_test_chacha) as it indirectly drags asm/sysreg.h. It is rather unfortunate (and worrying) that userspace gets built with non-UAPI headers. In any case, paper over the issue by dragging linux/kconfig.h in asm/sysreg.h. It is the right thing to do, at least from the kernel perspective. Reported-by: Mark Brown <broonie@kernel.org> Fixes: fed55f49fad18 ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") Link: https://lore.kernel.org/r/aDCDGZ-G-nCP3hJI@finisterre.sirena.org.uk Cc: D Scott Phillips <scott@os.amperecomputing.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250523170208.530818-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/misc-6.16 into kvmarm-master/next	Marc Zyngier	26	-89/+496
	* kvm-arm64/misc-6.16: : . : Misc changes and improvements for 6.16: : : - Add a new selftest for the SVE host state being corrupted by a guest : : - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, : ensuring that the window for interrupts is slightly bigger, and avoiding : a pretty bad erratum on the AmpereOne HW : : - Replace a couple of open-coded on/off strings with str_on_off() : : - Get rid of the pKVM memblock sorting, which now appears to be superflous : : - Drop superflous clearing of ICH_LR_EOI in the LR when nesting : : - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from : a pretty bad case of TLB corruption unless accesses to HCR_EL2 are : heavily synchronised : : - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables : in a human-friendly fashion : . KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables arm64: errata: Work around AmpereOne's erratum AC04_CPU_23 KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1 KVM: arm64: Drop sort_memblock_regions() KVM: arm64: selftests: Add test for SVE host corruption KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Replace ternary flags with str_on_off() helper Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/nv-nv into kvmarm-master/next	Marc Zyngier	16	-138/+956
	* kvm-arm64/nv-nv: : . : Flick the switch on the NV support by adding the missing piece : in the form of the VNCR page management. From the cover letter: : : "This is probably the most interesting bit of the whole NV adventure. : So far, everything else has been a walk in the park, but this one is : where the real fun takes place. : : With FEAT_NV2, most of the NV support revolves around tricking a guest : into accessing memory while it tries to access system registers. The : hypervisor's job is to handle the context switch of the actual : registers with the state in memory as needed." : . KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating KVM: arm64: Document NV caps and vcpu flags KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2* KVM: arm64: nv: Remove dead code from ERET handling KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2 KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2 KVM: arm64: nv: Handle VNCR_EL2-triggered faults KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2 KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2 KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting KVM: arm64: nv: Move TLBI range decoding to a helper KVM: arm64: nv: Snapshot S1 ASID tagging information during walk KVM: arm64: nv: Extract translation helper from the AT code KVM: arm64: nv: Allocate VNCR page when required arm64: sysreg: Add layout for VNCR_EL2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/at-fixes-6.16 into kvmarm-master/next	Marc Zyngier	1	-23/+36
	* kvm-arm64/at-fixes-6.16: : . : Set of fixes for Address Translation (AT) instruction emulation, : which affect the (not yet upstream) NV support. : : From the cover letter: : : "Here's a small series of fixes for KVM's implementation of address : translation (aka the AT S1* instructions), addressing a number of : issues in increasing levels of severity: : : - We misreport PAR_EL1.PTW in a number of occasions, including state : that is not possible as per the architecture definition : : - We don't handle access faults at all, and that doesn't play very : well with the rest of the VNCR stuff : : - AT S1E{0,1} from EL2 with HCR_EL2.{E2H,TGE}={1,1} will absolutely : take the host down, no questions asked" : . KVM: arm64: Don't feed uninitialised data to HCR_EL2 KVM: arm64: Teach address translation about access faults KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E* Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/fgt-masks into kvmarm-master/next	Marc Zyngier	20	-711/+2895
	* kvm-arm64/fgt-masks: (43 commits) : . : Large rework of the way KVM deals with trap bits in conjunction with : the CPU feature registers. It now draws a direct link between which : the feature set, the system registers that need to UNDEF to match : the configuration and bits that need to behave as RES0 or RES1 in : the trap registers that are visible to the guest. : : Best of all, these definitions are mostly automatically generated : from the JSON description published by ARM under a permissive : license. : . KVM: arm64: Handle TSB CSYNC traps KVM: arm64: Add FGT descriptors for FEAT_FGT2 KVM: arm64: Allow sysreg ranges for FGT descriptors KVM: arm64: Add context-switch for FEAT_FGT2 registers KVM: arm64: Add trap routing for FEAT_FGT2 registers KVM: arm64: Add sanitisation for FEAT_FGT2 registers KVM: arm64: Add FEAT_FGT2 registers to the VNCR page KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits KVM: arm64: Allow kvm_has_feat() to take variable arguments KVM: arm64: Use FGT feature maps to drive RES0 bits KVM: arm64: Validate FGT register descriptions against RES0 masks KVM: arm64: Switch to table-driven FGU configuration KVM: arm64: Handle PSB CSYNC traps KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask KVM: arm64: Remove hand-crafted masks for FGT registers KVM: arm64: Use computed FGT masks to setup FGT registers KVM: arm64: Propagate FGT masks to the nVHE hypervisor KVM: arm64: Unconditionally configure fine-grain traps KVM: arm64: Use computed masks as sanitisers for FGT registers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/mte-frac into kvmarm-master/next	Marc Zyngier	3	-3/+103
	* kvm-arm64/mte-frac: : . : Prevent FEAT_MTE_ASYNC from being accidently exposed to a guest, : courtesy of Ben Horgan. From the cover letter: : : "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM. : However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0 : indicates that MTE_ASYNC is supported. On a host with : ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the : MTE capability enabled will incorrectly see MTE_ASYNC advertised as : supported. This series fixes that." : . KVM: selftests: Confirm exposing MTE_frac does not break migration KVM: arm64: Make MTE_frac masking conditional on MTE capability arm64/sysreg: Expose MTE_frac so that it is visible to KVM Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/next	Marc Zyngier	9	-10/+41
	* kvm-arm64/ubsan-el2: : . : Add UBSAN support to the EL2 portion of KVM, reusing most of the : existing logic provided by CONFIG_IBSAN_TRAP. : : Patches courtesy of Mostafa Saleh. : . KVM: arm64: Handle UBSAN faults KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2 ubsan: Remove regs from report_ubsan_failure() arm64: Introduce esr_is_ubsan_brk() Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-23	Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/next	Marc Zyngier	19	-245/+730
	* kvm-arm64/pkvm-np-thp-6.16: (21 commits) : . : Large mapping support for non-protected pKVM guests, courtesy of : Vincent Donnefort. From the cover letter: : : "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM : np-guests, that is installing PMD-level mappings in the stage-2, : whenever the stage-1 is backed by either Hugetlbfs or THPs." : . KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section KVM: arm64: Unconditionally cross check hyp state KVM: arm64: Defer EL2 stage-1 mapping on share KVM: arm64: Move hyp state to hyp_vmemmap KVM: arm64: Introduce {get,set}_host_state() helpers KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE KVM: arm64: Fix pKVM page-tracking comments ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-22	KVM: arm64: Fix documentation for vgic_its_iter_next()	Marc Zyngier	1	-1/+1
	As reported by the build robot, the documentation for vgic_its_iter_next() contains a typo. Fix it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505221421.KAuWlmSr-lkp@intel.com/ Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: np-guest CMOs with PMD_SIZE fixmap	Vincent Donnefort	6	-33/+123
	With the introduction of stage-2 huge mappings in the pKVM hypervisor, guest pages CMO is needed for PMD_SIZE size. Fixmap only supports PAGE_SIZE and iterating over the huge-page is time consuming (mostly due to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency. Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap) to improve guest page CMOs when stage-2 huge mappings are installed. On a Pixel6, the iterative solution resulted in a latency of ~700us, while the PMD_SIZE fixmap reduces it to ~100us. Because of the horrendous private range allocation that would be necessary, this is disabled for 64KiB pages systems. Suggested-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: Stage-2 huge mappings for np-guests	Vincent Donnefort	3	-11/+20
	Now np-guests hypercalls with range are supported, we can let the hypervisor to install block mappings whenever the Stage-1 allows it, that is when backed by either Hugetlbfs or THPs. The size of those block mappings is limited to PMD_SIZE. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: Add a range to pkvm_mappings	Quentin Perret	2	-9/+29
	In preparation for supporting stage-2 huge mappings for np-guest, add a nr_pages member for pkvm_mappings to allow EL1 to track the size of the stage-2 mapping. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: Convert pkvm_mappings to interval tree	Quentin Perret	3	-60/+37
	In preparation for supporting stage-2 huge mappings for np-guest, let's convert pgt.pkvm_mappings to an interval tree. No functional change intended. Suggested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()	Vincent Donnefort	4	-8/+13
	In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()	Vincent Donnefort	4	-12/+17
	In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: Add a range to __pkvm_host_unshare_guest()	Vincent Donnefort	4	-29/+35
	In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: Add a range to __pkvm_host_share_guest()	Vincent Donnefort	4	-34/+61
	In preparation for supporting stage-2 huge mappings for np-guest. Add a nr_pages argument to the __pkvm_host_share_guest hypercall. This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a 4K-pages system). Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: Introduce for_each_hyp_page	Vincent Donnefort	3	-34/+37
	Add a helper to iterate over the hypervisor vmemmap. This will be particularly handy with the introduction of huge mapping support for the np-guest stage-2. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: Handle huge mappings for np-guest CMOs	Vincent Donnefort	1	-4/+22
	clean_dcache_guest_page() and invalidate_icache_guest_page() accept a size as an argument. But they also rely on fixmap, which can only map a single PAGE_SIZE page. With the upcoming stage-2 huge mappings for pKVM np-guests, those callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis until the whole range is done. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16	Marc Zyngier	10	-4/+250
	* kvm-arm64/pkvm-selftest-6.16: : . : pKVM selftests covering the memory ownership transitions by : Quentin Perret. From the initial cover letter: : : "We have recently found a bug [1] in the pKVM memory ownership : transitions by code inspection, but it could have been caught with a : test. : : Introduce a boot-time selftest exercising all the known pKVM memory : transitions and importantly checks the rejection of illegal transitions. : : The new test is hidden behind a new Kconfig option separate from : CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the : transition checks ([1] doesn't reproduce with EL2 debug enabled). : : [1] https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/" : . KVM: arm64: Extend pKVM selftest for np-guests KVM: arm64: Selftest for pKVM transitions KVM: arm64: Don't WARN from __pkvm_host_share_guest() KVM: arm64: Add .hyp.data section Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	Merge branch kvm-arm64/pkvm-6.16 into kvm-arm64/pkvm-np-thp-6.16	Marc Zyngier	6	-79/+158
	* kvm-arm64/pkvm-6.16: : . : pKVM memory management cleanups, courtesy of Quentin Perret. : From the cover letter: : : "This series moves the hypervisor's ownership state to the hyp_vmemmap, : as discussed in [1]. The two main benefits are: : : 1. much cheaper hyp state lookups, since we can avoid the hyp stage-1 : page-table walk; : : 2. de-correlates the hyp state from the presence of a mapping in the : linear map range of the hypervisor; which enables a bunch of : clean-ups in the existing code and will simplify the introduction of : other features in the future (hyp tracing, ...)" : . KVM: arm64: Unconditionally cross check hyp state KVM: arm64: Defer EL2 stage-1 mapping on share KVM: arm64: Move hyp state to hyp_vmemmap KVM: arm64: Introduce {get,set}_host_state() helpers KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE KVM: arm64: Fix pKVM page-tracking comments KVM: arm64: Track SVE state in the hypervisor vcpu structure Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section	Marc Zyngier	1	-1/+1
	The conversion to kvm_release_faultin_page() missed the requirement for this to be called within a critical section with mmu_lock held for write. Move this call up to satisfy this requirement. Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held	Marc Zyngier	1	-0/+2
	Calling invalidate_vncr_va() without the mmu_lock held for write is a bad idea, and lockdep tells you about that. Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2") Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-21	KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating	Marc Zyngier	1	-7/+18
	When translating a VNCR translation fault, we start by marking the current SW-managed TLB as invalid, so that we can populate it in place. This is, however, done without the mmu_lock held. A consequence of this is that another CPU dealing with TLBI emulation can observe a translation still flagged as valid, but with invalid walk results (such as pgshift being 0). Bad things can result from this, such as a BUG() in pgshift_level_to_ttl(). Fix it by taking the mmu_lock for write to perform this local invalidation, and use invalidate_vncr() instead of open-coding the write to the 'valid' flag. Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults") Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables	Jing Zhang	3	-31/+265
	This commit introduces a debugfs interface to display the contents of the VGIC Interrupt Translation Service (ITS) tables. The ITS tables map Device/Event IDs to Interrupt IDs and target processors. Exposing this information through debugfs allows for easier inspection and debugging of the interrupt routing configuration. The debugfs interface presents the ITS table data in a tabular format: Device ID: 0x0, Event ID Range: [0 - 31] EVENT_ID INTID HWINTID TARGET COL_ID HW ----------------------------------------------- 0 8192 0 0 0 0 1 8193 0 0 0 0 2 8194 0 2 2 0 Device ID: 0x18, Event ID Range: [0 - 3] EVENT_ID INTID HWINTID TARGET COL_ID HW ----------------------------------------------- 0 8225 0 0 0 0 1 8226 0 1 1 0 2 8227 0 3 3 0 Device ID: 0x10, Event ID Range: [0 - 7] EVENT_ID INTID HWINTID TARGET COL_ID HW ----------------------------------------------- 0 8229 0 3 3 1 1 8230 0 0 0 1 2 8231 0 1 1 1 3 8232 0 2 2 1 4 8233 0 3 3 1 The output is generated using the seq_file interface, allowing for efficient handling of potentially large ITS tables. This interface is read-only and does not allow modification of the ITS tables. It is intended for debugging and informational purposes only. Signed-off-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20250220224247.2017205-1-jingzhangos@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	arm64: errata: Work around AmpereOne's erratum AC04_CPU_23	D Scott Phillips	17	-19/+80
	On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous translations for data addresses initiated by load/store instructions. Only instruction initiated translations are vulnerable, not translations from prefetches for example. A DSB before the store to HCR_EL2 is sufficient to prevent older instructions from hitting the window for corruption, and an ISB after is sufficient to prevent younger instructions from hitting the window for corruption. Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Handle TSB CSYNC traps	Marc Zyngier	3	-1/+8
	The architecture introduces a trap for TSB CSYNC that fits in the same EC as LS64 and PSB CSYNC. Let's deal with it in a similar way. It's not that we expect this to be useful any time soon anyway. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Add FGT descriptors for FEAT_FGT2	Marc Zyngier	1	-0/+83
	Bulk addition of all the FGT2 traps reported with EC == 0x18, as described in the 2025-03 JSON drop. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Allow sysreg ranges for FGT descriptors	Marc Zyngier	1	-81/+42
	Just like we allow sysreg ranges for Coarse Grained Trap descriptors, allow them for Fine Grain Traps as well. This comes with a warning that not all ranges are suitable for this particular definition of ranges. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Add context-switch for FEAT_FGT2 registers	Marc Zyngier	1	-0/+44
	Just like the rest of the FGT registers, perform a switch of the FGT2 equivalent. This avoids the host configuration leaking into the guest... Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Add trap routing for FEAT_FGT2 registers	Marc Zyngier	1	-0/+12
	Similarly to the FEAT_FGT registers, pick the correct FEAT_FGT2 register when a sysreg trap indicates they could be responsible for the exception. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Add sanitisation for FEAT_FGT2 registers	Marc Zyngier	7	-0/+260
	Just like the FEAT_FGT registers, treat the FGT2 variant the same way. THis is a large update, but a fairly mechanical one. The config dependencies are extracted from the 2025-03 JSON drop. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Add FEAT_FGT2 registers to the VNCR page	Marc Zyngier	2	-0/+10
	The FEAT_FGT2 registers are part of the VNCR page. Describe the corresponding offsets and add them to the vcpu sysreg enumeration. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits	Marc Zyngier	2	-37/+150
	Similarly to other registers, describe which HCR_EL2 bit depends on which feature, and use this to compute the RES0 status of these bits. An additional complexity stems from the status of some bits such as E2H and RW, which do not had a RESx status, but still take a fixed value due to implementation choices in KVM. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits	Marc Zyngier	2	-39/+79
	Similarly to other registers, describe which HCR_EL2 bit depends on which feature, and use this to compute the RES0 status of these bits. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19	KVM: arm64: Allow kvm_has_feat() to take variable arguments	Marc Zyngier	1	-2/+6
	In order to be able to write more compact (and easier to read) code, let kvm_has_feat() and co take variable arguments. This enables constructs such as: #define FEAT_SME ID_AA64PFR1_EL1, SME, IMP if (kvm_has_feat(kvm, FEAT_SME)) [...] which is admitedly more readable. Signed-off-by: Marc Zyngier <maz@kernel.org>