path: root/arch/arm64/kvm
Age | Commit message | Author | Files | Lines
2024-08-29 | KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3 | Marc Zyngier | 2 | -0/+13
commit 3e6245ebe7ef341639e9a7e402b3ade8ad45a19f upstream. On a system with a GICv3, if a guest hasn't been configured with GICv3 and the host is not capable of GICv2 emulation, a write to any of the ICC_*SGI*_EL1 registers is trapped to EL2. We therefore try to emulate the SGI access, only to hit a NULL pointer as no private interrupt is allocated (no GIC, remember?). The obvious fix is to give the guest what it deserves, in the shape of an UNDEF exception. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
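A minimal sketch of the shape of that fix, not the upstream diff itself: kvm_inject_undefined() is a real KVM/arm64 helper, while guest_has_vgic_v3() stands in for whatever predicate the patch actually uses to detect the missing vGICv3.

```c
/*
 * Illustrative sketch only: with no vGICv3 to emulate against, refuse the
 * ICC_*SGI*_EL1 access and hand the guest an UNDEF instead of chasing a
 * NULL vgic pointer. guest_has_vgic_v3() is a hypothetical predicate.
 */
if (!guest_has_vgic_v3(vcpu->kvm)) {
        kvm_inject_undefined(vcpu);
        return false;   /* report that no emulation took place */
}
```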
2024-08-19 | KVM: arm64: Don't pass a TLBI level hint when zapping table entries | Will Deacon | 1 | -2/+8
commit 36e008323926036650299cfbb2dca704c7aba849 upstream. The TLBI level hints are for leaf entries only, so take care not to pass them incorrectly after clearing a table entry. Cc: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Fixes: 82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2") Fixes: 6d9d2115c480 ("KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table") Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240327124853.11206-3-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Cc: <stable@vger.kernel.org> # 6.1.y only [will@: Use '0' instead of TLBI_TTL_UNKNOWN to indicate "no level". Force level to 0 in stage2_put_pte() if we're clearing a table entry.] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27 | KVM: arm64: Disassociate vcpus from redistributor region on teardown | Marc Zyngier | 3 | -4/+15
commit 0d92e4a7ffd5c42b9fa864692f82476c0bf8bcc8 upstream. When tearing down a redistributor region, make sure we don't have any dangling pointer to that region stored in a vcpu. Fixes: e5a35635464b ("kvm: arm64: vgic-v3: Introduce vgic_v3_free_redist_region()") Reported-by: Alexander Potapenko <glider@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240605175637.1635653-1-maz@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16 | KVM: arm64: AArch32: Fix spurious trapping of conditional instructions | Marc Zyngier | 1 | -2/+16
commit c92e8b9eacebb4060634ebd9395bba1b29aadc68 upstream. We recently upgraded the view of ESR_EL2 to 64bit, in keeping with the requirements of the architecture. However, the AArch32 emulation code was left unaudited, and the (already dodgy) code that triages whether a trap is spurious or not (because the condition code failed) broke in a subtle way: If ESR_EL2.ISS2 is ever non-zero (unlikely, but hey, this is the ARM architecture we're talking about), the hack that tests the top bits of ESR_EL2.EC will break in an interesting way. Instead, use kvm_vcpu_trap_get_class() to obtain the EC, and list all the possible ECs that can fail a condition code check. While we're at it, add SMC32 to the list, as it is explicitly listed as being allowed to trap despite failing a condition code check (as described in the HCR_EL2.TSC documentation). Fixes: 0b12620fddb8 ("KVM: arm64: Treat ESR_EL2 as a 64-bit register") Cc: stable@vger.kernel.org Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240524141956.1450304-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16 | KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode | Marc Zyngier | 1 | -0/+1
commit dfe6d190f38fc5df5ff2614b463a5195a399c885 upstream. It appears that we don't allow a vcpu to be restored in AArch32 System mode, as we *never* included it in the list of valid modes. Just add it to the list of allowed modes. Fixes: 0d854a60b1d7 ("arm64: KVM: enable initialization of a 32bit vcpu") Cc: stable@vger.kernel.org Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240524141956.1450304-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16 | KVM: arm64: Fix AArch32 register narrowing on userspace write | Marc Zyngier | 1 | -1/+1
commit 947051e361d551e0590777080ffc4926190f62f2 upstream. When userspace writes to one of the core registers, we make sure to narrow the corresponding GPRs if PSTATE indicates an AArch32 context. The code tries to check whether the context is EL0 or EL1 so that it narrows the correct registers. But it does so by checking the full PSTATE instead of PSTATE.M. As a consequence, and if we are restoring an AArch32 EL0 context in a 64bit guest, and that PSTATE has *any* bit set outside of PSTATE.M, we narrow *all* registers instead of only the first 15, destroying the 64bit state. Obviously, this is not something the guest is likely to enjoy. Correctly masking PSTATE to only evaluate PSTATE.M fixes it. Fixes: 90c1f934ed71 ("KVM: arm64: Get rid of the AArch32 register mapping code") Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Cc: stable@vger.kernel.org Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240524141956.1450304-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
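The bug is easiest to see as a sketch of the narrowing decision; this is not the literal guest.c hunk, but PSR_AA32_MODE_MASK and PSR_AA32_MODE_USR are real arm64 definitions (PSTATE.M lives in the low bits).

```c
/*
 * Sketch: classify the saved context by PSTATE.M only, so stray bits
 * elsewhere in PSTATE (flags, IT state, ...) cannot flip the EL0/EL1
 * decision and cause all 31 GPRs to be narrowed.
 */
switch (pstate & PSR_AA32_MODE_MASK) {  /* was effectively: switch (pstate) */
case PSR_AA32_MODE_USR:
        nr_reg = 15;    /* EL0: only r0-r14 map onto x0-x14 */
        break;
default:
        nr_reg = 31;    /* privileged modes: narrow everything */
        break;
}
```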
2024-05-17 | KVM: arm64: vgic-v2: Check for non-NULL vCPU in vgic_v2_parse_attr() | Oliver Upton | 1 | -4/+4
[ Upstream commit 6ddb4f372fc63210034b903d96ebbeb3c7195adb ] vgic_v2_parse_attr() is responsible for finding the vCPU that matches the user-provided CPUID, which (of course) may not be valid. If the ID is invalid, kvm_get_vcpu_by_id() returns NULL, which isn't handled gracefully. Similar to the GICv3 uaccess flow, check that kvm_get_vcpu_by_id() actually returns something and fail the ioctl if not. Cc: stable@vger.kernel.org Fixes: 7d450e282171 ("KVM: arm/arm64: vgic-new: Add userland access to VGIC dist registers") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240424173959.3776798-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-17 | KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id | Marc Zyngier | 1 | -6/+2
[ Upstream commit 4e7728c81a54b17bd33be402ac140bc11bb0c4f4 ] When parsing a GICv2 attribute that contains a cpuid, handle this as the vcpu_id, not a vcpu_idx, as userspace cannot really know the mapping between the two. For this, use kvm_get_vcpu_by_id() instead of kvm_get_vcpu(). Take this opportunity to get rid of the pointless check against online_vcpus, which doesn't make much sense either, and switch to FIELD_GET as a way to extract the vcpu_id. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Stable-dep-of: 6ddb4f372fc6 ("KVM: arm64: vgic-v2: Check for non-NULL vCPU in vgic_v2_parse_attr()") Signed-off-by: Sasha Levin <sashal@kernel.org>
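Taken together, the two vgic-v2 changes above roughly amount to the following shape. This is a hedged reconstruction rather than the exact upstream hunk, but kvm_get_vcpu_by_id(), FIELD_GET() and the KVM_DEV_ARM_VGIC_* masks are real kernel/UAPI symbols.

```c
/* Sketch: treat the user-supplied cpuid as a vcpu_id, look it up with
 * kvm_get_vcpu_by_id(), and fail cleanly when no such vCPU exists. */
static int vgic_v2_parse_attr(struct kvm_device *dev,
                              struct kvm_device_attr *attr,
                              struct vgic_reg_attr *reg_attr)
{
        int cpuid = FIELD_GET(KVM_DEV_ARM_VGIC_CPUID_MASK, attr->attr);

        reg_attr->vcpu = kvm_get_vcpu_by_id(dev->kvm, cpuid);
        if (!reg_attr->vcpu)
                return -EINVAL; /* invalid ID: fail the ioctl */

        reg_attr->addr = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
        return 0;
}
```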
2024-03-01 | KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table() | Oliver Upton | 1 | -0/+3
commit 8d3a7dfb801d157ac423261d7cd62c33e95375f8 upstream. vgic_get_irq() may not return a valid descriptor if there is no ITS that holds a valid translation for the specified INTID. If that is the case, it is safe to silently ignore it and continue processing the LPI pending table. Cc: stable@vger.kernel.org Fixes: 33d3bc9556a7 ("KVM: arm64: vgic-its: Read initial LPI pending table") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240221092732.4126848-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01 | KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler | Oliver Upton | 1 | -0/+2
commit 85a71ee9a0700f6c18862ef3b0011ed9dad99aca upstream. It is possible that an LPI mapped in a different ITS gets unmapped while handling the MOVALL command. If that is the case, there is no state that can be migrated to the destination. Silently ignore it and continue migrating other LPIs. Cc: stable@vger.kernel.org Fixes: ff9c114394aa ("KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240221092732.4126848-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-26 | KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache | Oliver Upton | 1 | -0/+5
commit ad362fe07fecf0aba839ff2cc59a3617bd42c33f upstream. There is a potential UAF scenario in the case of an LPI translation cache hit racing with an operation that invalidates the cache, such as a DISCARD ITS command. The root of the problem is that vgic_its_check_cache() does not elevate the refcount on the vgic_irq before dropping the lock that serializes refcount changes. Have vgic_its_check_cache() raise the refcount on the returned vgic_irq and add the corresponding decrement after queueing the interrupt. Cc: stable@vger.kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240104183233.3560639-1-oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
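The described fix is the usual take-a-reference-before-dropping-the-lock idiom. A sketch follows; vgic_get_irq_kref()/vgic_put_irq() are real vgic helpers, while the lookup and queue helpers are named here purely for illustration.

```c
/*
 * Sketch: elevate the refcount while lpi_list_lock still serialises
 * refcount changes, so a racing DISCARD cannot free the IRQ under us,
 * then drop the reference once the interrupt has been queued.
 */
raw_spin_lock_irqsave(&dist->lpi_list_lock, flags);
irq = lookup_lpi_translation_cache(dist, db, devid, eventid); /* hypothetical */
if (irq)
        vgic_get_irq_kref(irq);
raw_spin_unlock_irqrestore(&dist->lpi_list_lock, flags);

if (irq) {
        queue_cached_lpi(kvm, irq);     /* hypothetical: mark pending, kick vCPU */
        vgic_put_irq(kvm, irq);         /* the paired decrement */
}
```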
2024-01-26 | KVM: arm64: vgic-v4: Restore pending state on host userspace write | Marc Zyngier | 1 | -10/+17
commit 7b95382f965133ef61ce44aaabc518c16eb46909 upstream. When the VMM writes to ISPENDR0 to set the pending state of an SGI, we fail to convey this to the HW if this SGI is already backed by a GICv4.1 vSGI. This is a bit of a corner case, as this would only occur if the vgic state is changed on an already running VM, but this can apparently happen across a guest reset driven by the VMM. Fix this by always writing out the pending_latch value to the HW, and resetting it to false. Reported-by: Kunkun Jiang <jiangkunkun@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Cc: stable@vger.kernel.org # 5.10+ Link: https://lore.kernel.org/r/7e7f2c0c-448b-10a9-8929-4b8f4f6e2a32@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-01 | KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy | Marc Zyngier | 4 | -3/+7
commit 02e3858f08faabab9503ae2911cf7c7e27702257 upstream. When failing to create a vcpu because (for example) it has a duplicate vcpu_id, we destroy the vcpu. Amusingly, this leaves the redistributor registered with the KVM_MMIO bus. This is no good, and we should properly clean the mess. Force a teardown of the vgic vcpu interface, including the RD device before returning to the caller. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-01 | KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy() | Marc Zyngier | 1 | -2/+11
commit d26b9cb33c2d1ba68d1f26bb06c40300f16a3799 upstream. As we are going to need to call into kvm_vgic_vcpu_destroy() without prior holding of the slots_lock, introduce __kvm_vgic_vcpu_destroy() as a non-locking primitive of kvm_vgic_vcpu_destroy(). Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-01 | KVM: arm64: vgic: Simplify kvm_vgic_destroy() | Marc Zyngier | 1 | -15/+14
commit 01ad29d224ff73bc4e16e0ef9ece17f28598c4a4 upstream. When destroying a vgic, we have rather cumbersome rules about when slots_lock and config_lock are held, resulting in fun buglets. The first port of call is to simplify kvm_vgic_map_resources() so that there is only one call to kvm_vgic_destroy() instead of two, with the second only holding half of the locks. For that, we kill the non-locking primitive and move the call outside of the locking altogether. This doesn't change anything (we re-acquire the locks and teardown the whole vgic), and simplifies the code significantly. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-20 | clocksource/drivers/arm_arch_timer: limit XGene-1 workaround | Andre Przywara | 1 | -1/+1
[ Upstream commit 851354cbd12bb9500909733c3d4054306f61df87 ] The AppliedMicro XGene-1 CPU has an erratum where the timer condition would only consider TVAL, not CVAL. We currently apply a workaround when seeing the PartNum field of MIDR_EL1 being 0x000, under the assumption that this would match only the XGene-1 CPU model. However even the Ampere eMAG (aka XGene-3) uses that same part number, and only differs in the "Variant" and "Revision" fields: XGene-1's MIDR is 0x500f0000, our eMAG reports 0x503f0002. Experiments show the latter doesn't show the faulty behaviour. Increase the specificity of the check to only consider partnum 0x000 and variant 0x00, to exclude the Ampere eMAG. Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations") Reported-by: Ross Burton <ross.burton@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20231016153127.116101-1-andre.przywara@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
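The tightened check amounts to matching two MIDR_EL1 fields instead of one. A self-contained sketch of the field extraction; the bit positions are architectural, the helper name is ours.

```c
#include <stdbool.h>
#include <stdint.h>

/* MIDR_EL1 layout: Revision[3:0], PartNum[15:4], Variant[23:20]. */
static bool needs_xgene1_tval_workaround(uint32_t midr)
{
        uint32_t partnum = (midr >> 4) & 0xfff;
        uint32_t variant = (midr >> 20) & 0xf;

        /*
         * XGene-1 reports 0x500f0000: partnum 0x000, variant 0x0.
         * The Ampere eMAG reports 0x503f0002: same partnum, but variant 0x3,
         * so it no longer matches once the variant is checked as well.
         */
        return partnum == 0x000 && variant == 0x0;
}
```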
2023-08-23 | KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption | Marc Zyngier | 3 | -5/+10
[ Upstream commit b321c31c9b7b309dcde5e8854b741c8e6a9a05f0 ] Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when running a preemptible kernel, as it is possible that a vCPU is blocked without requesting a doorbell interrupt. The issue is that any preemption that occurs between vgic_v4_put() and schedule() on the block path will mark the vPE as nonresident and *not* request a doorbell irq. This occurs because when the vcpu thread is resumed on its way to block, vcpu_load() will make the vPE resident again. Once the vcpu actually blocks, we don't request a doorbell anymore, and the vcpu won't be woken up on interrupt delivery. Fix it by tracking that we're entering WFI, and key the doorbell request on that flag. This allows us not to make the vPE resident when going through a preempt/schedule cycle, meaning we don't lose any state. Cc: stable@vger.kernel.org Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Suggested-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 | arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2 | Oliver Upton | 1 | -3/+11
[ Upstream commit 6df696cd9bc1ceed0e92e36908f88bbd16d18255 ] AmpereOne has an erratum in its implementation of FEAT_HAFDBS that required disabling the feature on the design. This was done by reporting the feature as not implemented in the ID register, although the corresponding control bits were not actually RES0. This does not align well with the requirements of the architecture, which mandates these bits be RES0 if HAFDBS isn't implemented. The kernel's use of stage-1 is unaffected, as the HA and HD bits are only set if HAFDBS is detected in the ID register. KVM, on the other hand, relies on the RES0 behavior at stage-2 to use the same value for VTCR_EL2 on any cpu in the system. Mitigate the non-RES0 behavior by leaving VTCR_EL2.HA clear on affected systems. Cc: stable@vger.kernel.org Cc: D Scott Phillips <scott@os.amperecomputing.com> Cc: Darren Hart <darren@os.amperecomputing.com> Acked-by: D Scott Phillips <scott@os.amperecomputing.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20230609220104.1836988-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 | KVM: arm64: Condition HW AF updates on config option | Oliver Upton | 1 | -0/+2
[ Upstream commit 1dfc3e905089a0bcada268fb5691a605655e0319 ] As it currently stands, KVM makes use of FEAT_HAFDBS unconditionally. Use of the feature in the rest of the kernel is guarded by an associated Kconfig option. Align KVM with the rest of the kernel and only enable VTCR_HA when ARM64_HW_AFDBM is enabled. This can be helpful for testing changes to the stage-2 access fault path on Armv8.1+ implementations. Link: https://lore.kernel.org/r/20221202185156.696189-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Stable-dep-of: 6df696cd9bc1 ("arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28 | KVM: arm64: Restore GICv2-on-GICv3 functionality | Marc Zyngier | 1 | -4/+7
commit 1caa71a7a600f7781ce05ef1e84701c459653663 upstream. When reworking the vgic locking, the vgic distributor registration got simplified, which was a very good cleanup. But just a tad too radical, as we now register the *native* vgic only, ignoring the GICv2-on-GICv3 that allows pre-historic VMs (or so I thought) to run. As it turns out, QEMU still defaults to GICv2 in some cases, and this breaks Nathan's setup! Fix it by propagating the *requested* vgic type rather than the host's version. Fixes: 59112e9c390b ("KVM: arm64: vgic: Fix a circular locking issue") Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> link: https://lore.kernel.org/r/20230606221525.GA2269598@dev-arch.thelio-3990X Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-28 | KVM: arm64: PMU: Restore the host's PMUSERENR_EL0 | Reiji Watanabe | 1 | -2/+11
[ Upstream commit 8681f71759010503892f9e3ddb05f65c0f21b690 ] Restore the host's PMUSERENR_EL0 value instead of clearing it, before returning to userspace, as the host's EL0 might have direct access to PMU registers (some bits of PMUSERENR_EL0 might not be zero for the host EL0). Fixes: 83a7a4d643d3 ("arm64: perf: Enable PMU counter userspace access for perf event") Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230603025035.3781797-2-reijiw@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09 | KVM: arm64: Populate fault info for watchpoint | Akihiko Odaki | 3 | -2/+9
commit 811154e234db72f0a11557a84ba9640f8b3bc823 upstream. When handling ESR_ELx_EC_WATCHPT_LOW, far_el2 member of struct kvm_vcpu_fault_info will be copied to far member of struct kvm_debug_exit_arch and exposed to the userspace. The userspace will see stale values from older faults if the fault info does not get populated. Fixes: 8fb2046180a0 ("KVM: arm64: Move early handlers to per-EC handlers") Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230530024651.10014-1-akihiko.odaki@daynix.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-09 | KVM: arm64: vgic: Fix locking comment | Jean-Philippe Brucker | 1 | -1/+2
[ Upstream commit c38b8400aef99d63be2b1ff131bb993465dcafe1 ] It is now config_lock that must be held, not kvm lock. Replace the comment with a lockdep annotation. Fixes: f00327731131 ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-4-jean-philippe@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09 | KVM: arm64: vgic: Wrap vgic_its_create() with config_lock | Jean-Philippe Brucker | 1 | -4/+10
[ Upstream commit 9cf2f840c439b6b23bd99f584f2917ca425ae406 ] vgic_its_create() changes the vgic state without holding the config_lock, which triggers a lockdep warning in vgic_v4_init(): [ 358.667941] WARNING: CPU: 3 PID: 178 at arch/arm64/kvm/vgic/vgic-v4.c:245 vgic_v4_init+0x15c/0x7a8 ... [ 358.707410] vgic_v4_init+0x15c/0x7a8 [ 358.708550] vgic_its_create+0x37c/0x4a4 [ 358.709640] kvm_vm_ioctl+0x1518/0x2d80 [ 358.710688] __arm64_sys_ioctl+0x7ac/0x1ba8 [ 358.711960] invoke_syscall.constprop.0+0x70/0x1e0 [ 358.713245] do_el0_svc+0xe4/0x2d4 [ 358.714289] el0_svc+0x44/0x8c [ 358.715329] el0t_64_sync_handler+0xf4/0x120 [ 358.716615] el0t_64_sync+0x190/0x194 Wrap the whole of vgic_its_create() with config_lock since, in addition to calling vgic_v4_init(), it also modifies the global kvm->arch.vgic state. Fixes: f00327731131 ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-3-jean-philippe@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09 | KVM: arm64: vgic: Fix a circular locking issue | Jean-Philippe Brucker | 6 | -37/+51
[ Upstream commit 59112e9c390be595224e427827475a6cd3726021 ] Lockdep reports a circular lock dependency between the srcu and the config_lock: [ 262.179917] -> #1 (&kvm->srcu){.+.+}-{0:0}: [ 262.182010] __synchronize_srcu+0xb0/0x224 [ 262.183422] synchronize_srcu_expedited+0x24/0x34 [ 262.184554] kvm_io_bus_register_dev+0x324/0x50c [ 262.185650] vgic_register_redist_iodev+0x254/0x398 [ 262.186740] vgic_v3_set_redist_base+0x3b0/0x724 [ 262.188087] kvm_vgic_addr+0x364/0x600 [ 262.189189] vgic_set_common_attr+0x90/0x544 [ 262.190278] vgic_v3_set_attr+0x74/0x9c [ 262.191432] kvm_device_ioctl+0x2a0/0x4e4 [ 262.192515] __arm64_sys_ioctl+0x7ac/0x1ba8 [ 262.193612] invoke_syscall.constprop.0+0x70/0x1e0 [ 262.195006] do_el0_svc+0xe4/0x2d4 [ 262.195929] el0_svc+0x44/0x8c [ 262.196917] el0t_64_sync_handler+0xf4/0x120 [ 262.198238] el0t_64_sync+0x190/0x194 [ 262.199224] [ 262.199224] -> #0 (&kvm->arch.config_lock){+.+.}-{3:3}: [ 262.201094] __lock_acquire+0x2b70/0x626c [ 262.202245] lock_acquire+0x454/0x778 [ 262.203132] __mutex_lock+0x190/0x8b4 [ 262.204023] mutex_lock_nested+0x24/0x30 [ 262.205100] vgic_mmio_write_v3_misc+0x5c/0x2a0 [ 262.206178] dispatch_mmio_write+0xd8/0x258 [ 262.207498] __kvm_io_bus_write+0x1e0/0x350 [ 262.208582] kvm_io_bus_write+0xe0/0x1cc [ 262.209653] io_mem_abort+0x2ac/0x6d8 [ 262.210569] kvm_handle_guest_abort+0x9b8/0x1f88 [ 262.211937] handle_exit+0xc4/0x39c [ 262.212971] kvm_arch_vcpu_ioctl_run+0x90c/0x1c04 [ 262.214154] kvm_vcpu_ioctl+0x450/0x12f8 [ 262.215233] __arm64_sys_ioctl+0x7ac/0x1ba8 [ 262.216402] invoke_syscall.constprop.0+0x70/0x1e0 [ 262.217774] do_el0_svc+0xe4/0x2d4 [ 262.218758] el0_svc+0x44/0x8c [ 262.219941] el0t_64_sync_handler+0xf4/0x120 [ 262.221110] el0t_64_sync+0x190/0x194 Note that the current report, which can be triggered by the vgic_irq kselftest, is a triple chain that includes slots_lock, but after inverting the slots_lock/config_lock dependency, the actual problem reported above remains. In several places, the vgic code calls kvm_io_bus_register_dev(), which synchronizes the srcu, while holding config_lock (#1). And the MMIO handler takes the config_lock while holding the srcu read lock (#0). Break dependency #1, by registering the distributor and redistributors without holding config_lock. The ITS also uses kvm_io_bus_register_dev() but already relies on slots_lock to serialize calls. The distributor iodev is created on the first KVM_RUN call. Multiple threads will race for vgic initialization, and only the first one will see !vgic_ready() under the lock. To serialize those threads, rely on slots_lock rather than config_lock. Redistributors are created earlier, through KVM_DEV_ARM_VGIC_GRP_ADDR ioctls and vCPU creation. Similarly, serialize the iodev creation with slots_lock, and the rest with config_lock. Fixes: f00327731131 ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230518100914.2837292-2-jean-philippe@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 | KVM: arm64: vgic: Don't acquire its_lock before config_lock | Oliver Upton | 1 | -3/+12
commit 49e5d16b6fc003407a33a9961b4bcbb970bd1c76 upstream. commit f00327731131 ("KVM: arm64: Use config_lock to protect vgic state") was meant to rectify a longstanding lock ordering issue in KVM where the kvm->lock is taken while holding vcpu->mutex. As it so happens, the aforementioned commit introduced yet another locking issue by acquiring the its_lock before acquiring the config lock. This is obviously wrong, especially considering that the lock ordering is well documented in vgic.c. Reshuffle the locks once more to take the config_lock before the its_lock. While at it, sprinkle in the lockdep hinting that has become popular as of late to keep lockdep apprised of our ordering. Cc: stable@vger.kernel.org Fixes: f00327731131 ("KVM: arm64: Use config_lock to protect vgic state") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230412062733.988229-1-oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-11 | KVM: arm64: Use config_lock to protect vgic state | Oliver Upton | 8 | -60/+88
commit f00327731131d1b5aa6a1aa9f50bcf8d620ace4c upstream. Almost all of the vgic state is VM-scoped but accessed from the context of a vCPU. These accesses were serialized on the kvm->lock which cannot be nested within a vcpu->mutex critical section. Move over the vgic state to using the config_lock. Tweak the lock ordering where necessary to ensure that the config_lock is acquired after the vcpu->mutex. Acquire the config_lock in kvm_vgic_create() to avoid a race between the converted flows and GIC creation. Where necessary, continue to acquire kvm->lock to avoid a race with vCPU creation (i.e. flows that use lock_all_vcpus()). Finally, promote the locking expectations in comments to lockdep assertions and update the locking documentation for the config_lock as well as vcpu->mutex. Cc: stable@vger.kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230327164747.2466958-5-oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-11 | KVM: arm64: Use config_lock to protect data ordered against KVM_RUN | Oliver Upton | 4 | -21/+12
commit 4bba7f7def6f278266dadf845da472cfbfed784e upstream. There are various bits of VM-scoped data that can only be configured before the first call to KVM_RUN, such as the hypercall bitmaps and the PMU. As these fields are protected by the kvm->lock and accessed while holding vcpu->mutex, this is yet another example of lock inversion. Change out the kvm->lock for kvm->arch.config_lock in all of these instances. Opportunistically simplify the locking mechanics of the PMU configuration by holding the config_lock for the entirety of kvm_arm_pmu_v3_set_attr(). Note that this also addresses a couple of bugs. There is an unguarded read of the PMU version in KVM_ARM_VCPU_PMU_V3_FILTER which could race with KVM_ARM_VCPU_PMU_V3_SET_PMU. Additionally, until now writes to the per-vCPU vPMU irq were not serialized VM-wide, meaning concurrent calls to KVM_ARM_VCPU_PMU_V3_IRQ could lead to a false positive in pmu_irq_is_valid(). Cc: stable@vger.kernel.org Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230327164747.2466958-4-oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-11 | KVM: arm64: Avoid lock inversion when setting the VM register width | Oliver Upton | 2 | -3/+21
commit c43120afb5c66a3465c7468f5cf9806a26484cde upstream. kvm->lock must be taken outside of the vcpu->mutex. Of course, the locking documentation for KVM makes this abundantly clear. Nonetheless, the locking order in KVM/arm64 has been wrong for quite a while; we acquire the kvm->lock while holding the vcpu->mutex all over the shop. All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our pants down, leading to lockdep barfing: ====================================================== WARNING: possible circular locking dependency detected 6.2.0-rc7+ #19 Not tainted ------------------------------------------------------ qemu-system-aar/859 is trying to acquire lock: ffff5aa69269eba0 (&host_kvm->lock){+.+.}-{3:3}, at: kvm_reset_vcpu+0x34/0x274 but task is already holding lock: ffff5aa68768c0b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8c/0xba0 which lock already depends on the new lock. Add a dedicated lock to serialize writes to VM-scoped configuration from the context of a vCPU. Protect the register width flags with the new lock, thus avoiding the need to grab the kvm->lock while holding vcpu->mutex in kvm_reset_vcpu(). Cc: stable@vger.kernel.org Reported-by: Jeremy Linton <jeremy.linton@arm.com> Link: https://lore.kernel.org/kvmarm/f6452cdd-65ff-34b8-bab0-5c06416da5f6@arm.com/ Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230327164747.2466958-3-oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-11 | KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON | Oliver Upton | 3 | -25/+43
commit 0acc7239c20a8401b8968c2adace8f7c9b0295ae upstream. KVM/arm64 had the lock ordering backwards on vcpu->mutex and kvm->lock from the very beginning. One such example is the way vCPU resets are handled: the kvm->lock is acquired while handling a guest CPU_ON PSCI call. Add a dedicated lock to serialize writes to kvm_vcpu_arch::{mp_state, reset_state}. Promote all accessors of mp_state to {READ,WRITE}_ONCE() as readers do not acquire the mp_state_lock. While at it, plug yet another race by taking the mp_state_lock in the KVM_SET_MP_STATE ioctl handler. As changes to MP state are now guarded with a dedicated lock, drop the kvm->lock acquisition from the PSCI CPU_ON path. Similarly, move the reader of reset_state outside of the kvm->lock and instead protect it with the mp_state_lock. Note that writes to reset_state::reset have been demoted to regular stores as both readers and writers acquire the mp_state_lock. While the kvm->lock inversion still exists in kvm_reset_vcpu(), at least now PSCI CPU_ON no longer depends on it for serializing vCPU reset. Cc: stable@vger.kernel.org Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230327164747.2466958-2-oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-01 | KVM: arm64: Retry fault if vma_lookup() results become invalid | David Matlack | 1 | -26/+21
commit 13ec9308a85702af7c31f3638a2720863848a7f2 upstream. Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can detect if the results of vma_lookup() (e.g. vma_shift) become stale before it acquires kvm->mmu_lock. This fixes a theoretical bug where a VMA could be changed by userspace after vma_lookup() and before KVM reads the mmu_invalidate_seq, causing KVM to install page table entries based on a (possibly) no-longer-valid vma_shift. Re-order the MMU cache top-up to earlier in user_mem_abort() so that it is not done after KVM has read mmu_invalidate_seq (i.e. so as to avoid inducing spurious fault retries). This bug has existed since KVM/ARM's inception. It's unlikely that any sane userspace currently modifies VMAs in such a way as to trigger this race. And even with directed testing I was unable to reproduce it. But a sufficiently motivated host userspace might be able to exploit this race. Fixes: 94f8e6418d39 ("KVM: ARM: Handle guest faults in KVM") Cc: stable@vger.kernel.org Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230313235454.2964067-1-dmatlack@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> [will: Use FSC_PERM instead of ESR_ELx_FSC_PERM] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
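The crux is a sequence-count style retry: sample the invalidation counter while the VMA lookup is still known to be fresh, then re-check it under the MMU lock. A sketch of the ordering described above, not the literal user_mem_abort() hunk; mmu_invalidate_seq and mmu_invalidate_retry() are the real KVM names in this kernel series.

```c
mmap_read_lock(current->mm);
vma = vma_lookup(current->mm, hva);             /* derives vma_shift, etc. */
mmu_seq = vcpu->kvm->mmu_invalidate_seq;        /* sample BEFORE unlocking */
smp_rmb();
mmap_read_unlock(current->mm);

/* MMU page-table cache top-up happens here, before the sample can go stale. */

write_lock(&kvm->mmu_lock);
if (mmu_invalidate_retry(kvm, mmu_seq))
        goto out_unlock;        /* VMA may have changed: bail and retry the fault */
/* safe to install the mapping derived from the earlier vma_lookup() */
```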
2023-04-26 | KVM: arm64: Fix buffer overflow in kvm_arm_set_fw_reg() | Dan Carpenter | 1 | -0/+2
commit a25bc8486f9c01c1af6b6c5657234b2eee2c39d6 upstream. The KVM_REG_SIZE() comes from the ioctl and it can be a power of two between 0-32768 but if it is more than sizeof(long) this will corrupt memory. Fixes: 99adb567632b ("KVM: arm/arm64: Add save/restore support for firmware workaround state") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/4efbab8c-640f-43b2-8ac6-6d68e08280fe@kili.mountain Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
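The overflow comes from trusting the size encoded in the register ID, so the fix is a bounds check before any copy. A hedged sketch; KVM_REG_SIZE() is the real UAPI macro, and the surrounding code is simplified.

```c
/*
 * Sketch: the register ID can encode a size of up to 32768 bytes, but the
 * backing storage is a single u64, so reject anything else up front.
 */
void __user *uaddr = (void __user *)(long)reg->addr;
u64 val;

if (KVM_REG_SIZE(reg->id) != sizeof(val))
        return -ENOENT;

if (copy_from_user(&val, uaddr, KVM_REG_SIZE(reg->id)))
        return -EFAULT;
```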
2023-04-20 | KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV2/3 to protected VMs | Fuad Tabba | 3 | -9/+29
[ Upstream commit e81625218bf7986ba1351a98c43d346b15601d26 ] The existing pKVM code attempts to advertise CSV2/3 using values initialized to 0, but never set. To advertise CSV2/3 to protected guests, pass the CSV2/3 values to hyp when initializing hyp's view of guests' ID_AA64PFR0_EL1. Similar to non-protected KVM, these are system-wide, rather than per cpu, for simplicity. Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20230404152321.413064-1-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 | KVM: arm64: Initialise hypervisor copies of host symbols unconditionally | Will Deacon | 1 | -6/+9
[ Upstream commit 6c165223e9a6384aa1e934b90f2650e71adb972a ] The nVHE object at EL2 maintains its own copies of some host variables so that, when pKVM is enabled, the host cannot directly modify the hypervisor state. When running in normal nVHE mode, however, these variables are still mirrored at EL2 but are not initialised. Initialise the hypervisor symbols from the host copies regardless of pKVM, ensuring that any reference to this data at EL2 with normal nVHE will return a sensibly initialised value. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-16-will@kernel.org Stable-dep-of: e81625218bf7 ("KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV2/3 to protected VMs") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20 | KVM: arm64: PMU: Restore the guest's EL0 event counting after migration | Reiji Watanabe | 2 | -1/+1
commit f9ea835e99bc8d049bf2a3ec8fa5a7cb4fcade23 upstream. Currently, with VHE, KVM enables the EL0 event counting for the guest on vcpu_load() or KVM enables it as a part of the PMU register emulation process, when needed. However, in the migration case (with VHE), the same handling is lacking, as vPMU register values that were restored by userspace haven't been propagated yet (the PMU events haven't been created) at the vcpu load-time on the first KVM_RUN (kvm_vcpu_pmu_restore_guest() called from vcpu_load() on the first KVM_RUN won't do anything as events_{guest,host} of kvm_pmu_events are still zero). So, with VHE, enable the guest's EL0 event counting on the first KVM_RUN (after the migration) when needed. More specifically, have kvm_pmu_handle_pmcr() call kvm_vcpu_pmu_restore_guest() so that kvm_pmu_handle_pmcr() on the first KVM_RUN can take care of it. Fixes: d0c94c49792c ("KVM: arm64: Restore PMU configuration on first run") Cc: stable@vger.kernel.org Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Link: https://lore.kernel.org/r/20230329023944.2488484-1-reijiw@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 | KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU | Reiji Watanabe | 1 | -1/+2
[ Upstream commit f6da81f650fa47b61b847488f3938d43f90d093d ] Presently, when a guest writes 1 to PMCR_EL0.{C,P}, which is WO/RAZ, KVM saves the register value, including these bits. When userspace reads the register using KVM_GET_ONE_REG, KVM returns the saved register value as it is (the saved value might have these bits set). This could result in userspace setting these bits on the destination during migration. Consequently, KVM may end up resetting the vPMU counter registers (PMCCNTR_EL0 and/or PMEVCNTR<n>_EL0) to zero on the first KVM_RUN after migration. Fix this by not saving those bits when a guest writes 1 to those bits. Fixes: ab9468340d2b ("arm64: KVM: Add access handler for PMCR register") Cc: stable@vger.kernel.org Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Link: https://lore.kernel.org/r/20230313033234.1475987-1-reijiw@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13 | KVM: arm64: PMU: Sanitise PMCR_EL0.LP on first vcpu run | Marc Zyngier | 2 | -2/+6
[ Upstream commit 64d6820d64c0a206e744bd8945374d563a76c16c ] Userspace can play some dirty tricks on us by selecting a given PMU version (such as PMUv3p5), restoring a PMCR_EL0 value that has PMCR_EL0.LP set, and then switching the PMU version to PMUv3p1, for example. In this situation, we end up with PMCR_EL0.LP being set and spreading havoc in the PMU emulation. This is especially hard as the first two steps can be done on one vcpu and the third step on another, meaning that we need to sanitise *all* vcpus when the PMU version is changed. In order to avoid a pretty complicated locking situation, defer the sanitisation of PMCR_EL0 to the point where the vcpu is actually run for the first time, using the existing KVM_REQ_RELOAD_PMU request that calls into kvm_pmu_handle_pmcr(). There is still an obscure corner case where userspace could do the above trick, and then save the VM without running it. They would then observe an inconsistent state (PMUv3.1 + LP set), but that state will be fixed on the first run anyway whenever the guest gets restored on a host. Reported-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Stable-dep-of: f6da81f650fa ("KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-13 | KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow | Marc Zyngier | 1 | -12/+31
[ Upstream commit c82d28cbf1d4f9fe174041b4485c635cb970afa7 ] The PMU architecture makes a subtle difference between a 64bit counter and a counter that has a 64bit overflow. This is for example the case of the cycle counter, which can generate an overflow on a 32bit boundary if PMCR_EL0.LC==0 despite the accumulation being done on 64 bits. Use this distinction in the few cases where it matters in the code, as we will reuse this with PMUv3p5 long counters. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221113163832.3154370-5-maz@kernel.org Stable-dep-of: f6da81f650fa ("KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU") Signed-off-by: Sasha Levin <sashal@kernel.org>
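The subtlety is that "counts in 64 bits" and "overflows at 64 bits" are two different questions. A sketch of the two predicates under the pre-PMUv3p5 view; ARMV8_PMU_CYCLE_IDX and ARMV8_PMU_PMCR_LC are real definitions, the helper names are illustrative.

```c
/* Does this counter accumulate on 64 bits? (only the cycle counter here) */
static bool pmc_is_64bit(unsigned int idx)
{
        return idx == ARMV8_PMU_CYCLE_IDX;
}

/* Does it also *overflow* on a 64bit boundary? Only if PMCR_EL0.LC is set;
 * with LC==0 the cycle counter still signals its overflow at bit 31. */
static bool pmc_has_64bit_overflow(u64 pmcr, unsigned int idx)
{
        return idx == ARMV8_PMU_CYCLE_IDX && (pmcr & ARMV8_PMU_PMCR_LC);
}
```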
2023-04-13 | KVM: arm64: PMU: Align chained counter implementation with architecture pseudocode | Marc Zyngier | 1 | -234/+86
[ Upstream commit bead02204e9806807bb290137b1ccabfcb4b16fd ] Ricardo recently pointed out that the PMU chained counter emulation in KVM wasn't quite behaving like the one on actual hardware, in the sense that a chained counter would expose an overflow on both halves, while KVM would only expose the overflow on the top half. The difference is subtle, but significant. What does the architecture say (DDI0487 H.a):
- Up to PMUv3p4, all counters but the cycle counter are 32bit
- A 32bit counter that overflows generates a CHAIN event on the adjacent counter after exposing its own overflow status
- The CHAIN event is accounted if the counter is correctly configured (CHAIN event selected and counter enabled)
This all means that our current implementation (which uses 64bit perf events) prevents us from emulating this overflow on the lower half. How to fix this? By implementing the above, to the letter. This largely results in code deletion, removing the notions of "counter pair", "chained counters", and "canonical counter". The code is further restructured to make the CHAIN handling similar to SWINC, as the two are now extremely similar in behaviour. Reported-by: Ricardo Koller <ricarkol@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Reiji Watanabe <reijiw@google.com> Link: https://lore.kernel.org/r/20221113163832.3154370-3-maz@kernel.org Stable-dep-of: f6da81f650fa ("KVM: arm64: PMU: Don't save PMCR_EL0.{C,P} for the vCPU") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-06 | KVM: arm64: Disable interrupts while walking userspace PTs | Marc Zyngier | 1 | -7/+38
commit e86fc1a3a3e9b4850fe74d738e3cfcf4297d8bba upstream. We walk the userspace PTs to discover what mapping size was used there. However, this can race against the userspace tables being freed, and we end up in the weeds. Thankfully, the mm code is being generous and will IPI us when doing so. So let's implement our part of the bargain and disable interrupts around the walk. This ensures that nothing terrible happens during that time. We still need to handle the removal of the page tables before the walk. For that, allow get_user_mapping_size() to return an error, and make sure this error can be propagated all the way to the exit handler. Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230316174546.3777507-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
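A sketch of the two described changes, keeping interrupts off across the walk and letting the helper fail; get_user_mapping_size() is the helper named in the commit, but the exact call site and error plumbing are simplified here.

```c
/*
 * Sketch only: with IRQs masked, the mm free path (which IPIs us) cannot
 * complete while we are mid-walk of the userspace page tables, and the
 * helper may now return an error if the tables are already gone.
 */
local_irq_save(flags);
ret = get_user_mapping_size(kvm, hva);
local_irq_restore(flags);

if (ret < 0)
        return ret;     /* propagate all the way to the exit handler */
```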
2023-04-06 | KVM: arm64: PMU: Fix GET_ONE_REG for vPMC regs to return the current value | Reiji Watanabe | 1 | -2/+19
commit 9228b26194d1cc00449f12f306f53ef2e234a55b upstream. Have KVM_GET_ONE_REG for vPMU counter (vPMC) registers (PMCCNTR_EL0 and PMEVCNTR<n>_EL0) return the sum of the register value in the sysreg file and the current perf event counter value. Values of vPMC registers are saved in sysreg files on certain occasions. These saved values don't represent the current values of the vPMC registers if the perf events for the vPMCs count events after the save. The current values of those registers are the sum of the sysreg file value and the current perf event counter value. But, when userspace reads those registers (using KVM_GET_ONE_REG), KVM returns the sysreg file value to userspace (not the sum value). Fix this to return the sum value for KVM_GET_ONE_REG. Fixes: 051ff581ce70 ("arm64: KVM: Add access handler for event counter register") Cc: stable@vger.kernel.org Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Link: https://lore.kernel.org/r/20230313033208.1475499-1-reijiw@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11 | arm64: mte: Fix/clarify the PG_mte_tagged semantics | Catalin Marinas | 2 | -4/+4
commit e059853d14ca4ed0f6a190d7109487918a22a976 upstream. Currently the PG_mte_tagged page flag mostly means the page contains valid tags and it should be set after the tags have been cleared or restored. However, in mte_sync_tags() it is set before setting the tags to avoid, in theory, a race with concurrent mprotect(PROT_MTE) for shared pages. However, a concurrent mprotect(PROT_MTE) with a copy on write in another thread can cause the new page to have stale tags. Similarly, tag reading via ptrace() can read stale tags if the PG_mte_tagged flag is set before actually clearing/restoring the tags. Fix the PG_mte_tagged semantics so that it is only set after the tags have been cleared or restored. This is safe for swap restoring into a MAP_SHARED or CoW page since the core code takes the page lock. Add two functions to test and set the PG_mte_tagged flag with acquire and release semantics. The downside is that concurrent mprotect(PROT_MTE) on a MAP_SHARED page may cause tag loss. This is already the case for KVM guests if a VMM changes the page protection while the guest triggers a user_mem_abort(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [pcc@google.com: fix build with CONFIG_ARM64_MTE disabled] Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221104011041.290951-3-pcc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-01 | KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation | Marc Zyngier | 3 | -16/+18
commit ef3691683d7bfd0a2acf48812e4ffe894f10bfa8 upstream. To save the vgic LPI pending state with GICv4.1, the VPEs must all be unmapped from the ITSs so that the sGIC caches can be flushed. The opposite is done once the state is saved. This is all done by using the activate/deactivate irqdomain callbacks directly from the vgic code. Crucially, this is done without holding the irqdesc lock for the interrupts that represent the VPE. And these callbacks are changing the state of the irqdesc. What could possibly go wrong? If a doorbell fires while we are messing with the irqdesc state, it will acquire the lock and change the interrupt state concurrently. Since we don't hold the lock, corruption occurs on the interrupt state. Oh well. While acquiring the lock would fix this (and this was Shanker's initial approach), this is still a layering violation we could do without. A better approach is actually to free the VPE interrupt, do what we have to do, and re-request it. It is more work, but this usually happens only once in the lifetime of the VM and we don't really care about this sort of overhead. Fixes: f66b7b151e00 ("KVM: arm64: GICv4.1: Try to save VLPI state in save_pending_tables") Reported-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230118022348.4137094-1-sdonthineni@nvidia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-01 | KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE | Marc Zyngier | 3 | -34/+20
The trapping of SMPRI_EL1 and TPIDR2_EL0 currently only really works on nVHE, as only this mode uses the fine-grained trapping that controls these two registers. Move the trapping enable/disable code into __{de,}activate_traps_common(), allowing it to be called when it actually matters on VHE, and remove the flipping of EL2 control for TPIDR2_EL0, which only affects the host access of this register. Fixes: 861262ab8627 ("KVM: arm64: Handle SME host state when running guests") Reported-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/86bkpqer4z.wl-maz@kernel.org
2022-10-27 | KVM: arm64: Fix bad dereference on MTE-enabled systems | Ryan Roberts | 1 | -1/+2
enter_exception64() performs an MTE check, which involves dereferencing vcpu->kvm. While vcpu has already been fixed up to be a HYP VA pointer, kvm is still a pointer in the kernel VA space. This only affects nVHE configurations with MTE enabled, as in other cases, the pointer is either valid (VHE) or not dereferenced (!MTE). Fix this by first converting kvm to a HYP VA pointer. Fixes: ea7fc1bb1cd1 ("KVM: arm64: Introduce MTE VM feature") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> [maz: commit message tidy-up] Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221027120945.29679-1-ryan.roberts@arm.com
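The fix is essentially a one-liner; a sketch of the corrected MTE check in enter_exception64(), hedged in case the exact expression differs upstream (kern_hyp_va(), kvm_has_mte() and PSR_TCO_BIT are real definitions).

```c
/*
 * vcpu has already been converted to a hyp VA, but vcpu->kvm is still a
 * kernel VA, so translate it before dereferencing it at EL2 under nVHE.
 */
if (kvm_has_mte(kern_hyp_va(vcpu->kvm)))
        new |= PSR_TCO_BIT;     /* mask tag-check faults across exception entry */
```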
2022-10-25 | KVM: arm64: Use correct accessor to parse stage-1 PTEs | Quentin Perret | 1 | -1/+1
hyp_get_page_state() is used with pKVM to retrieve metadata about a page by parsing a hypervisor stage-1 PTE. However, it incorrectly uses a helper which parses *stage-2* mappings. Ouch. Luckily, pkvm_getstate() only looks at the software bits, which happen to be in the same place for stage-1 and stage-2 PTEs, and this all ends up working correctly by accident. But clearly, we should do better. Fix hyp_get_page_state() to use the correct helper. Fixes: e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory") Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221025145156.855308-1-qperret@google.com
2022-10-22 | Merge tag 'kvmarm-fixes-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD | Paolo Bonzini | 2 | -1/+8
KVM/arm64 fixes for 6.1, take #2:
- Fix a bug preventing restoring an ITS containing mappings for very large and very sparse device topology
- Work around a relocation handling error when compiling the nVHE object with profile optimisation
2022-10-22 | Merge tag 'kvmarm-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD | Paolo Bonzini | 3 | -5/+12
KVM/arm64 fixes for 6.1, take #1:
- Fix for stage-2 invalidation holding the VM MMU lock for too long by limiting the walk to the largest block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
2022-10-15 | KVM: arm64: vgic: Fix exit condition in scan_its_table() | Eric Ren | 1 | -1/+4
With some PCIe topologies, restoring a guest fails while parsing the ITS device tables. Reproducer hints: 1. Create ARM virt VM with pxb-pcie bus which adds extra host bridges, with qemu command like: ``` -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.x \ ... -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \ -device pcie-root-port,..,bus=pci.y \ ... ``` 2. Ensure the guest uses 2-level device table 3. Perform VM migration which calls save/restore device tables In that setup, we get a big "offset" between 2 device_ids, which makes unsigned "len" round up a big positive number, causing the scan loop to continue with a bad GPA. For example: 1. L1 table has 2 entries; 2. and we are now scanning at L2 table entry index 2075 (pointed to by L1 first entry) 3. if next device id is 9472, we will get a big offset: 7397; 4. with unsigned 'len', 'len -= offset * esz', len will underflow to a positive number, mistakenly into next iteration with a bad GPA; (It should break out of the current L2 table scanning, and jump into the next L1 table entry) 5. that bad GPA fails the guest read. Fix it by stopping the L2 table scan when the next device id is outside of the current table, allowing the scan to continue from the next L1 table entry. Thanks to Eric Auger for the fix suggestion. Fixes: 920a7a8fa92a ("KVM: arm64: vgic-its: Add infrastructure for tableookup") Suggested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Eric Ren <renzhengeek@gmail.com> [maz: commit message tidy-up] Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/d9c3a564af9e2c5bf63f48a7dcbf08cd593c5c0b.1665802985.git.renzhengeek@gmail.com
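The failure mode is an unsigned byte count wrapping past the end of the current L2 table, so the fix is an explicit bound check on the computed offset. A hedged sketch of the loop's exit condition; the variable names are illustrative, not the exact scan_its_table() code.

```c
/*
 * Sketch: if the next device id lands beyond the table currently being
 * scanned, stop here and let the caller continue from the next L1 entry,
 * instead of letting 'len' wrap and the GPA run off the end of the table.
 */
byte_offset = (next_id - id) * esz;
if (byte_offset >= len)
        break;          /* next entry lives under another L1 slot */

id = next_id;
gpa += byte_offset;
len -= byte_offset;
```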
2022-10-15 | KVM: arm64: nvhe: Fix build with profile optimization | Denis Nikitin | 1 | -0/+4
Kernel build with clang and KCFLAGS=-fprofile-sample-use=<profile> fails with: error: arch/arm64/kvm/hyp/nvhe/kvm_nvhe.tmp.o: Unexpected SHT_REL section ".rel.llvm.call-graph-profile" Starting from 13.0.0 llvm can generate SHT_REL section, see https://reviews.llvm.org/rGca3bdb57fa1ac98b711a735de048c12b5fdd8086. gen-hyprel does not support SHT_REL relocation section. Filter out profile use flags to fix the build with profile optimization. Signed-off-by: Denis Nikitin <denik@chromium.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221014184532.3153551-1-denik@chromium.org