summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-08-20Merge branch kvm-arm64/generic-entry into kvmarm-master/nextMarc Zyngier5-28/+52
Switch KVM/arm64 to the generic entry code, courtesy of Oliver Upton * kvm-arm64/generic-entry: KVM: arm64: Use generic KVM xfer to guest work function entry: KVM: Allow use of generic KVM entry w/o full generic support KVM: arm64: Record number of signal exits as a vCPU stat Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch kvm-arm64/psci/cpu_on into kvmarm-master/nextMarc Zyngier7-9/+156
PSCI fixes from Oliver Upton: - Plug race on reset - Ensure that a pending reset is applied before userspace accesses - Reject PSCI requests with illegal affinity bits * kvm-arm64/psci/cpu_on: selftests: KVM: Introduce psci_cpu_on_test KVM: arm64: Enforce reserved bits for PSCI target affinities KVM: arm64: Handle PSCI resets before userspace touches vCPU state KVM: arm64: Fix read-side race on updates to vcpu reset state Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/nextMarc Zyngier15-306/+618
* kvm-arm64/mmu/el2-tracking: (25 commits) : Enable tracking of page sharing between host EL1 and EL2 KVM: arm64: Minor optimization of range_is_memory KVM: arm64: Make hyp_panic() more robust when protected mode is enabled KVM: arm64: Return -EPERM from __pkvm_host_share_hyp() KVM: arm64: Make __pkvm_create_mappings static KVM: arm64: Restrict EL2 stage-1 changes in protected mode KVM: arm64: Refactor protected nVHE stage-1 locking KVM: arm64: Remove __pkvm_mark_hyp KVM: arm64: Mark host bss and rodata section as shared KVM: arm64: Enable retrieving protections attributes of PTEs KVM: arm64: Introduce addr_is_memory() KVM: arm64: Expose pkvm_hyp_id KVM: arm64: Expose host stage-2 manipulation helpers KVM: arm64: Add helpers to tag shared pages in SW bits KVM: arm64: Allow populating software bits KVM: arm64: Enable forcing page-level stage-2 mappings KVM: arm64: Tolerate re-creating hyp mappings to set software bits KVM: arm64: Don't overwrite software bits with owner id KVM: arm64: Rename KVM_PTE_LEAF_ATTR_S2_IGNORED KVM: arm64: Optimize host memory aborts KVM: arm64: Expose page-table helpers ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch kvm-arm64/mmu/kmemleak-pkvm into kvmarm-master/nextMarc Zyngier2-2/+9
Prevent kmemleak from peeking into the HYP data, which is fatal in protected mode. * kvm-arm64/mmu/kmemleak-pkvm: KVM: arm64: Unregister HYP sections from kmemleak in protected mode arm64: Move .hyp.rodata outside of the _sdata.._edata range Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch kvm-arm64/misc-5.15 into kvmarm-master/nextMarc Zyngier14-135/+112
* kvm-arm64/misc-5.15: : Misc improvements for 5.15: : : - Account the number of VMID-wide TLB invalidations as : remote TLB flushes : - Fix comments in the VGIC code : - Cleanup the PMU IMPDEF identification : - Streamline the TGRAN2 usage : - Avoid advertising a 52bit IPA range for non-64KB configs : - Avoid spurious signalling when a HW-mapped interrupt is in the : A+P state on entry, and in the P state on exit, but that the : physical line is not pending anymore. : - Bunch of minor cleanups KVM: arm64: vgic: Resample HW pending state on deactivation KVM: arm64: vgic: Drop WARN from vgic_get_irq KVM: arm64: Drop unused REQUIRES_VIRT KVM: arm64: Drop check_kvm_target_cpu() based percpu probe KVM: arm64: Drop init_common_resources() KVM: arm64: Use ARM64_MIN_PARANGE_BITS as the minimum supported IPA arm64/mm: Add remaining ID_AA64MMFR0_PARANGE_ macros KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size arm64/mm: Define ID_AA64MMFR0_TGRAN_2_SHIFT KVM: arm64: perf: Replace '0xf' instances with ID_AA64DFR0_PMUVER_IMP_DEF KVM: arm64: Fix comments related to GICv2 PMR reporting KVM: arm64: Count VMID-wide TLB invalidations arm64/kexec: Test page size support with new TGRAN range values Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch kvm-arm64/mmu/mapping-levels into kvmarm-master/nextMarc Zyngier6-63/+98
Revamp the KVM/arm64 THP code by parsing the userspace page tables instead of relying on an infrastructure that is about to disappear (we are the last user). * kvm-arm64/mmu/mapping-levels: KVM: Get rid of kvm_get_pfn() KVM: arm64: Use get_page() instead of kvm_get_pfn() KVM: Remove kvm_is_transparent_hugepage() and PageTransCompoundMap() KVM: arm64: Avoid mapping size adjustment on permission fault KVM: arm64: Walk userspace page tables to compute the THP mapping size KVM: arm64: Introduce helper to retrieve a PTE and its level Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch kvm-arm64/pmu/reset-values into kvmarm-master/nextMarc Zyngier3-12/+67
Fix the reset values for our PMU emulation. As a side effect, it allows a nice optimisation by only tracking the in-use counters when flipping them on and off, now that we are guaranteed not to have any spurious bit set. * kvm-arm64/pmu/reset-values: KVM: arm64: Remove PMSWINC_EL0 shadow register KVM: arm64: Disabling disabled PMU counters wastes a lot of time KVM: arm64: Drop unnecessary masking of PMU registers KVM: arm64: Narrow PMU sysreg reset values to architectural requirements Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20KVM: arm64: Minor optimization of range_is_memoryDavid Brazdil1-5/+8
Currently range_is_memory finds the corresponding struct memblock_region for both the lower and upper bounds of the given address range with two rounds of binary search, and then checks that the two memblocks are the same. Simplify this by only doing binary search on the lower bound and then checking that the upper bound is in the same memblock. Signed-off-by: David Brazdil <dbrazdil@google.com> Reviewed-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210728153232.1018911-3-dbrazdil@google.com
2021-08-20Merge tag 'kvmarm-fixes-5.14-2' into kvm-arm64/mmu/el2-trackingMarc Zyngier2-5/+9
KVM/arm64 fixes for 5.14, take #2 - Plug race between enabling MTE and creating vcpus - Fix off-by-one bug when checking whether an address range is RAM Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20Merge branch arm64/for-next/sysreg into kvm-arm64/misc-5.15Marc Zyngier2-15/+22
Merge the arm64/for-next/sysreg branch to avoid merge conflicts in -next and upstream. * arm64/for-next/sysreg: arm64/kexec: Test page size support with new TGRAN range values Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20KVM: arm64: vgic: Resample HW pending state on deactivationMarc Zyngier4-62/+50
When a mapped level interrupt (a timer, for example) is deactivated by the guest, the corresponding host interrupt is equally deactivated. However, the fate of the pending state still needs to be dealt with in SW. This is specially true when the interrupt was in the active+pending state in the virtual distributor at the point where the guest was entered. On exit, the pending state is potentially stale (the guest may have put the interrupt in a non-pending state). If we don't do anything, the interrupt will be spuriously injected in the guest. Although this shouldn't have any ill effect (spurious interrupts are always possible), we can improve the emulation by detecting the deactivation-while-pending case and resample the interrupt. While we're at it, move the logic into a common helper that can be shared between the two GIC implementations. Fixes: e40cc57bac79 ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts") Reported-by: Raghavendra Rao Ananta <rananta@google.com> Tested-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org
2021-08-19KVM: arm64: vgic: Drop WARN from vgic_get_irqRicardo Koller1-1/+0
vgic_get_irq(intid) is used all over the vgic code in order to get a reference to a struct irq. It warns whenever intid is not a valid number (like when it's a reserved IRQ number). The issue is that this warning can be triggered from userspace (e.g., KVM_IRQ_LINE for intid 1020). Drop the WARN call from vgic_get_irq. Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210818213205.598471-1-ricarkol@google.com
2021-08-19KVM: arm64: Use generic KVM xfer to guest work functionOliver Upton2-28/+45
Clean up handling of checks for pending work by switching to the generic infrastructure to do so. We pick up handling for TIF_NOTIFY_RESUME from this switch, meaning that task work will be correctly handled. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210802192809.1851010-4-oupton@google.com
2021-08-19entry: KVM: Allow use of generic KVM entry w/o full generic supportOliver Upton1-1/+5
Some architectures (e.g. arm64) have yet to adopt the generic entry infrastructure. Despite that, it would be nice to use some common plumbing for guest entry/exit handling. For example, KVM/arm64 currently does not handle TIF_NOTIFY_PENDING correctly. Allow use of only the generic KVM entry code by tightening up the include list. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210802192809.1851010-3-oupton@google.com
2021-08-19KVM: arm64: Record number of signal exits as a vCPU statOliver Upton3-0/+3
Most other architectures that implement KVM record a statistic indicating the number of times a vCPU has exited due to a pending signal. Add support for that stat to arm64. Reviewed-by: Jing Zhang <jingzhangos@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210802192809.1851010-2-oupton@google.com
2021-08-19selftests: KVM: Introduce psci_cpu_on_testOliver Upton4-0/+126
Introduce a test for aarch64 that ensures CPU resets induced by PSCI are reflected in the target vCPU's state, even if the target is never run again. This is a regression test for a race between vCPU migration and PSCI. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210818202133.1106786-5-oupton@google.com
2021-08-19KVM: arm64: Enforce reserved bits for PSCI target affinitiesOliver Upton1-3/+12
According to the PSCI specification, ARM DEN 0022D, 5.1.4 "CPU_ON", the CPU_ON function takes a target_cpu argument that is bit-compatible with the affinity fields in MPIDR_EL1. All other bits in the argument are RES0. Note that the same constraints apply to the target_affinity argument for the AFFINITY_INFO call. Enforce the spec by returning INVALID_PARAMS if a guest incorrectly sets a RES0 bit. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210818202133.1106786-4-oupton@google.com
2021-08-19KVM: arm64: Handle PSCI resets before userspace touches vCPU stateOliver Upton1-0/+8
The CPU_ON PSCI call takes a payload that KVM uses to configure a destination vCPU to run. This payload is non-architectural state and not exposed through any existing UAPI. Effectively, we have a race between CPU_ON and userspace saving/restoring a guest: if the target vCPU isn't ran again before the VMM saves its state, the requested PC and context ID are lost. When restored, the target vCPU will be runnable and start executing at its old PC. We can avoid this race by making sure the reset payload is serviced before userspace can access a vCPU's state. Fixes: 358b28f09f0a ("arm/arm64: KVM: Allow a VCPU to fully reset itself") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210818202133.1106786-3-oupton@google.com
2021-08-19KVM: arm64: Fix read-side race on updates to vcpu reset stateOliver Upton1-6/+10
KVM correctly serializes writes to a vCPU's reset state, however since we do not take the KVM lock on the read side it is entirely possible to read state from two different reset requests. Cure the race for now by taking the KVM lock when reading the reset_state structure. Fixes: 358b28f09f0a ("arm/arm64: KVM: Allow a VCPU to fully reset itself") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210818202133.1106786-2-oupton@google.com
2021-08-18KVM: arm64: Make hyp_panic() more robust when protected mode is enabledWill Deacon2-13/+31
When protected mode is enabled, the host is unable to access most parts of the EL2 hypervisor image, including 'hyp_physvirt_offset' and the contents of the hypervisor's '.rodata.str' section. Unfortunately, nvhe_hyp_panic_handler() tries to read from both of these locations when handling a BUG() triggered at EL2; the former for converting the ELR to a physical address and the latter for displaying the name of the source file where the BUG() occurred. Hack the EL2 panic asm to pass both physical and virtual ELR values to the host and utilise the newly introduced CONFIG_NVHE_EL2_DEBUG so that we disable stage-2 protection for the host before returning to the EL1 panic handler. If the debug option is not enabled, display the address instead of the source file:line information. Cc: Andrew Scull <ascull@google.com> Cc: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210813130336.8139-1-will@kernel.org
2021-08-18KVM: arm64: Drop unused REQUIRES_VIRTAnshuman Khandual1-4/+0
This seems like a residue from the past. REQUIRES_VIRT is no more available . Hence it can just be dropped along with the related code section. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-6-git-send-email-anshuman.khandual@arm.com
2021-08-18KVM: arm64: Drop check_kvm_target_cpu() based percpu probeAnshuman Khandual3-18/+4
kvm_target_cpu() never returns a negative error code, so check_kvm_target() would never have 'ret' filled with a negative error code. Hence the percpu probe via check_kvm_target_cpu() does not make sense as its never going to find an unsupported CPU, forcing kvm_arch_init() to exit early. Hence lets just drop this percpu probe (and also check_kvm_target_cpu()) altogether. While here, this also changes kvm_target_cpu() return type to a u32, making it explicit that an error code will not be returned from this function. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-5-git-send-email-anshuman.khandual@arm.com
2021-08-18KVM: arm64: Drop init_common_resources()Anshuman Khandual1-6/+1
Could do without this additional indirection via init_common_resources() by just calling kvm_set_ipa_limit() directly instead. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-4-git-send-email-anshuman.khandual@arm.com
2021-08-18KVM: arm64: Use ARM64_MIN_PARANGE_BITS as the minimum supported IPAAnshuman Khandual1-1/+1
Drop hard coded value for the minimum supported IPA range bits (i.e 32). Instead use ARM64_MIN_PARANGE_BITS which improves the code readability. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-3-git-send-email-anshuman.khandual@arm.com
2021-08-18arm64/mm: Add remaining ID_AA64MMFR0_PARANGE_ macrosAnshuman Khandual2-7/+14
Currently there are macros only for 48 and 52 bits parange value extracted from the ID_AA64MMFR0.PARANGE field. This change completes the enumeration and updates the helper id_aa64mmfr0_parange_to_phys_shift(). While here it also defines ARM64_MIN_PARANGE_BITS as the absolute minimum shift value PA range which could be supported on a given platform. Cc: Marc Zyngier <maz@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-2-git-send-email-anshuman.khandual@arm.com
2021-08-11KVM: arm64: Return -EPERM from __pkvm_host_share_hyp()Quentin Perret1-1/+1
Fix the error code returned by __pkvm_host_share_hyp() when the host attempts to share with EL2 a page that has already been shared with another entity. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210811173630.2536721-1-qperret@google.com
2021-08-11KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page sizeAnshuman Khandual1-0/+8
Even though ID_AA64MMFR0.PARANGE reports 52 bit PA size support, it cannot be enabled as guest IPA size on 4K or 16K page size configurations. Hence kvm_ipa_limit must be restricted to 48 bits. This change achieves required IPA capping. Before the commit c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA size based on host configuration"), the problem here would have been just latent via PHYS_MASK_SHIFT (which earlier in turn capped kvm_ipa_limit), which remains capped at 48 bits on 4K and 16K configs. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Fixes: c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA size based on host configuration") Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628680275-16578-1-git-send-email-anshuman.khandual@arm.com
2021-08-11arm64/mm: Define ID_AA64MMFR0_TGRAN_2_SHIFTAnshuman Khandual2-15/+5
Streamline the Stage-2 TGRAN value extraction from ID_AA64MMFR0 register by adding a page size agnostic ID_AA64MMFR0_TGRAN_2_SHIFT. This is similar to the existing Stage-1 TGRAN shift i.e ID_AA64MMFR0_TGRAN_SHIFT. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628569782-30213-1-git-send-email-anshuman.khandual@arm.com
2021-08-11KVM: arm64: perf: Replace '0xf' instances with ID_AA64DFR0_PMUVER_IMP_DEFAnshuman Khandual2-4/+4
ID_AA64DFR0_PMUVER_IMP_DEF which indicate implementation defined PMU, never actually gets used although there are '0xf' instances scattered all around. Just do the macro replacement to improve readability. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Marc Zyngier <maz@kernel.org> Cc: linux-perf-users@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-11KVM: arm64: Make __pkvm_create_mappings staticQuentin Perret2-4/+2
The __pkvm_create_mappings() function is no longer used outside of nvhe/mm.c, make it static. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-22-qperret@google.com
2021-08-11KVM: arm64: Restrict EL2 stage-1 changes in protected modeQuentin Perret5-12/+118
The host kernel is currently able to change EL2 stage-1 mappings without restrictions thanks to the __pkvm_create_mappings() hypercall. But in a world where the host is no longer part of the TCB, this clearly poses a problem. To fix this, introduce a new hypercall to allow the host to share a physical memory page with the hypervisor, and remove the __pkvm_create_mappings() variant. The new hypercall implements ownership and permission checks before allowing the sharing operation, and it annotates the shared page in the hypervisor stage-1 and host stage-2 page-tables. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-21-qperret@google.com
2021-08-11KVM: arm64: Refactor protected nVHE stage-1 lockingQuentin Perret2-2/+17
Refactor the hypervisor stage-1 locking in nVHE protected mode to expose a new pkvm_create_mappings_locked() function. This will be used in later patches to allow walking and changing the hypervisor stage-1 without releasing the lock. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-20-qperret@google.com
2021-08-11KVM: arm64: Remove __pkvm_mark_hypQuentin Perret5-77/+1
Now that we mark memory owned by the hypervisor in the host stage-2 during __pkvm_init(), we no longer need to rely on the host to explicitly mark the hyp sections later on. Remove the __pkvm_mark_hyp() hypercall altogether. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-19-qperret@google.com
2021-08-11KVM: arm64: Mark host bss and rodata section as sharedQuentin Perret1-8/+74
As the hypervisor maps the host's .bss and .rodata sections in its stage-1, make sure to tag them as shared in hyp and host page-tables. But since the hypervisor relies on the presence of these mappings, we cannot let the host in complete control of the memory regions -- it must not unshare or donate them to another entity for example. To prevent this, let's transfer the ownership of those ranges to the hypervisor itself, and share the pages back with the host. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-18-qperret@google.com
2021-08-11KVM: arm64: Enable retrieving protections attributes of PTEsQuentin Perret2-0/+57
Introduce helper functions in the KVM stage-2 and stage-1 page-table manipulation library allowing to retrieve the enum kvm_pgtable_prot of a PTE. This will be useful to implement custom walkers outside of pgtable.c. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-17-qperret@google.com
2021-08-11KVM: arm64: Introduce addr_is_memory()Quentin Perret2-0/+8
Introduce a helper usable in nVHE protected mode to check whether a physical address is in a RAM region or not. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-16-qperret@google.com
2021-08-11KVM: arm64: Expose pkvm_hyp_idQuentin Perret2-1/+3
Allow references to the hypervisor's owner id from outside mem_protect.c. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-15-qperret@google.com
2021-08-11KVM: arm64: Expose host stage-2 manipulation helpersQuentin Perret2-1/+19
We will need to manipulate the host stage-2 page-table from outside mem_protect.c soon. Introduce two functions allowing this, and make them usable to users of mem_protect.h. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-14-qperret@google.com
2021-08-11KVM: arm64: Add helpers to tag shared pages in SW bitsQuentin Perret1-0/+26
We will soon start annotating shared pages in page-tables in nVHE protected mode. Define all the states in which a page can be (owned, shared and owned, shared and borrowed), and provide helpers allowing to convert this into SW bits annotations using the matching prot attributes. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-13-qperret@google.com
2021-08-11KVM: arm64: Allow populating software bitsQuentin Perret2-1/+16
Introduce infrastructure allowing to manipulate software bits in stage-1 and stage-2 page-tables using additional entries in the kvm_pgtable_prot enum. This is heavily inspired by Marc's implementation of a similar feature in the NV patch series, but adapted to allow stage-1 changes as well: https://lore.kernel.org/kvmarm/20210510165920.1913477-56-maz@kernel.org/ Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-12-qperret@google.com
2021-08-11KVM: arm64: Enable forcing page-level stage-2 mappingsQuentin Perret3-35/+94
Much of the stage-2 manipulation logic relies on being able to destroy block mappings if e.g. installing a smaller mapping in the range. The rationale for this behaviour is that stage-2 mappings can always be re-created lazily. However, this gets more complicated when the stage-2 page-table is used to store metadata about the underlying pages. In such cases, destroying a block mapping may lead to losing part of the state, and confuse the user of those metadata (such as the hypervisor in nVHE protected mode). To avoid this, introduce a callback function in the pgtable struct which is called during all map operations to determine whether the mappings can use blocks, or should be forced to page granularity. This is used by the hypervisor when creating the host stage-2 to force page-level mappings when using non-default protection attributes. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-11-qperret@google.com
2021-08-11KVM: arm64: Tolerate re-creating hyp mappings to set software bitsQuentin Perret1-2/+16
The current hypervisor stage-1 mapping code doesn't allow changing an existing valid mapping. Relax this condition by allowing changes that only target software bits, as that will soon be needed to annotate shared pages. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-10-qperret@google.com
2021-08-11KVM: arm64: Don't overwrite software bits with owner idQuentin Perret1-1/+1
We will soon start annotating page-tables with new flags to track shared pages and such, and we will do so in valid mappings using software bits in the PTEs, as provided by the architecture. However, it is possible that we will need to use those flags to annotate invalid mappings as well in the future, similar to what we do to track page ownership in the host stage-2. In order to facilitate the annotation of invalid mappings with such flags, it would be preferable to re-use the same bits as for valid mappings (bits [58-55]), but these are currently used for ownership encoding. Since we have plenty of bits left to use in invalid mappings, move the ownership bits further down the PTE to avoid the conflict. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-9-qperret@google.com
2021-08-11KVM: arm64: Rename KVM_PTE_LEAF_ATTR_S2_IGNOREDQuentin Perret1-2/+2
The ignored bits for both stage-1 and stage-2 page and block descriptors are in [55:58], so rename KVM_PTE_LEAF_ATTR_S2_IGNORED to make it applicable to both. And while at it, since these bits are more commonly known as 'software' bits, rename accordingly. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-8-qperret@google.com
2021-08-11KVM: arm64: Optimize host memory abortsQuentin Perret3-105/+44
The kvm_pgtable_stage2_find_range() function is used in the host memory abort path to try and look for the largest block mapping that can be used to map the faulting address. In order to do so, the function currently walks the stage-2 page-table and looks for existing incompatible mappings within the range of the largest possible block. If incompatible mappings are found, it tries the same procedure again, but using a smaller block range, and repeats until a matching range is found (potentially up to page granularity). While this approach has benefits (mostly in the fact that it proactively coalesces host stage-2 mappings), it can be slow if the ranges are fragmented, and it isn't optimized to deal with CPUs faulting on the same IPA as all of them will do all the work every time. To avoid these issues, remove kvm_pgtable_stage2_find_range(), and walk the page-table only once in the host_mem_abort() path to find the closest leaf to the input address. With this, use the corresponding range if it is invalid and not owned by another entity. If a valid leaf is found, return -EAGAIN similar to what is done in the kvm_pgtable_stage2_map() path to optimize concurrent faults. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-7-qperret@google.com
2021-08-11KVM: arm64: Expose page-table helpersQuentin Perret2-39/+40
The KVM pgtable API exposes the kvm_pgtable_walk() function to allow the definition of walkers outside of pgtable.c. However, it is not easy to implement any of those walkers without some of the low-level helpers. Move some of them to the header file to allow re-use from other places. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-6-qperret@google.com
2021-08-11KVM: arm64: Provide the host_stage2_try() helper macroQuentin Perret1-18/+22
We currently unmap all MMIO mappings from the host stage-2 to recycle the pages whenever we run out. In order to make this pattern easy to re-use from other places, factor the logic out into a dedicated macro. While at it, apply the macro for the kvm_pgtable_stage2_set_owner() calls. They're currently only called early on and are guaranteed to succeed, but making them robust to the -ENOMEM case doesn't hurt and will avoid painful debugging sessions later on. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-4-qperret@google.com
2021-08-11KVM: arm64: Introduce hyp_assert_lock_held()Quentin Perret2-0/+26
Introduce a poor man's lockdep implementation at EL2 which allows to BUG() whenever a hyp spinlock is not held when it should. Hide this feature behind a new Kconfig option that targets the EL2 object specifically, instead of piggy backing on the existing CONFIG_LOCKDEP. EL2 cannot WARN() cleanly to report locking issues, hence BUG() is the only option and it is not clear whether we want this widely enabled. This is most likely going to be useful for local testing until the EL2 WARN() situation has improved. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-3-qperret@google.com
2021-08-11KVM: arm64: Add hyp_spin_is_locked() for basic locking assertions at EL2Will Deacon1-0/+8
Introduce hyp_spin_is_locked() so that functions can easily assert that a given lock is held (albeit possibly by another CPU!) without having to drag full lockdep support up to EL2. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-2-qperret@google.com
2021-08-04KVM: arm64: Unregister HYP sections from kmemleak in protected modeMarc Zyngier1-0/+7
Booting a KVM host in protected mode with kmemleak quickly results in a pretty bad crash, as kmemleak doesn't know that the HYP sections have been taken away. This is specially true for the BSS section, which is part of the kernel BSS section and registered at boot time by kmemleak itself. Unregister the HYP part of the BSS before making that section HYP-private. The rest of the HYP-specific data is obtained via the page allocator or lives in other sections, none of which is subjected to kmemleak. Fixes: 90134ac9cabb ("KVM: arm64: Protect the .hyp sections from the host") Reviewed-by: Quentin Perret <qperret@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org # 5.13 Link: https://lore.kernel.org/r/20210802123830.2195174-3-maz@kernel.org