path: root/arch/arm64/kvm/hyp
2021-10-05  KVM: arm64: Report corrupted refcount at EL2  (Quentin Perret; 1 file, +1/-0)

Some of the refcount manipulation helpers used at EL2 are instrumented to
catch a corrupted state, but not all of them are treated equally. Let's
make things more consistent by instrumenting hyp_page_ref_dec_and_test()
as well.

Acked-by: Will Deacon <will@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005090155.734578-6-qperret@google.com
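A minimal sketch of what such instrumentation can look like, assuming the
EL2 page allocator's struct hyp_page with an unsigned short refcount (the
body below is an illustration, not the patch itself):

    /* Sketch: catch refcount corruption at EL2, where WARN() is unavailable. */
    static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
    {
            BUG_ON(!p->refcount);   /* decrementing past zero = corrupted state */
            p->refcount--;
            return (p->refcount == 0);
    }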
2021-10-05  KVM: arm64: Fix host stage-2 PGD refcount  (Quentin Perret; 3 files, +27/-1)

The KVM page-table library refcounts the pages of concatenated stage-2
PGDs individually. However, when running KVM in protected mode, the host's
stage-2 PGD is currently managed by EL2 as a single high-order compound
page, which can cause the refcount of the tail pages to reach 0 when they
shouldn't, hence corrupting the page-table.

Fix this by introducing a new hyp_split_page() helper in the EL2 page
allocator (matching the kernel's split_page() function), and make use of
it from host_s2_zalloc_pages_exact().

Fixes: 1025c8c0c6ac ("KVM: arm64: Wrap the host with a stage 2")
Acked-by: Will Deacon <will@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005090155.734578-5-qperret@google.com
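A hedged sketch of such a split helper, mirroring the kernel's
split_page(): demote one order-N allocation into 2^N independently
refcounted order-0 pages (field and helper names are assumptions):

    /* Sketch: give every tail page of a high-order allocation its own refcount. */
    static inline void hyp_split_page(struct hyp_page *p)
    {
            unsigned short order = p->order;
            unsigned int i;

            p->order = 0;
            for (i = 1; i < (1 << order); i++) {
                    struct hyp_page *tail = p + i;

                    tail->order = 0;
                    hyp_set_page_refcounted(tail);  /* independent refcount */
            }
    }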
2021-09-20  KVM: arm64: nvhe: Fix missing FORCE for hyp-reloc.S build rule  (Zenghui Yu; 1 file, +1/-1)

Add FORCE so that if_changed can detect the command line change. We'll
otherwise see a compilation warning since commit e1f86d7b4b2a ("kbuild:
warn if FORCE is missing for if_changed(_dep,_rule) and filechk"):

  arch/arm64/kvm/hyp/nvhe/Makefile:58: FORCE prerequisite is missing

Cc: David Brazdil <dbrazdil@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907052137.1059-1-yuzenghui@huawei.com
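For illustration, the kbuild pattern at issue looks roughly like this
(prerequisites abbreviated): any rule using if_changed needs FORCE as a
prerequisite so make always re-evaluates it, which is what lets if_changed
compare the saved command line against the current one:

    # Sketch of the fixed rule; without FORCE, a changed command line
    # would never trigger a rebuild of hyp-reloc.S.
    $(obj)/hyp-reloc.S: $(obj)/kvm_nvhe.tmp.o $(obj)/gen-hyprel FORCE
    	$(call if_changed,hyprel)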
2021-09-07  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds; 17 files, +524/-230)

Pull KVM updates from Paolo Bonzini:

 "ARM:
   - Page ownership tracking between host EL1 and EL2
   - Rely on userspace page tables to create large stage-2 mappings
   - Fix incompatibility between pKVM and kmemleak
   - Fix the PMU reset state, and improve the performance of the virtual PMU
   - Move over to the generic KVM entry code
   - Address PSCI reset issues w.r.t. save/restore
   - Preliminary rework for the upcoming pKVM fixed feature
   - A bunch of MM cleanups
   - a vGIC fix for timer spurious interrupts
   - Various cleanups

  s390:
   - enable interpretation of specification exceptions
   - fix a vcpu_idx vs vcpu_id mixup

  x86:
   - fast (lockless) page fault support for the new MMU
   - new MMU now the default
   - increased maximum allowed VCPU count
   - allow inhibit IRQs on KVM_RUN while debugging guests
   - let Hyper-V-enabled guests run with virtualized LAPIC as long as they
     do not enable the Hyper-V "AutoEOI" feature
   - fixes and optimizations for the toggling of AMD AVIC (virtualized LAPIC)
   - tuning for the case when two-dimensional paging (EPT/NPT) is disabled
   - bugfixes and cleanups, especially with respect to vCPU reset and
     choosing a paging mode based on CR0/CR4/EFER
   - support for 5-level page table on AMD processors

  Generic:
   - MMU notifier invalidation callbacks do not take mmu_lock unless necessary
   - improved caching of LRU kvm_memory_slot
   - support for histogram statistics
   - add statistics for halt polling and remote TLB flush requests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (210 commits)
  KVM: Drop unused kvm_dirty_gfn_invalid()
  KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
  KVM: MMU: mark role_regs and role accessors as maybe unused
  KVM: MIPS: Remove a "set but not used" variable
  x86/kvm: Don't enable IRQ when IRQ enabled in kvm_wait
  KVM: stats: Add VM stat for remote tlb flush requests
  KVM: Remove unnecessary export of kvm_{inc,dec}_notifier_count()
  KVM: x86/mmu: Move lpage_disallowed_link further "down" in kvm_mmu_page
  KVM: x86/mmu: Relocate kvm_mmu_page.tdp_mmu_page for better cache locality
  Revert "KVM: x86: mmu: Add guest physical address check in translate_gpa()"
  KVM: x86/mmu: Remove unused field mmio_cached in struct kvm_mmu_page
  kvm: x86: Increase KVM_SOFT_MAX_VCPUS to 710
  kvm: x86: Increase MAX_VCPUS to 1024
  kvm: x86: Set KVM_MAX_VCPU_ID to 4*KVM_MAX_VCPUS
  KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  KVM: x86/mmu: Don't freak out if pml5_root is NULL on 4-level host
  KVM: s390: index kvm->arch.idle_mask by vcpu_idx
  KVM: s390: Enable specification exception interpretation
  KVM: arm64: Trim guest debug exception handling
  KVM: SVM: Add 5-level page table support for SVM
  ...
2021-09-03  memblock: make memblock_find_in_range method private  (Mike Rapoport; 1 file, +3/-6)

There are a lot of uses of memblock_find_in_range() along with
memblock_reserve() from the times memblock allocation APIs did not exist.
memblock_find_in_range() is the very core of memblock allocations, so any
future changes to its internal behaviour would mandate updates of all the
users outside memblock.

Replace the calls to memblock_find_in_range() with equivalent calls to
memblock_phys_alloc() and memblock_phys_alloc_range(), and make
memblock_find_in_range() a private method of memblock. This simplifies the
callers, ensures that (unlikely) errors in memblock_reserve() are handled,
and improves the maintainability of memblock_find_in_range().

Link: https://lkml.kernel.org/r/20210816122622.30279-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Kirill A. Shutemov <kirill.shtuemov@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ACPI]
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Nick Kossifidis <mick@ics.forth.gr> [riscv]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
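A hedged before/after sketch of the conversion pattern described (the
limit and alignment values are placeholders):

    phys_addr_t base;

    /* Before: find and reserve were two steps; reserve errors easy to miss. */
    base = memblock_find_in_range(0, limit, size, PAGE_SIZE);
    if (!base)
            return -ENOMEM;
    memblock_reserve(base, size);

    /* After: a single call that both finds and reserves the range. */
    base = memblock_phys_alloc_range(size, PAGE_SIZE, 0, limit);
    if (!base)
            return -ENOMEM;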
2021-08-20  Merge branch kvm-arm64/pkvm-fixed-features-prologue into kvmarm-master/next  (Marc Zyngier; 6 files, +13/-24)

* kvm-arm64/pkvm-fixed-features-prologue:
  : Rework a bunch of common infrastructure as a prologue
  : to Fuad Tabba's protected VM fixed feature series.
  KVM: arm64: Upgrade trace_kvm_arm_set_dreg32() to 64bit
  KVM: arm64: Add config register bit definitions
  KVM: arm64: Add feature register flag definitions
  KVM: arm64: Track value of cptr_el2 in struct kvm_vcpu_arch
  KVM: arm64: Keep mdcr_el2's value as set by __init_el2_debug
  KVM: arm64: Restore mdcr_el2 from vcpu
  KVM: arm64: Refactor sys_regs.h,c for nVHE reuse
  KVM: arm64: Fix names of config register fields
  KVM: arm64: MDCR_EL2 is a 64-bit register
  KVM: arm64: Remove trailing whitespace in comment
  KVM: arm64: placeholder to check if VM is protected

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20  Merge branch kvm-arm64/mmu/vmid-cleanups into kvmarm-master/next  (Marc Zyngier; 6 files, +14/-12)

* kvm-arm64/mmu/vmid-cleanups:
  : Cleanup the stage-2 configuration by providing a single helper,
  : and tidy up some of the ordering requirements for the VMID
  : allocator.
  KVM: arm64: Upgrade VMID accesses to {READ,WRITE}_ONCE
  KVM: arm64: Unify stage-2 programming behind __load_stage2()
  KVM: arm64: Move kern_hyp_va() usage in __load_guest_stage2() into the callers

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20  KVM: arm64: Minor optimization of range_is_memory  (David Brazdil; 1 file, +8/-5)

Currently range_is_memory() finds the corresponding struct memblock_region
for both the lower and upper bounds of the given address range with two
rounds of binary search, and then checks that the two memblocks are the
same. Simplify this by only doing binary search on the lower bound and
then checking that the upper bound is in the same memblock.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210728153232.1018911-3-dbrazdil@google.com
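A sketch of the simplified check, assuming a find_mem_range() helper that
binary-searches the memblock list for the region containing an address
(names are modelled on the surrounding hyp code but are assumptions):

    /* Sketch: one binary search for 'start', then a bounds check for 'end'. */
    static bool range_is_memory(u64 start, u64 end)
    {
            struct memblock_region *r = find_mem_range(start);

            if (!r)
                    return false;

            /* 'end' is exclusive, so the last byte covered is end - 1. */
            return (end - 1) < (r->base + r->size);
    }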
2021-08-20  Merge tag 'kvmarm-fixes-5.14-2' into kvm-arm64/mmu/el2-tracking  (Marc Zyngier; 1 file, +1/-1)

KVM/arm64 fixes for 5.14, take #2

- Plug race between enabling MTE and creating vcpus
- Fix off-by-one bug when checking whether an address range is RAM

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-08-20  KVM: arm64: Track value of cptr_el2 in struct kvm_vcpu_arch  (Fuad Tabba; 1 file, +1/-1)

Track the baseline guest value for cptr_el2 in struct kvm_vcpu_arch,
similar to the other registers that control traps. Use this value when
setting cptr_el2 for the guest. Currently this value is unchanged
(CPTR_EL2_DEFAULT), but future patches will set trapping bits based on
features supported for the guest.

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-9-tabba@google.com
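A minimal sketch of what this tracking looks like (field placement and the
reset-time initialisation are illustrative):

    struct kvm_vcpu_arch {
            /* ... */
            u64 cptr_el2;   /* baseline CPTR_EL2 value for this guest */
    };

    /* At vcpu reset: start from the architectural default. */
    vcpu->arch.cptr_el2 = CPTR_EL2_DEFAULT;

    /* When activating traps: program the tracked value, not a constant. */
    write_sysreg(vcpu->arch.cptr_el2, cptr_el2);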
2021-08-20  KVM: arm64: Keep mdcr_el2's value as set by __init_el2_debug  (Fuad Tabba; 2 files, +0/-8)

__init_el2_debug configures mdcr_el2 at initialization based on, among
other things, available hardware support. Trap deactivation doesn't check
that, so keep the initial value.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-8-tabba@google.com
2021-08-20  KVM: arm64: Restore mdcr_el2 from vcpu  (Fuad Tabba; 4 files, +16/-19)

On deactivating traps, restore the value of mdcr_el2 from the host copy
newly created and preserved in the vcpu context, rather than directly
reading the hardware register. Up until and including this patch the two
values are the same, i.e., the hardware register and the vcpu one. A
future patch will change the value of mdcr_el2 on activating traps, and
this ensures that its value will be restored.

No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-7-tabba@google.com
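A hedged sketch of the resulting save/restore flow (the host-copy field
name is an assumption):

    /* Activating traps: preserve the host's MDCR_EL2 in the vcpu context. */
    vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
    write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);

    /* Deactivating traps: restore the preserved copy, not the live register. */
    write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);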
2021-08-20  KVM: arm64: MDCR_EL2 is a 64-bit register  (Fuad Tabba; 2 files, +2/-2)

Fix the places in KVM that treat MDCR_EL2 as a 32-bit register. More
recent features (e.g., FEAT_SPEv1p2) use bits above 31.

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210817081134.2918285-4-tabba@google.com
2021-08-20  KVM: arm64: Upgrade VMID accesses to {READ,WRITE}_ONCE  (Marc Zyngier; 1 file, +2/-2)

Since TLB invalidation can run in parallel with VMID allocation, we need
to be careful and avoid any sort of load/store tearing. Use
{READ,WRITE}_ONCE consistently to avoid any surprise.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jade Alglave <jade.alglave@arm.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20210806113109.2475-6-will@kernel.org
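A sketch of the pattern, with an assumed field path for the VMID:

    /* Writer (VMID allocator): publish the new VMID as a single store. */
    WRITE_ONCE(kvm->arch.mmu.vmid.vmid, new_vmid);

    /* Reader (TLB invalidation running concurrently): a single, untorn load. */
    u64 vmid = READ_ONCE(kvm->arch.mmu.vmid.vmid);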
2021-08-20  KVM: arm64: Unify stage-2 programming behind __load_stage2()  (Marc Zyngier; 6 files, +10/-10)

The protected mode relies on a separate helper to load the S2 context.
Move over to the __load_guest_stage2() helper instead, and rename it to
__load_stage2() to present a unified interface.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jade Alglave <jade.alglave@arm.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210806113109.2475-5-will@kernel.org
2021-08-20  KVM: arm64: Move kern_hyp_va() usage in __load_guest_stage2() into the callers  (Marc Zyngier; 4 files, +6/-4)

It is a bit awkward to use kern_hyp_va() in __load_guest_stage2(),
especially as the helper is shared between VHE and nVHE. Instead, move the
use of kern_hyp_va() into the nVHE code, and pass a pointer to the
kvm->arch structure instead. Although this may look a bit awkward, it
allows for some further simplification.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jade Alglave <jade.alglave@arm.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210806113109.2475-4-will@kernel.org
2021-08-18  KVM: arm64: Make hyp_panic() more robust when protected mode is enabled  (Will Deacon; 1 file, +17/-4)

When protected mode is enabled, the host is unable to access most parts of
the EL2 hypervisor image, including 'hyp_physvirt_offset' and the contents
of the hypervisor's '.rodata.str' section. Unfortunately,
nvhe_hyp_panic_handler() tries to read from both of these locations when
handling a BUG() triggered at EL2; the former for converting the ELR to a
physical address and the latter for displaying the name of the source file
where the BUG() occurred.

Hack the EL2 panic asm to pass both physical and virtual ELR values to the
host and utilise the newly introduced CONFIG_NVHE_EL2_DEBUG so that we
disable stage-2 protection for the host before returning to the EL1 panic
handler. If the debug option is not enabled, display the address instead
of the source file:line information.

Cc: Andrew Scull <ascull@google.com>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210813130336.8139-1-will@kernel.org
2021-08-11  KVM: arm64: Return -EPERM from __pkvm_host_share_hyp()  (Quentin Perret; 1 file, +1/-1)

Fix the error code returned by __pkvm_host_share_hyp() when the host
attempts to share with EL2 a page that has already been shared with
another entity.

Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210811173630.2536721-1-qperret@google.com
2021-08-11  KVM: arm64: Make __pkvm_create_mappings static  (Quentin Perret; 2 files, +2/-4)

The __pkvm_create_mappings() function is no longer used outside of
nvhe/mm.c, make it static.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-22-qperret@google.com
2021-08-11  KVM: arm64: Restrict EL2 stage-1 changes in protected mode  (Quentin Perret; 3 files, +93/-7)

The host kernel is currently able to change EL2 stage-1 mappings without
restrictions thanks to the __pkvm_create_mappings() hypercall. But in a
world where the host is no longer part of the TCB, this clearly poses a
problem.

To fix this, introduce a new hypercall to allow the host to share a
physical memory page with the hypervisor, and remove the
__pkvm_create_mappings() variant. The new hypercall implements ownership
and permission checks before allowing the sharing operation, and it
annotates the shared page in the hypervisor stage-1 and host stage-2
page-tables.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-21-qperret@google.com
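A heavily hedged sketch of the shape such a hypercall handler can take;
every helper below is an assumption standing in for the real ownership
checks and page-table annotations:

    /* Sketch: share one host-owned page with EL2, or refuse. */
    static int __pkvm_host_share_hyp(u64 pfn)
    {
            phys_addr_t addr = hyp_pfn_to_phys(pfn);
            int ret = -EPERM;

            hyp_spin_lock(&host_kvm.lock);

            /* The host may only share pages it exclusively owns. */
            if (!host_page_exclusively_owned(addr))     /* assumed helper */
                    goto unlock;

            /* Annotate the page as shared in both page-tables. */
            ret = host_stage2_mark_shared(addr);        /* assumed helper */
            if (!ret)
                    ret = hyp_stage1_map_shared(addr);  /* assumed helper */
    unlock:
            hyp_spin_unlock(&host_kvm.lock);
            return ret;
    }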
2021-08-11  KVM: arm64: Refactor protected nVHE stage-1 locking  (Quentin Perret; 2 files, +17/-2)

Refactor the hypervisor stage-1 locking in nVHE protected mode to expose a
new pkvm_create_mappings_locked() function. This will be used in later
patches to allow walking and changing the hypervisor stage-1 without
releasing the lock.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-20-qperret@google.com
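This follows the common kernel `_locked` convention; a sketch assuming the
hyp stage-1 is guarded by a pkvm_pgd_lock spinlock and that the actual
mapping work lives in an internal helper:

    /* Callers must hold pkvm_pgd_lock; they may batch several calls. */
    int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot)
    {
            hyp_assert_lock_held(&pkvm_pgd_lock);
            return __pkvm_create_mappings_walk(from, to, prot); /* assumed */
    }

    /* Convenience wrapper for one-shot callers. */
    int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
    {
            int ret;

            hyp_spin_lock(&pkvm_pgd_lock);
            ret = pkvm_create_mappings_locked(from, to, prot);
            hyp_spin_unlock(&pkvm_pgd_lock);

            return ret;
    }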
2021-08-11  KVM: arm64: Remove __pkvm_mark_hyp  (Quentin Perret; 3 files, +0/-29)

Now that we mark memory owned by the hypervisor in the host stage-2 during
__pkvm_init(), we no longer need to rely on the host to explicitly mark
the hyp sections later on. Remove the __pkvm_mark_hyp() hypercall
altogether.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-19-qperret@google.com
2021-08-11  KVM: arm64: Mark host bss and rodata section as shared  (Quentin Perret; 1 file, +74/-8)

As the hypervisor maps the host's .bss and .rodata sections in its
stage-1, make sure to tag them as shared in hyp and host page-tables. But
since the hypervisor relies on the presence of these mappings, we cannot
leave the host in complete control of the memory regions -- it must not
unshare or donate them to another entity for example. To prevent this,
let's transfer the ownership of those ranges to the hypervisor itself, and
share the pages back with the host.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-18-qperret@google.com
2021-08-11  KVM: arm64: Enable retrieving protections attributes of PTEs  (Quentin Perret; 1 file, +37/-0)

Introduce helper functions in the KVM stage-2 and stage-1 page-table
manipulation library to retrieve the enum kvm_pgtable_prot of a PTE. This
will be useful to implement custom walkers outside of pgtable.c.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-17-qperret@google.com
2021-08-11  KVM: arm64: Introduce addr_is_memory()  (Quentin Perret; 2 files, +8/-0)

Introduce a helper usable in nVHE protected mode to check whether a
physical address is in a RAM region or not.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-16-qperret@google.com
2021-08-11  KVM: arm64: Expose pkvm_hyp_id  (Quentin Perret; 2 files, +3/-1)

Allow references to the hypervisor's owner id from outside mem_protect.c.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-15-qperret@google.com
2021-08-11  KVM: arm64: Expose host stage-2 manipulation helpers  (Quentin Perret; 2 files, +19/-1)

We will need to manipulate the host stage-2 page-table from outside
mem_protect.c soon. Introduce two functions allowing this, and make them
usable to users of mem_protect.h.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-14-qperret@google.com
2021-08-11  KVM: arm64: Add helpers to tag shared pages in SW bits  (Quentin Perret; 1 file, +26/-0)

We will soon start annotating shared pages in page-tables in nVHE
protected mode. Define all the states in which a page can be (owned,
shared and owned, shared and borrowed), and provide helpers to convert
this into SW-bit annotations using the matching prot attributes.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-13-qperret@google.com
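A sketch of the state encoding: two software bits (from the
kvm_pgtable_prot SW annotations added by the 'Allow populating software
bits' patch below) give the three states, with 'owned' as the all-clear
default. Names mirror the series but should be read as assumptions:

    /* Sketch: page-sharing states encoded in two PTE software bits. */
    enum pkvm_page_state {
            PKVM_PAGE_OWNED                 = 0ULL,
            PKVM_PAGE_SHARED_OWNED          = KVM_PGTABLE_PROT_SW0,
            PKVM_PAGE_SHARED_BORROWED       = KVM_PGTABLE_PROT_SW1,
    };

    #define PKVM_PAGE_STATE_PROT_MASK       (KVM_PGTABLE_PROT_SW0 | \
                                             KVM_PGTABLE_PROT_SW1)

    /* Fold a page state into the prot attributes used for a map operation. */
    static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
                                                     enum pkvm_page_state state)
    {
            return (prot & ~PKVM_PAGE_STATE_PROT_MASK) | state;
    }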
2021-08-11  KVM: arm64: Allow populating software bits  (Quentin Perret; 1 file, +5/-0)

Introduce infrastructure for manipulating software bits in stage-1 and
stage-2 page-tables using additional entries in the kvm_pgtable_prot enum.

This is heavily inspired by Marc's implementation of a similar feature in
the NV patch series, but adapted to allow stage-1 changes as well:

  https://lore.kernel.org/kvmarm/20210510165920.1913477-56-maz@kernel.org/

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-12-qperret@google.com
2021-08-11  KVM: arm64: Enable forcing page-level stage-2 mappings  (Quentin Perret; 2 files, +53/-10)

Much of the stage-2 manipulation logic relies on being able to destroy
block mappings if e.g. installing a smaller mapping in the range. The
rationale for this behaviour is that stage-2 mappings can always be
re-created lazily.

However, this gets more complicated when the stage-2 page-table is used to
store metadata about the underlying pages. In such cases, destroying a
block mapping may lead to losing part of the state, and confuse the user
of those metadata (such as the hypervisor in nVHE protected mode).

To avoid this, introduce a callback function in the pgtable struct which
is called during all map operations to determine whether the mappings can
use blocks, or should be forced to page granularity. This is used by the
hypervisor when creating the host stage-2 to force page-level mappings
when using non-default protection attributes.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-11-qperret@google.com
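A hedged sketch of the callback hook and one plausible host policy (the
prot constant is an assumption):

    /* Returning true forces the map walker down to page granularity. */
    typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
                                               enum kvm_pgtable_prot prot);

    /*
     * Host stage-2 policy sketch: anything mapped with non-default
     * attributes carries state we must not lose to a block split.
     */
    static bool host_stage2_force_pte_cb(u64 addr, u64 end,
                                         enum kvm_pgtable_prot prot)
    {
            return prot != PKVM_HOST_MEM_PROT;
    }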
2021-08-11  KVM: arm64: Tolerate re-creating hyp mappings to set software bits  (Quentin Perret; 1 file, +16/-2)

The current hypervisor stage-1 mapping code doesn't allow changing an
existing valid mapping. Relax this condition by allowing changes that only
target software bits, as that will soon be needed to annotate shared
pages.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-10-qperret@google.com
2021-08-11  KVM: arm64: Don't overwrite software bits with owner id  (Quentin Perret; 1 file, +1/-1)

We will soon start annotating page-tables with new flags to track shared
pages and such, and we will do so in valid mappings using software bits in
the PTEs, as provided by the architecture. However, it is possible that we
will need to use those flags to annotate invalid mappings as well in the
future, similar to what we do to track page ownership in the host stage-2.

In order to facilitate the annotation of invalid mappings with such flags,
it would be preferable to re-use the same bits as for valid mappings (bits
[58:55]), but these are currently used for ownership encoding. Since we
have plenty of bits left to use in invalid mappings, move the ownership
bits further down the PTE to avoid the conflict.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-9-qperret@google.com
2021-08-11  KVM: arm64: Rename KVM_PTE_LEAF_ATTR_S2_IGNORED  (Quentin Perret; 1 file, +2/-2)

The ignored bits for both stage-1 and stage-2 page and block descriptors
are in [58:55], so rename KVM_PTE_LEAF_ATTR_S2_IGNORED to make it
applicable to both. And while at it, since these bits are more commonly
known as 'software' bits, rename accordingly.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-8-qperret@google.com
2021-08-11  KVM: arm64: Optimize host memory aborts  (Quentin Perret; 2 files, +44/-75)

The kvm_pgtable_stage2_find_range() function is used in the host memory
abort path to try and look for the largest block mapping that can be used
to map the faulting address. In order to do so, the function currently
walks the stage-2 page-table and looks for existing incompatible mappings
within the range of the largest possible block. If incompatible mappings
are found, it tries the same procedure again, but using a smaller block
range, and repeats until a matching range is found (potentially up to page
granularity).

While this approach has benefits (mostly in the fact that it proactively
coalesces host stage-2 mappings), it can be slow if the ranges are
fragmented, and it isn't optimized to deal with CPUs faulting on the same
IPA as all of them will do all the work every time.

To avoid these issues, remove kvm_pgtable_stage2_find_range(), and walk
the page-table only once in the host_mem_abort() path to find the closest
leaf to the input address. With this, use the corresponding range if it is
invalid and not owned by another entity. If a valid leaf is found, return
-EAGAIN similar to what is done in the kvm_pgtable_stage2_map() path to
optimize concurrent faults.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-7-qperret@google.com
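A hedged sketch of the single-walk flow, built on the
kvm_pgtable_get_leaf() helper from the 2021-08-02 entry below (the
range-shrinking helper at the end is an assumption):

    /* Sketch: one walk to the closest leaf, then classify what we found. */
    static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
    {
            kvm_pte_t pte;
            u32 level;
            int ret;

            ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
            if (ret)
                    return ret;

            if (kvm_pte_valid(pte))
                    return -EAGAIN; /* another CPU mapped it first: retry */
            if (pte)
                    return -EPERM;  /* invalid but annotated: owned elsewhere */

            /* Free and unowned: shrink 'range' to the block at this level. */
            return host_stage2_shrink_range(addr, level, range);  /* assumed */
    }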
2021-08-11  KVM: arm64: Expose page-table helpers  (Quentin Perret; 1 file, +0/-39)

The KVM pgtable API exposes the kvm_pgtable_walk() function to allow the
definition of walkers outside of pgtable.c. However, it is not easy to
implement any of those walkers without some of the low-level helpers. Move
some of them to the header file to allow re-use from other places.

Signed-off-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-6-qperret@google.com
2021-08-11  KVM: arm64: Provide the host_stage2_try() helper macro  (Quentin Perret; 1 file, +22/-18)

We currently unmap all MMIO mappings from the host stage-2 to recycle the
pages whenever we run out. In order to make this pattern easy to re-use
from other places, factor the logic out into a dedicated macro. While at
it, apply the macro for the kvm_pgtable_stage2_set_owner() calls. They're
currently only called early on and are guaranteed to succeed, but making
them robust to the -ENOMEM case doesn't hurt and will avoid painful
debugging sessions later on.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-4-qperret@google.com
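A sketch of the macro's likely shape; the lock and the MMIO-unmap helper
names are assumptions:

    /*
     * Sketch: run a stage-2 operation; on -ENOMEM, reclaim page-table
     * pages by unmapping host stage-2 MMIO mappings, then retry once.
     */
    #define host_stage2_try(fn, ...)                                    \
            ({                                                          \
                    int __ret;                                          \
                                                                        \
                    hyp_assert_lock_held(&host_kvm.lock);               \
                    __ret = fn(__VA_ARGS__);                            \
                    if (__ret == -ENOMEM) {                             \
                            __ret = host_stage2_unmap_dev_all();        \
                            if (!__ret)                                 \
                                    __ret = fn(__VA_ARGS__);            \
                    }                                                   \
                    __ret;                                              \
            })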
2021-08-11  KVM: arm64: Introduce hyp_assert_lock_held()  (Quentin Perret; 1 file, +17/-0)

Introduce a poor man's lockdep implementation at EL2 which allows us to
BUG() whenever a hyp spinlock is not held when it should be. Hide this
feature behind a new Kconfig option that targets the EL2 object
specifically, instead of piggybacking on the existing CONFIG_LOCKDEP. EL2
cannot WARN() cleanly to report locking issues, hence BUG() is the only
option and it is not clear whether we want this widely enabled. This is
most likely going to be useful for local testing until the EL2 WARN()
situation has improved.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-3-qperret@google.com
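A minimal sketch of the assertion, gated on the dedicated EL2 debug
Kconfig option (the real implementation may add further checks, e.g.
skipping asserts before EL2 is fully initialised):

    #ifdef CONFIG_NVHE_EL2_DEBUG
    static inline void hyp_assert_lock_held(hyp_spinlock_t *lock)
    {
            /* EL2 cannot WARN() cleanly, so a missing lock is fatal here. */
            BUG_ON(!hyp_spin_is_locked(lock));
    }
    #else
    static inline void hyp_assert_lock_held(hyp_spinlock_t *lock) { }
    #endif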
2021-08-11  KVM: arm64: Add hyp_spin_is_locked() for basic locking assertions at EL2  (Will Deacon; 1 file, +8/-0)

Introduce hyp_spin_is_locked() so that functions can easily assert that a
given lock is held (albeit possibly by another CPU!) without having to
drag full lockdep support up to EL2.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210809152448.1810400-2-qperret@google.com
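Assuming the hyp spinlock is a ticket lock with next/owner halves, as in
the nVHE code, the check is a single snapshot comparison; a sketch:

    static inline bool hyp_spin_is_locked(hyp_spinlock_t *lock)
    {
            hyp_spinlock_t lockval = READ_ONCE(*lock);

            /* A ticket lock is held whenever next and owner disagree. */
            return lockval.owner != lockval.next;
    }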
2021-08-02  KVM: arm64: Introduce helper to retrieve a PTE and its level  (Marc Zyngier; 1 file, +39/-0)

It is becoming a common need to fetch the PTE for a given address together
with its level. Add such a helper.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210726153552.1535838-2-maz@kernel.org
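The helper's shape plus a usage sketch (prototype as described; the caller
fragment is illustrative):

    /* Walk to the closest leaf for 'addr', returning the PTE and its level. */
    int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
                             kvm_pte_t *ptep, u32 *level);

    /* Usage sketch */
    kvm_pte_t pte;
    u32 level;

    if (!kvm_pgtable_get_leaf(pgt, addr, &pte, &level) && kvm_pte_valid(pte))
            pr_debug("addr %llx mapped by a level-%u leaf\n", addr, level);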
2021-07-29  KVM: arm64: Fix off-by-one in range_is_memory  (David Brazdil; 1 file, +1/-1)

Hyp checks whether an address range only covers RAM by checking the
start/endpoints against a list of memblock_region structs. However, the
endpoint here is exclusive but internally is treated as inclusive. Fix the
off-by-one error that caused valid address ranges to be rejected.

Cc: Quentin Perret <qperret@google.com>
Fixes: 90134ac9cabb6 ("KVM: arm64: Protect the .hyp sections from the host")
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210728153232.1018911-2-dbrazdil@google.com
2021-06-22  Merge branch kvm-arm64/mmu/mte into kvmarm-master/next  (Marc Zyngier; 3 files, +30/-1)

KVM/arm64 support for MTE, courtesy of Steven Price. It allows the guest
to use memory tagging, and offers a new userspace API to save/restore the
tags.

* kvm-arm64/mmu/mte:
  KVM: arm64: Document MTE capability and ioctl
  KVM: arm64: Add ioctl to fetch/store tags in a guest
  KVM: arm64: Expose KVM_ARM_CAP_MTE
  KVM: arm64: Save/restore MTE registers
  KVM: arm64: Introduce MTE VM feature
  arm64: mte: Sync tags for pages where PTE is untagged

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-22  KVM: arm64: Save/restore MTE registers  (Steven Price; 2 files, +28/-0)

Define the new system registers that MTE introduces and context switch
them. The MTE feature is still hidden from the ID register as it isn't
supported in a VM yet.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210621111716.37157-4-steven.price@arm.com
2021-06-22  KVM: arm64: Introduce MTE VM feature  (Steven Price; 1 file, +2/-1)

Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging for a
VM. This will expose the feature to the guest and automatically tag memory
pages touched by the VM as PG_mte_tagged (and clear the tag storage) to
ensure that the guest cannot see stale tags, and so that the tags are
correctly saved/restored across swap. Actually exposing the new capability
to user space happens in a later patch.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
[maz: move VM_SHARED sampling into the critical section]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210621111716.37157-3-steven.price@arm.com
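A hedged sketch of the tagging step when a page is first mapped into an
MTE-enabled guest (the fault-handling context around it is elided; at the
time of this series PG_mte_tagged was a page flag):

    /* Sketch: make sure no stale tags leak into the guest. */
    if (kvm_has_mte(kvm) && !test_bit(PG_mte_tagged, &page->flags)) {
            mte_clear_page_tags(page_address(page));
            set_bit(PG_mte_tagged, &page->flags);
    }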
2021-06-18  Merge branch arm64/for-next/caches into kvmarm-master/next  (Marc Zyngier; 4 files, +15/-7)

arm64 cache management function cleanup from Fuad Tabba, shared with the
arm64 tree.

* arm64/for-next/caches:
  arm64: Rename arm64-internal cache maintenance functions
  arm64: Fix cache maintenance function comments
  arm64: sync_icache_aliases to take end parameter instead of size
  arm64: __clean_dcache_area_pou to take end parameter instead of size
  arm64: __clean_dcache_area_pop to take end parameter instead of size
  arm64: __clean_dcache_area_poc to take end parameter instead of size
  arm64: __flush_dcache_area to take end parameter instead of size
  arm64: dcache_by_line_op to take end parameter instead of size
  arm64: __inval_dcache_area to take end parameter instead of size
  arm64: Fix comments to refer to correct function __flush_icache_range
  arm64: Move documentation of dcache_by_line_op
  arm64: assembler: remove user_alt
  arm64: Downgrade flush_icache_range to invalidate
  arm64: Do not enable uaccess for invalidate_icache_range
  arm64: Do not enable uaccess for flush_icache_range
  arm64: Apply errata to swsusp_arch_suspend_exit
  arm64: assembler: add conditional cache fixups
  arm64: assembler: replace `kaddr` with `addr`

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-18  Merge branch kvm-arm64/mmu/stage2-cmos into kvmarm-master/next  (Marc Zyngier; 1 file, +37/-11)

Cache maintenance updates from Yanan Wang, moving the CMOs down into the
page-table code. This ensures that we only issue them when actually
performing a mapping rather than upfront.

* kvm-arm64/mmu/stage2-cmos:
  KVM: arm64: Move guest CMOs to the fault handlers
  KVM: arm64: Tweak parameters of guest cache maintenance functions
  KVM: arm64: Introduce mm_ops member for structure stage2_attr_data
  KVM: arm64: Introduce two cache maintenance callbacks
2021-06-18  KVM: arm64: Move guest CMOs to the fault handlers  (Yanan Wang; 1 file, +31/-7)

We currently uniformly perform CMOs of D-cache and I-cache in function
user_mem_abort before calling the fault handlers. If we get concurrent
guest faults (e.g. translation faults, permission faults) or some really
unnecessary guest faults caused by BBM, CMOs for the first vcpu are
necessary while the later ones are not. By moving CMOs to the fault
handlers, we can easily identify conditions where they are really needed
and avoid the unnecessary ones. Performing CMOs is time-consuming,
especially when flushing a block range, so this solution reduces the load
on KVM and improves the efficiency of the stage-2 page-table code.

We can imagine two specific scenarios which will gain much benefit:
1) In a normal VM startup, this solution will improve the efficiency of
   handling guest page faults incurred by vCPUs, when initially populating
   stage-2 page tables.
2) After live migration, the heavy workload will be resumed on the
   destination VM, however all the stage-2 page tables need to be rebuilt
   at the moment. So this solution will ease the performance drop during
   the resume stage.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210617105824.31752-5-wangyanan55@huawei.com
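A sketch of where the CMOs land after this change: inside the stage-2 map
walker, only when a new leaf is actually installed (callback names follow
the mm_ops introduced by this series; treat the details as assumptions):

    /* Sketch: in the stage-2 map path, just before installing a new leaf. */
    if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
            mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
                                           granule);

    if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
            mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);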
2021-06-18  KVM: arm64: Introduce mm_ops member for structure stage2_attr_data  (Yanan Wang; 1 file, +6/-4)

Add a mm_ops member to structure stage2_attr_data, since we will move
I-cache maintenance for guest stage-2 to the permission path and as a
result will need mm_ops for some callbacks.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210617105824.31752-3-wangyanan55@huawei.com
2021-06-11  Merge branch kvm-arm64/mmu/reduce-vmemmap-overhead into kvmarm-master/next  (Marc Zyngier; 8 files, +145/-127)

Host stage-2 optimisations from Quentin Perret.

* kvm-arm64/mmu/reduce-vmemmap-overhead:
  KVM: arm64: Use less bits for hyp_page refcount
  KVM: arm64: Use less bits for hyp_page order
  KVM: arm64: Remove hyp_pool pointer from struct hyp_page
  KVM: arm64: Unify MMIO and mem host stage-2 pools
  KVM: arm64: Remove list_head from hyp_page
  KVM: arm64: Use refcount at hyp to check page availability
  KVM: arm64: Move hyp_pool locking out of refcount helpers
2021-06-11  KVM: arm64: Use less bits for hyp_page refcount  (Quentin Perret; 2 files, +2/-1)

The hyp_page refcount is currently encoded on 4 bytes even though we never
need to count that many objects in a page. Make it 2 bytes to save some
space in the vmemmap. As overflows are more likely to happen as well, make
sure to catch those with a BUG in the increment function.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210608114518.748712-8-qperret@google.com
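A sketch of the shrunken layout and the overflow check, combining this
patch with the order-shrinking patch below (exact types are assumptions):

    struct hyp_page {
            unsigned short refcount;        /* was a 4-byte field */
            unsigned short order;           /* shrunk by the next patch */
    };

    static inline void hyp_page_ref_inc(struct hyp_page *p)
    {
            BUG_ON(p->refcount == USHRT_MAX);  /* overflow is now closer */
            p->refcount++;
    }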
2021-06-11  KVM: arm64: Use less bits for hyp_page order  (Quentin Perret; 3 files, +10/-10)

The hyp_page order is currently encoded on 4 bytes even though it is
guaranteed to be smaller than this. Make it 2 bytes to reduce the hyp
vmemmap overhead.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210608114518.748712-7-qperret@google.com