summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm/hyp
AgeCommit message (Collapse)AuthorFilesLines
2023-05-11Merge branch kvm-arm64/pgtable-fixes-6.4 into kvmarm-master/fixesMarc Zyngier1-9/+32
* kvm-arm64/pgtable-fixes-6.4: : . : Fixes for concurrent S2 mapping race from Oliver: : : "So it appears that there is a race between two parallel stage-2 map : walkers that could lead to mapping the incorrect PA for a given IPA, as : the IPA -> PA relationship picks up an unintended offset. This series : eliminates the problem by using the current IPA of the walk as the : source-of-truth regarding where we are in a map operation." : . KVM: arm64: Constify start/end/phys fields of the pgtable walker data KVM: arm64: Infer PA offset from VA in hyp map walker KVM: arm64: Infer the PA offset from IPA in stage-2 map walker Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-05-11Merge branch kvm-arm64/misc-6.4 into kvmarm-master/fixesMarc Zyngier1-2/+10
* kvm-arm64/misc-6.4: : . : Minor changes for 6.4: : : - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...) : : - FP/SVE/SME documentation update, in the hope that this field : becomes clearer... : : - Add workaround for the usual Apple SEIS brokenness : : - Random comment fixes : . KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations KVM: arm64: Clarify host SME state management KVM: arm64: Restructure check for SVE support in FP trap handler KVM: arm64: Document check for TIF_FOREIGN_FPSTATE KVM: arm64: Fix repeated words in comments KVM: arm64: Use the bitmap API to allocate bitmaps KVM: arm64: Slightly optimize flush_context() Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-21/+134
Pull kvm updates from Paolo Bonzini: "s390: - More phys_to_virt conversions - Improvement of AP management for VSIE (nested virtualization) ARM64: - Numerous fixes for the pathological lock inversion issue that plagued KVM/arm64 since... forever. - New framework allowing SMCCC-compliant hypercalls to be forwarded to userspace, hopefully paving the way for some more features being moved to VMMs rather than be implemented in the kernel. - Large rework of the timer code to allow a VM-wide offset to be applied to both virtual and physical counters as well as a per-timer, per-vcpu offset that complements the global one. This last part allows the NV timer code to be implemented on top. - A small set of fixes to make sure that we don't change anything affecting the EL1&0 translation regime just after having having taken an exception to EL2 until we have executed a DSB. This ensures that speculative walks started in EL1&0 have completed. - The usual selftest fixes and improvements. x86: - Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled, and by giving the guest control of CR0.WP when EPT is enabled on VMX (VMX-only because SVM doesn't support per-bit controls) - Add CR0/CR4 helpers to query single bits, and clean up related code where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return as a bool - Move AMD_PSFD to cpufeatures.h and purge KVM's definition - Avoid unnecessary writes+flushes when the guest is only adding new PTEs - Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations when emulating invalidations - Clean up the range-based flushing APIs - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle changed SPTE" overhead associated with writing the entire entry - Track the number of "tail" entries in a pte_list_desc to avoid having to walk (potentially) all descriptors during insertion and deletion, which gets quite expensive if the guest is spamming fork() - Disallow virtualizing legacy LBRs if architectural LBRs are available, the two are mutually exclusive in hardware - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features - Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES - Apply PMU filters to emulated events and add test coverage to the pmu_event_filter selftest - AMD SVM: - Add support for virtual NMIs - Fixes for edge cases related to virtual interrupts - Intel AMX: - Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is not being reported due to userspace not opting in via prctl() - Fix a bug in emulation of ENCLS in compatibility mode - Allow emulation of NOP and PAUSE for L2 - AMX selftests improvements - Misc cleanups MIPS: - Constify MIPS's internal callbacks (a leftover from the hardware enabling rework that landed in 6.3) Generic: - Drop unnecessary casts from "void *" throughout kvm_main.c - Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct size by 8 bytes on 64-bit kernels by utilizing a padding hole Documentation: - Fix goof introduced by the conversion to rST" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits) KVM: s390: pci: fix virtual-physical confusion on module unload/load KVM: s390: vsie: clarifications on setting the APCB KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() KVM: selftests: Test the PMU event "Instructions retired" KVM: selftests: Copy full counter values from guest in PMU event filter test KVM: selftests: Use error codes to signal errors in PMU event filter test KVM: selftests: Print detailed info in PMU event filter asserts KVM: selftests: Add helpers for PMC asserts in PMU event filter test KVM: selftests: Add a common helper for the PMU event filter guest code KVM: selftests: Fix spelling mistake "perrmited" -> "permitted" KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: Handle 32bit CNTPCTSS traps KVM: arm64: nvhe: Synchronise with page table walker on vcpu run KVM: arm64: vgic: Don't acquire its_lock before config_lock KVM: selftests: Add test to verify KVM's supported XCR0 ...
2023-04-28Merge tag 'mm-stable-2023-04-27-15-30' of ↵Linus Torvalds2-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ...
2023-04-21KVM: arm64: Restructure check for SVE support in FP trap handlerMark Brown1-2/+10
We share the same handler for general floating point and SVE traps with a check to make sure we don't handle any SVE traps if the system doesn't have SVE support. Since we will be adding SME support and wishing to handle that along with other FP related traps rewrite the check to be more scalable and a bit clearer too, ensuring we don't misidentify SME traps as SVE ones. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221214-kvm-arm64-sme-context-switch-v2-2-57ba0082e9ff@kernel.org
2023-04-21KVM: arm64: Constify start/end/phys fields of the pgtable walker dataMarc Zyngier1-4/+4
As we are revamping the way the pgtable walker evaluates some of the data, make it clear that we rely on somew of the fields to be constant across the lifetime of a walk. For this, flag the start, end and phys fields of the walk data as 'const', which will generate an error if we were to accidentally update these fields again. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-21KVM: arm64: Infer PA offset from VA in hyp map walkerOliver Upton1-2/+1
Similar to the recently fixed stage-2 walker, the hyp map walker increments the PA and VA of a walk separately. Unlike stage-2, there is no bug here as the map walker has exclusive access to the stage-1 page tables. Nonetheless, in the interest of continuity throughout the page table code, tweak the hyp map walker to avoid incrementing the PA and instead use the VA as the authoritative source of how far along a table walk has gotten. Calculate the PA to use for a leaf PTE by adding the offset of the VA from the start of the walk to the starting PA. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230421071606.1603916-3-oliver.upton@linux.dev
2023-04-21KVM: arm64: Infer the PA offset from IPA in stage-2 map walkerOliver Upton1-4/+28
Until now, the page table walker counted increments to the PA and IPA of a walk in two separate places. While the PA is incremented as soon as a leaf PTE is installed in stage2_map_walker_try_leaf(), the IPA is actually bumped in the generic table walker context. Critically, __kvm_pgtable_visit() rereads the PTE after the LEAF callback returns to work out if a table or leaf was installed, and only bumps the IPA for a leaf PTE. This arrangement worked fine when we handled faults behind the write lock, as the walker had exclusive access to the stage-2 page tables. However, commit 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel") started handling all stage-2 faults behind the read lock, opening up a race where a walker could increment the PA but not the IPA of a walk. Nothing good ensues, as the walker starts mapping with the incorrect IPA -> PA relationship. For example, assume that two vCPUs took a data abort on the same IPA. One observes that dirty logging is disabled, and the other observed that it is enabled: vCPU attempting PMD mapping vCPU attempting PTE mapping ====================================== ===================================== /* install PMD */ stage2_make_pte(ctx, leaf); data->phys += granule; /* replace PMD with a table */ stage2_try_break_pte(ctx, data->mmu); stage2_make_pte(ctx, table); /* table is observed */ ctx.old = READ_ONCE(*ptep); table = kvm_pte_table(ctx.old, level); /* * map walk continues w/o incrementing * IPA. */ __kvm_pgtable_walk(..., level + 1); Bring an end to the whole mess by using the IPA as the single source of truth for how far along a walk has gotten. Work out the correct PA to map by calculating the IPA offset from the beginning of the walk and add that to the starting physical address. Cc: stable@vger.kernel.org Fixes: 1577cb5823ce ("KVM: arm64: Handle stage-2 faults in parallel") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230421071606.1603916-2-oliver.upton@linux.dev
2023-04-21Merge branch kvm-arm64/spec-ptw into kvmarm-master/nextMarc Zyngier6-15/+69
* kvm-arm64/spec-ptw: : . : On taking an exception from EL1&0 to EL2(&0), the page table walker is : allowed to carry on with speculative walks started from EL1&0 while : running at EL2 (see R_LFHQG). Given that the PTW may be actively using : the EL1&0 system registers, the only safe way to deal with it is to : issue a DSB before changing any of it. : : We already did the right thing for SPE and TRBE, but ignored the PTW : for unknown reasons (probably because the architecture wasn't crystal : clear at the time). : : This requires a bit of surgery in the nvhe code, though most of these : patches are comments so that my future self can understand the purpose : of these barriers. The VHE code is largely unaffected, thanks to the : DSB in the context switch. : . KVM: arm64: vhe: Drop extra isb() on guest exit KVM: arm64: vhe: Synchronise with page table walker on MMU update KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc() KVM: arm64: nvhe: Synchronise with page table walker on TLBI KVM: arm64: nvhe: Synchronise with page table walker on vcpu run Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-14KVM: arm64: vhe: Drop extra isb() on guest exitMarc Zyngier1-4/+3
__kvm_vcpu_run_vhe() end on VHE with an isb(). However, this function is only reachable via kvm_call_hyp_ret(), which already contains an isb() in order to mimick the behaviour of nVHE and provide a context synchronisation event. We thus have two isb()s back to back, which is one too many. Drop the first one and solely rely on the one in the helper. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
2023-04-14KVM: arm64: vhe: Synchronise with page table walker on MMU updateMarc Zyngier1-0/+12
Contrary to nVHE, VHE is a lot easier when it comes to dealing with speculative page table walks started at EL1. As we only change EL1&0 translation regime when context-switching, we already benefit from the effect of the DSB that sits in the context switch code. We only need to take care of it in the NV case, where we can flip between between two EL1 contexts (one of them being the virtual EL2) without a context switch. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
2023-04-14KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()Marc Zyngier1-0/+7
We rely on the presence of a DSB at the end of kvm_flush_dcache_to_poc() that, on top of ensuring completion of the cache clean, also covers the speculative page table walk started from EL1. Document this dependency. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
2023-04-14KVM: arm64: nvhe: Synchronise with page table walker on TLBIMarc Zyngier1-9/+29
A TLBI from EL2 impacting EL1 involves messing with the EL1&0 translation regime, and the page table walker may still be performing speculative walks. Piggyback on the existing DSBs to always have a DSB ISH that will synchronise all load/store operations that the PTW may still have. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-04-13KVM: arm64: nvhe: Synchronise with page table walker on vcpu runMarc Zyngier2-2/+18
When taking an exception between the EL1&0 translation regime and the EL2 translation regime, the page table walker is allowed to complete the walks started from EL0 or EL1 while running at EL2. It means that altering the system registers that define the EL1&0 translation regime is fraught with danger *unless* we wait for the completion of such walk with a DSB (R_LFHQG and subsequent statements in the ARM ARM). We already did the right thing for other external agents (SPE, TRBE), but not the PTW. Rework the existing SPE/TRBE synchronisation to include the PTW, and add the missing DSB on guest exit. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
2023-04-06mm, treewide: redefine MAX_ORDER sanelyKirill A. Shutemov2-6/+6
MAX_ORDER currently defined as number of orders page allocator supports: user can ask buddy allocator for page order between 0 and MAX_ORDER-1. This definition is counter-intuitive and lead to number of bugs all over the kernel. Change the definition of MAX_ORDER to be inclusive: the range of orders user can ask from buddy allocator is 0..MAX_ORDER now. [kirill@shutemov.name: fix min() warning] Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box [akpm@linux-foundation.org: fix another min_t warning] [kirill@shutemov.name: fixups per Zi Yan] Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name [akpm@linux-foundation.org: fix underlining in docs] Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-04KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV2/3 to protected VMsFuad Tabba2-8/+4
The existing pKVM code attempts to advertise CSV2/3 using values initialized to 0, but never set. To advertise CSV2/3 to protected guests, pass the CSV2/3 values to hyp when initializing hyp's view of guests' ID_AA64PFR0_EL1. Similar to non-protected KVM, these are system-wide, rather than per cpu, for simplicity. Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20230404152321.413064-1-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-03-30KVM: arm64: nv: timers: Support hyp timer emulationMarc Zyngier1-0/+15
Emulating EL2 also means emulating the EL2 timers. To do so, we expand our timer framework to deal with at most 4 timers. At any given time, two timers are using the HW timers, and the two others are purely emulated. The role of deciding which is which at any given time is left to a mapping function which is called every time we need to make such a decision. Reviewed-by: Colton Lewis <coltonlewis@google.com> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-18-maz@kernel.org
2023-03-30KVM: arm64: nv: timers: Add a per-timer, per-vcpu offsetMarc Zyngier1-0/+2
Being able to set a global offset isn't enough. With NV, we also need to a per-vcpu, per-timer offset (for example, CNTVCT_EL0 being offset by CNTVOFF_EL2). Use a similar method as the VM-wide offset to have a timer point to the shadow register that contains the offset value. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-17-maz@kernel.org
2023-03-30KVM: arm64: timers: Fast-track CNTPCT_EL0 trap handlingMarc Zyngier1-0/+36
Now that it is likely that CNTPCT_EL0 accesses will trap, fast-track the emulation of the counter read which doesn't need more that a simple offsetting. One day, we'll have CNTPOFF everywhere. One day. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-14-maz@kernel.org
2023-03-30KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2Marc Zyngier1-6/+12
CNTPOFF_EL2 is awesome, but it is mostly vapourware, and no publicly available implementation has it. So for the common mortals, let's implement the emulated version of this thing. It means trapping accesses to the physical counter and timer, and emulate some of it as necessary. As for CNTPOFF_EL2, nobody sets the offset yet. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-6-maz@kernel.org
2023-02-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-22/+118
Pull kvm updates from Paolo Bonzini: "ARM: - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company) - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Sort out confusion between virtual and physical addresses, which currently are the same on s390 - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecurally legal but unlikely to happen in practice - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits) KVM: SVM: hyper-v: placate modpost section mismatch error KVM: x86/mmu: Make tdp_mmu_allowed static KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table ...
2023-02-22Merge tag 'arm64-upstream' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit architectural register (ZT0, for the look-up table feature) that Linux needs to save/restore - Include TPIDR2 in the signal context and add the corresponding kselftests - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG (ARM CMN) at probe time - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64 - Permit EFI boot with MMU and caches on. Instead of cleaning the entire loaded kernel image to the PoC and disabling the MMU and caches before branching to the kernel bare metal entry point, leave the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of all of system RAM to populate the initial page tables - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64 kernel (the arm32 kernel only defines the values) - Harden the arm64 shadow call stack pointer handling: stash the shadow stack pointer in the task struct on interrupt, load it directly from this structure - Signal handling cleanups to remove redundant validation of size information and avoid reading the same data from userspace twice - Refactor the hwcap macros to make use of the automatically generated ID registers. It should make new hwcaps writing less error prone - Further arm64 sysreg conversion and some fixes - arm64 kselftest fixes and improvements - Pointer authentication cleanups: don't sign leaf functions, unify asm-arch manipulation - Pseudo-NMI code generation optimisations - Minor fixes for SME and TPIDR2 handling - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable, replace strtobool() to kstrtobool() in the cpufeature.c code, apply dynamic shadow call stack in two passes, intercept pfn changes in set_pte_at() without the required break-before-make sequence, attempt to dump all instructions on unhandled kernel faults * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits) arm64: fix .idmap.text assertion for large kernels kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests kselftest/arm64: Copy whole EXTRA context arm64: kprobes: Drop ID map text from kprobes blacklist perf: arm_spe: Print the version of SPE detected perf: arm_spe: Add support for SPEv1.2 inverted event filtering perf: Add perf_event_attr::config3 arm64/sme: Fix __finalise_el2 SMEver check drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable arm64/signal: Only read new data when parsing the ZT context arm64/signal: Only read new data when parsing the ZA context arm64/signal: Only read new data when parsing the SVE context arm64/signal: Avoid rereading context frame sizes arm64/signal: Make interface for restore_fpsimd_context() consistent arm64/signal: Remove redundant size validation from parse_user_sigframe() arm64/signal: Don't redundantly verify FPSIMD magic arm64/cpufeature: Use helper macros to specify hwcaps arm64/cpufeature: Always use symbolic name for feature value in hwcaps arm64/sysreg: Initial unsigned annotations for ID registers arm64/sysreg: Initial annotation of signed ID registers ...
2023-02-14Merge branch kvm-arm64/nv-prefix into kvmarm/nextOliver Upton3-13/+78
* kvm-arm64/nv-prefix: : Preamble to NV support, courtesy of Marc Zyngier. : : This brings in a set of prerequisite patches for supporting nested : virtualization in KVM/arm64. Of course, there is a long way to go until : NV is actually enabled in KVM. : : - Introduce cpucap / vCPU feature flag to pivot the NV code on : : - Add support for EL2 vCPU register state : : - Basic nested exception handling : : - Hide unsupported features from the ID registers for NV-capable VMs KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-14Merge branch kvm-arm64/misc into kvmarm/nextOliver Upton1-1/+1
* kvm-arm64/misc: : Miscellaneous updates : : - Convert CPACR_EL1_TTA to the new, generated system register : definitions. : : - Serialize toggling CPACR_EL1.SMEN to avoid unexpected exceptions when : accessing SVCR in the host. : : - Avoid quiescing the guest if a vCPU accesses its own redistributor's : SGIs/PPIs, eliminating the need to IPI. Largely an optimization for : nested virtualization, as the L1 accesses the affected registers : rather often. : : - Conversion to kstrtobool() : : - Common definition of INVALID_GPA across architectures : : - Enable CONFIG_USERFAULTFD for CI runs of KVM selftests KVM: arm64: Fix non-kerneldoc comments KVM: selftests: Enable USERFAULTFD KVM: selftests: Remove redundant setbuf() arm64/sysreg: clean up some inconsistent indenting KVM: MMU: Make the definition of 'INVALID_GPA' common KVM: arm64: vgic-v3: Use kstrtobool() instead of strtobool() KVM: arm64: vgic-v3: Limit IPI-ing when accessing GICR_{C,S}ACTIVER0 KVM: arm64: Synchronize SMEN on vcpu schedule out KVM: arm64: Kill CPACR_EL1_TTA definition Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-14Merge branch kvm-arm64/psci-relay-fixes into kvmarm/nextOliver Upton2-0/+2
* kvm-arm64/psci-relay-fixes: : Fixes for CPU on/resume with pKVM, courtesy Quentin Perret. : : A consequence of deprivileging the host is that pKVM relays PSCI calls : on behalf of the host. pKVM's CPU initialization failed to fully : initialize the CPU's EL2 state, which notably led to unexpected SVE : traps resulting in a hyp panic. : : The issue is addressed by reusing parts of __finalise_el2 to restore CPU : state in the PSCI relay. KVM: arm64: Finalise EL2 state from pKVM PSCI relay KVM: arm64: Use sanitized values in __check_override in nVHE KVM: arm64: Introduce finalise_el2_state macro KVM: arm64: Provide sanitized SYS_ID_AA64SMFR0_EL1 to nVHE
2023-02-14Merge branch kvm-arm64/parallel-access-faults into kvmarm/nextOliver Upton1-6/+37
* kvm-arm64/parallel-access-faults: : Parallel stage-2 access fault handling : : The parallel faults changes that went in to 6.2 covered most stage-2 : aborts, with the exception of stage-2 access faults. Building on top of : the new infrastructure, this series adds support for handling access : faults (i.e. updating the access flag) in parallel. : : This is expected to provide a performance uplift for cores that do not : implement FEAT_HAFDBS, such as those from the fruit company. KVM: arm64: Condition HW AF updates on config option KVM: arm64: Handle access faults behind the read lock KVM: arm64: Don't serialize if the access flag isn't set KVM: arm64: Return EAGAIN for invalid PTE in attr walker KVM: arm64: Ignore EAGAIN for walks outside of a fault KVM: arm64: Use KVM's pte type/helpers in handle_access_fault() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-14Merge branch kvm-arm64/virtual-cache-geometry into kvmarm/nextOliver Upton1-2/+0
* kvm-arm64/virtual-cache-geometry: : Virtualized cache geometry for KVM guests, courtesy of Akihiko Odaki. : : KVM/arm64 has always exposed the host cache geometry directly to the : guest, even though non-secure software should never perform CMOs by : Set/Way. This was slightly wrong, as the cache geometry was derived from : the PE on which the vCPU thread was running and not a sanitized value. : : All together this leads to issues migrating VMs on heterogeneous : systems, as the cache geometry saved/restored could be inconsistent. : : KVM/arm64 now presents 1 level of cache with 1 set and 1 way. The cache : geometry is entirely controlled by userspace, such that migrations from : older kernels continue to work. KVM: arm64: Mark some VM-scoped allocations as __GFP_ACCOUNT KVM: arm64: Normalize cache configuration KVM: arm64: Mask FEAT_CCIDX KVM: arm64: Always set HCR_TID2 arm64/cache: Move CLIDR macro definitions arm64/sysreg: Add CCSIDR2_EL1 arm64/sysreg: Convert CCSIDR_EL1 to automatic generation arm64: Allow the definition of UNKNOWN system register fields Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisorMarc Zyngier2-1/+42
We can no longer blindly copy the VCPU's PSTATE into SPSR_EL2 and return to the guest and vice versa when taking an exception to the hypervisor, because we emulate virtual EL2 in EL1 and therefore have to translate the mode field from EL2 to EL1 and vice versa. This requires keeping track of the state we enter the guest, for which we transiently use a dedicated flag. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-11KVM: arm64: nv: Support virtual EL2 exceptionsJintack Lim1-12/+36
Support injecting exceptions and performing exception returns to and from virtual EL2. This must be done entirely in software except when taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE} == {1,1} (a VHE guest hypervisor). [maz: switch to common exception injection framework, illegal exeption return handling] Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-10Merge branches 'for-next/sysreg', 'for-next/sme', 'for-next/kselftest', ↵Catalin Marinas1-1/+1
'for-next/misc', 'for-next/sme2', 'for-next/tpidr2', 'for-next/scs', 'for-next/compat-hwcap', 'for-next/ftrace', 'for-next/efi-boot-mmu-on', 'for-next/ptrauth' and 'for-next/pseudo-nmi', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: arm_spe: Print the version of SPE detected perf: arm_spe: Add support for SPEv1.2 inverted event filtering perf: Add perf_event_attr::config3 drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable perf: arm_spe: Support new SPEv1.2/v8.7 'not taken' event perf: arm_spe: Use new PMSIDR_EL1 register enums perf: arm_spe: Drop BIT() and use FIELD_GET/PREP accessors arm64/sysreg: Convert SPE registers to automatic generation arm64: Drop SYS_ from SPE register defines perf: arm_spe: Use feature numbering for PMSEVFR_EL1 defines perf/marvell: Add ACPI support to TAD uncore driver perf/marvell: Add ACPI support to DDR uncore driver perf/arm-cmn: Reset DTM_PMU_CONFIG at probe drivers/perf: hisi: Extract initialization of "cpa_pmu->pmu" drivers/perf: hisi: Simplify the parameters of hisi_pmu_init() drivers/perf: hisi: Advertise the PERF_PMU_CAP_NO_EXCLUDE capability * for-next/sysreg: : arm64 sysreg and cpufeature fixes/updates KVM: arm64: Use symbolic definition for ISR_EL1.A arm64/sysreg: Add definition of ISR_EL1 arm64/sysreg: Add definition for ICC_NMIAR1_EL1 arm64/cpufeature: Remove 4 bit assumption in ARM64_FEATURE_MASK() arm64/sysreg: Fix errors in 32 bit enumeration values arm64/cpufeature: Fix field sign for DIT hwcap detection * for-next/sme: : SME-related updates arm64/sme: Optimise SME exit on syscall entry arm64/sme: Don't use streaming mode to probe the maximum SME VL arm64/ptrace: Use system_supports_tpidr2() to check for TPIDR2 support * for-next/kselftest: (23 commits) : arm64 kselftest fixes and improvements kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests kselftest/arm64: Copy whole EXTRA context kselftest/arm64: Fix enumeration of systems without 128 bit SME for SSVE+ZA kselftest/arm64: Fix enumeration of systems without 128 bit SME kselftest/arm64: Don't require FA64 for streaming SVE tests kselftest/arm64: Limit the maximum VL we try to set via ptrace kselftest/arm64: Correct buffer size for SME ZA storage kselftest/arm64: Remove the local NUM_VL definition kselftest/arm64: Verify simultaneous SSVE and ZA context generation kselftest/arm64: Verify that SSVE signal context has SVE_SIG_FLAG_SM set kselftest/arm64: Remove spurious comment from MTE test Makefile kselftest/arm64: Support build of MTE tests with clang kselftest/arm64: Initialise current at build time in signal tests kselftest/arm64: Don't pass headers to the compiler as source kselftest/arm64: Remove redundant _start labels from FP tests kselftest/arm64: Fix .pushsection for strings in FP tests kselftest/arm64: Run BTI selftests on systems without BTI kselftest/arm64: Fix test numbering when skipping tests kselftest/arm64: Skip non-power of 2 SVE vector lengths in fp-stress kselftest/arm64: Only enumerate power of two VLs in syscall-abi ... * for-next/misc: : Miscellaneous arm64 updates arm64/mm: Intercept pfn changes in set_pte_at() Documentation: arm64: correct spelling arm64: traps: attempt to dump all instructions arm64: Apply dynamic shadow call stack patching in two passes arm64: el2_setup.h: fix spelling typo in comments arm64: Kconfig: fix spelling arm64: cpufeature: Use kstrtobool() instead of strtobool() arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path arm64: make ARCH_FORCE_MAX_ORDER selectable * for-next/sme2: (23 commits) : Support for arm64 SME 2 and 2.1 arm64/sme: Fix __finalise_el2 SMEver check kselftest/arm64: Remove redundant _start labels from zt-test kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps kselftest/arm64: Add coverage of the ZT ptrace regset kselftest/arm64: Add SME2 coverage to syscall-abi kselftest/arm64: Add test coverage for ZT register signal frames kselftest/arm64: Teach the generic signal context validation about ZT kselftest/arm64: Enumerate SME2 in the signal test utility code kselftest/arm64: Cover ZT in the FP stress test kselftest/arm64: Add a stress test program for ZT0 arm64/sme: Add hwcaps for SME 2 and 2.1 features arm64/sme: Implement ZT0 ptrace support arm64/sme: Implement signal handling for ZT arm64/sme: Implement context switching for ZT0 arm64/sme: Provide storage for ZT0 arm64/sme: Add basic enumeration for SME2 arm64/sme: Enable host kernel to access ZT0 arm64/sme: Manually encode ZT0 load and store instructions arm64/esr: Document ISS for ZT0 being disabled arm64/sme: Document SME 2 and SME 2.1 ABI ... * for-next/tpidr2: : Include TPIDR2 in the signal context kselftest/arm64: Add test case for TPIDR2 signal frame records kselftest/arm64: Add TPIDR2 to the set of known signal context records arm64/signal: Include TPIDR2 in the signal context arm64/sme: Document ABI for TPIDR2 signal information * for-next/scs: : arm64: harden shadow call stack pointer handling arm64: Stash shadow stack pointer in the task struct on interrupt arm64: Always load shadow stack pointer directly from the task struct * for-next/compat-hwcap: : arm64: Expose compat ARMv8 AArch32 features (HWCAPs) arm64: Add compat hwcap SSBS arm64: Add compat hwcap SB arm64: Add compat hwcap I8MM arm64: Add compat hwcap ASIMDBF16 arm64: Add compat hwcap ASIMDFHM arm64: Add compat hwcap ASIMDDP arm64: Add compat hwcap FPHP and ASIMDHP * for-next/ftrace: : Add arm64 support for DYNAMICE_FTRACE_WITH_CALL_OPS arm64: avoid executing padding bytes during kexec / hibernation arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS arm64: ftrace: Update stale comment arm64: patching: Add aarch64_insn_write_literal_u64() arm64: insn: Add helpers for BTI arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT ACPI: Don't build ACPICA with '-Os' Compiler attributes: GCC cold function alignment workarounds ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS * for-next/efi-boot-mmu-on: : Permit arm64 EFI boot with MMU and caches on arm64: kprobes: Drop ID map text from kprobes blacklist arm64: head: Switch endianness before populating the ID map efi: arm64: enter with MMU and caches enabled arm64: head: Clean the ID map and the HYP text to the PoC if needed arm64: head: avoid cache invalidation when entering with the MMU on arm64: head: record the MMU state at primary entry arm64: kernel: move identity map out of .text mapping arm64: head: Move all finalise_el2 calls to after __enable_mmu * for-next/ptrauth: : arm64 pointer authentication cleanup arm64: pauth: don't sign leaf functions arm64: unify asm-arch manipulation * for-next/pseudo-nmi: : Pseudo-NMI code generation optimisations arm64: irqflags: use alternative branches for pseudo-NMI logic arm64: add ARM64_HAS_GIC_PRIO_RELAXED_SYNC cpucap arm64: make ARM64_HAS_GIC_PRIO_MASKING depend on ARM64_HAS_GIC_CPUIF_SYSREGS arm64: rename ARM64_HAS_IRQ_PRIO_MASKING to ARM64_HAS_GIC_PRIO_MASKING arm64: rename ARM64_HAS_SYSREG_GIC_CPUIF to ARM64_HAS_GIC_CPUIF_SYSREGS
2023-02-03KVM: arm64: Finalise EL2 state from pKVM PSCI relayQuentin Perret1-0/+1
The EL2 state is not initialised correctly when a CPU comes out of CPU_{SUSPEND,OFF} as the finalise_el2 function is not being called. Let's directly call finalise_el2_state from this path to solve the issue. Fixes: 504ee23611c4 ("arm64: Add the arm64.nosve command line option") Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20230201103755.1398086-5-qperret@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-02-03KVM: arm64: Provide sanitized SYS_ID_AA64SMFR0_EL1 to nVHEQuentin Perret1-0/+1
We will need a sanitized copy of SYS_ID_AA64SMFR0_EL1 from the nVHE EL2 code shortly, so make sure to provide it with a copy. Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230201103755.1398086-2-qperret@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-19arm64: Drop SYS_ from SPE register definesRob Herring1-1/+1
We currently have a non-standard SYS_ prefix in the constants generated for the SPE register bitfields. Drop this in preparation for automatic register definition generation. The SPE mask defines were unshifted, and the SPE register field enumerations were shifted. The autogenerated defines are the opposite, so make the necessary adjustments. No functional changes. Tested-by: James Clark <james.clark@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220825-arm-spe-v8-7-v4-2-327f860daf28@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-01-13KVM: arm64: Kill CPACR_EL1_TTA definitionMarc Zyngier1-1/+1
Since the One True Way is to use the new generated definition, kill the KVM-specific definition of CPACR_EL1_TTA, and move over to CPACR_ELx_TTA, hopefully for the same result. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230112154803.1808559-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-13KVM: arm64: Condition HW AF updates on config optionOliver Upton1-0/+2
As it currently stands, KVM makes use of FEAT_HAFDBS unconditionally. Use of the feature in the rest of the kernel is guarded by an associated Kconfig option. Align KVM with the rest of the kernel and only enable VTCR_HA when ARM64_HW_AFDBM is enabled. This can be helpful for testing changes to the stage-2 access fault path on Armv8.1+ implementations. Link: https://lore.kernel.org/r/20221202185156.696189-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-13KVM: arm64: Handle access faults behind the read lockOliver Upton1-1/+2
As the underlying software walkers are able to traverse and update stage-2 in parallel there is no need to serialize access faults. Only take the read lock when handling an access fault. Link: https://lore.kernel.org/r/20221202185156.696189-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-13KVM: arm64: Don't serialize if the access flag isn't setOliver Upton1-4/+8
Of course, if the PTE wasn't changed then there are absolutely no serialization requirements. Skip the DSB for an unsuccessful update to the access flag. Link: https://lore.kernel.org/r/20221202185156.696189-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-13KVM: arm64: Return EAGAIN for invalid PTE in attr walkerOliver Upton1-1/+1
Return EAGAIN for invalid PTEs in the attr walker, signaling to the caller that any serialization and/or invalidation can be elided. Link: https://lore.kernel.org/r/20221202185156.696189-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-13KVM: arm64: Ignore EAGAIN for walks outside of a faultOliver Upton1-3/+27
The page table walkers are invoked outside fault handling paths, such as write protecting a range of memory. EAGAIN is generally used by the walkers to retry execution due to races on a particular PTE, like taking an access fault on a PTE being invalidated from another thread. This early return behavior is undesirable for walkers that operate outside a fault handler. Suppress EAGAIN and continue the walk if operating outside a fault handler. Link: https://lore.kernel.org/r/20221202185156.696189-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-13KVM: arm64: Always set HCR_TID2Akihiko Odaki1-2/+0
Always set HCR_TID2 to trap CTR_EL0, CCSIDR2_EL1, CLIDR_EL1, and CSSELR_EL1. This saves a few lines of code and allows to employ their access trap handlers for more purposes anticipated by the old condition for setting HCR_TID2. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Link: https://lore.kernel.org/r/20230112023852.42012-6-akihiko.odaki@daynix.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-01-12KVM: arm64: Use symbolic definition for ISR_EL1.AMark Brown1-1/+1
Now that we are generating ISR_EL1 we have acquired a constant for ISR_EL1.A, use it rather than the magic number we had been using in the KVM entry code. Suggested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20221208-arm64-isr-el1-v2-3-89f7073a1ca9@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-03KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*Marc Zyngier2-2/+2
The former is an AArch32 legacy, so let's move over to the verbose (and strictly identical) version. This involves moving some of the #defines that were private to KVM into the more generic esr.h. Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds16-438/+1741
Pull kvm updates from Paolo Bonzini: "ARM64: - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne"). - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. s390: - Second batch of the lazy destroy patches - First batch of KVM changes for kernel virtual != physical address support - Removal of a unused function x86: - Allow compiling out SMM support - Cleanup and documentation of SMM state save area format - Preserve interrupt shadow in SMM state save area - Respond to generic signals during slow page faults - Fixes and optimizations for the non-executable huge page errata fix. - Reprogram all performance counters on PMU filter change - Cleanups to Hyper-V emulation and tests - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest running on top of a L1 Hyper-V hypervisor) - Advertise several new Intel features - x86 Xen-for-KVM: - Allow the Xen runstate information to cross a page boundary - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured - Add support for 32-bit guests in SCHEDOP_poll - Notable x86 fixes and cleanups: - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. - Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. - Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. - Advertise (on AMD) that the SMM_CTL MSR is not supported - Remove unnecessary exports Generic: - Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks Selftests: - Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. - Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. - Introduce actual atomics for clear/set_bit() in selftests - Add support for pinning vCPUs in dirty_log_perf_test. - Rename the so called "perf_util" framework to "memstress". - Add a lightweight psuedo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests. - Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests. - Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). - A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking. - x86-specific selftest changes: - Clean up x86's page table management. - Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. - Clean up the nEPT support checks. - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values. - Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). Documentation: - Remove deleted ioctls from documentation - Clean up the docs for the x86 MSR filter. - Various fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits) KVM: x86: Add proper ReST tables for userspace MSR exits/flags KVM: selftests: Allocate ucall pool from MEM_REGION_DATA KVM: arm64: selftests: Align VA space allocator with TTBR0 KVM: arm64: Fix benign bug with incorrect use of VA_BITS KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow KVM: x86: Advertise that the SMM_CTL MSR is not supported KVM: x86: remove unnecessary exports KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic" tools: KVM: selftests: Convert clear/set_bit() to actual atomics tools: Drop "atomic_" prefix from atomic test_and_set_bit() tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests perf tools: Use dedicated non-atomic clear/set bit helpers tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers KVM: arm64: selftests: Enable single-step without a "full" ucall() KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself KVM: Remove stale comment about KVM_REQ_UNHALT KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR KVM: Reference to kvm_userspace_memory_region in doc and comments KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl ...
2022-12-05Merge branch kvm-arm64/misc-6.2 into kvmarm-master/nextMarc Zyngier1-1/+1
* kvm-arm64/misc-6.2: : . : Misc fixes for 6.2: : : - Fix formatting for the pvtime documentation : : - Fix a comment in the VHE-specific Makefile : . KVM: arm64: Fix typo in comment KVM: arm64: Fix pvtime documentation Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05Merge branch kvm-arm64/pkvm-vcpu-state into kvmarm-master/nextMarc Zyngier15-125/+1392
* kvm-arm64/pkvm-vcpu-state: (25 commits) : . : Large drop of pKVM patches from Will Deacon and co, adding : a private vm/vcpu state at EL2, managed independently from : the EL1 state. From the cover letter: : : "This is version six of the pKVM EL2 state series, extending the pKVM : hypervisor code so that it can dynamically instantiate and manage VM : data structures without the host being able to access them directly. : These structures consist of a hyp VM, a set of hyp vCPUs and the stage-2 : page-table for the MMU. The pages used to hold the hypervisor structures : are returned to the host when the VM is destroyed." : . KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run() KVM: arm64: Don't unnecessarily map host kernel sections at EL2 KVM: arm64: Explicitly map 'kvm_vgic_global_state' at EL2 KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2 KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache KVM: arm64: Instantiate guest stage-2 page-tables at EL2 KVM: arm64: Consolidate stage-2 initialisation into a single function KVM: arm64: Add generic hyp_memcache helpers KVM: arm64: Provide I-cache invalidation by virtual address at EL2 KVM: arm64: Initialise hypervisor copies of host symbols unconditionally KVM: arm64: Add per-cpu fixmap infrastructure at EL2 KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1 KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 KVM: arm64: Rename 'host_kvm' to 'host_mmu' KVM: arm64: Add hyp_spinlock_t static initializer KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h KVM: arm64: Add helpers to pin memory shared with the hypervisor at EL2 KVM: arm64: Prevent the donation of no-map pages KVM: arm64: Implement do_donate() helper for donating memory ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-12-05Merge branch kvm-arm64/parallel-faults into kvmarm-master/nextMarc Zyngier3-319/+355
* kvm-arm64/parallel-faults: : . : Parallel stage-2 fault handling, courtesy of Oliver Upton. : From the cover letter: : : "Presently KVM only takes a read lock for stage 2 faults if it believes : the fault can be fixed by relaxing permissions on a PTE (write unprotect : for dirty logging). Otherwise, stage 2 faults grab the write lock, which : predictably can pile up all the vCPUs in a sufficiently large VM. : : Like the TDP MMU for x86, this series loosens the locking around : manipulations of the stage 2 page tables to allow parallel faults. RCU : and atomics are exploited to safely build/destroy the stage 2 page : tables in light of multiple software observers." : . KVM: arm64: Reject shared table walks in the hyp code KVM: arm64: Don't acquire RCU read lock for exclusive table walks KVM: arm64: Take a pointer to walker data in kvm_dereference_pteref() KVM: arm64: Handle stage-2 faults in parallel KVM: arm64: Make table->block changes parallel-aware KVM: arm64: Make leaf->leaf PTE changes parallel-aware KVM: arm64: Make block->table PTE changes parallel-aware KVM: arm64: Split init and set for table PTE KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks KVM: arm64: Protect stage-2 traversal with RCU KVM: arm64: Tear down unlinked stage-2 subtree after break-before-make KVM: arm64: Use an opaque type for pteps KVM: arm64: Add a helper to tear down unlinked stage-2 subtrees KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data KVM: arm64: Pass mm_ops through the visitor context KVM: arm64: Stash observed pte value in visitor context KVM: arm64: Combine visitor arguments into a context structure Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-11-22KVM: arm64: Reject shared table walks in the hyp codeOliver Upton1-1/+4
Exclusive table walks are the only supported table walk in the hyp, as there is no construct like RCU available in the hypervisor code. Reject any attempt to do a shared table walk by returning an error and allowing the caller to clean up the mess. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221118182222.3932898-4-oliver.upton@linux.dev
2022-11-22KVM: arm64: Don't acquire RCU read lock for exclusive table walksOliver Upton1-2/+2
Marek reported a BUG resulting from the recent parallel faults changes, as the hyp stage-1 map walker attempted to allocate table memory while holding the RCU read lock: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 2 locks held by swapper/0/1: #0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at: __create_hyp_mappings+0x80/0xc4 #1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at: kvm_pgtable_walk+0x0/0x1f4 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918 Hardware name: Raspberry Pi 3 Model B (DT) Call trace: dump_backtrace.part.0+0xe4/0xf0 show_stack+0x18/0x40 dump_stack_lvl+0x8c/0xb8 dump_stack+0x18/0x34 __might_resched+0x178/0x220 __might_sleep+0x48/0xa0 prepare_alloc_pages+0x178/0x1a0 __alloc_pages+0x9c/0x109c alloc_page_interleave+0x1c/0xc4 alloc_pages+0xec/0x160 get_zeroed_page+0x1c/0x44 kvm_hyp_zalloc_page+0x14/0x20 hyp_map_walker+0xd4/0x134 kvm_pgtable_visitor_cb.isra.0+0x38/0x5c __kvm_pgtable_walk+0x1a4/0x220 kvm_pgtable_walk+0x104/0x1f4 kvm_pgtable_hyp_map+0x80/0xc4 __create_hyp_mappings+0x9c/0xc4 kvm_mmu_init+0x144/0x1cc kvm_arch_init+0xe4/0xef4 kvm_init+0x3c/0x3d0 arm_init+0x20/0x30 do_one_initcall+0x74/0x400 kernel_init_freeable+0x2e0/0x350 kernel_init+0x24/0x130 ret_from_fork+0x10/0x20 Since the hyp stage-1 table walkers are serialized by kvm_hyp_pgd_mutex, RCU protection really doesn't add anything. Don't acquire the RCU read lock for an exclusive walk. Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221118182222.3932898-3-oliver.upton@linux.dev
2022-11-22KVM: arm64: Take a pointer to walker data in kvm_dereference_pteref()Oliver Upton1-3/+3
Rather than passing through the state of the KVM_PGTABLE_WALK_SHARED flag, just take a pointer to the whole walker structure instead. Move around struct kvm_pgtable and the RCU indirection such that the associated ifdeffery remains in one place while ensuring the walker + flags definitions precede their use. No functional change intended. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221118182222.3932898-2-oliver.upton@linux.dev
2022-11-11KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run()Will Deacon3-2/+109
As a stepping stone towards deprivileging the host's access to the guest's vCPU structures, introduce some naive flush/sync routines to copy most of the host vCPU into the hyp vCPU on vCPU run and back again on return to EL1. This allows us to run using the pKVM hyp structures when KVM is initialised in protected mode. Tested-by: Vincent Donnefort <vdonnefort@google.com> Co-developed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-27-will@kernel.org