summaryrefslogtreecommitdiff
path: root/arch/arm64/include
AgeCommit message (Collapse)AuthorFilesLines
2023-11-11Merge tag 'probes-fixes-v6.7-rc1' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Documentation update: Add a note about argument and return value fetching is the best effort because it depends on the type. - objpool: Fix to make internal global variables static in test_objpool.c. - kprobes: Unify kprobes_exceptions_nofify() prototypes. There are the same prototypes in asm/kprobes.h for some architectures, but some of them are missing the prototype and it causes a warning. So move the prototype into linux/kprobes.h. - tracing: Fix to check the tracepoint event and return event at parsing stage. The tracepoint event doesn't support %return but if $retval exists, it will be converted to %return silently. This finds that case and rejects it. - tracing: Fix the order of the descriptions about the parameters of __kprobe_event_gen_cmd_start() to be consistent with the argument list of the function. * tag 'probes-fixes-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/kprobes: Fix the order of argument descriptions tracing: fprobe-event: Fix to check tracepoint event and return kprobes: unify kprobes_exceptions_nofify() prototypes lib: test_objpool: make global variables static Documentation: tracing: Add a note about argument and retval access
2023-11-10Merge tag 'arm64-fixes' of ↵Linus Torvalds2-21/+5
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: "Mostly PMU fixes and a reworking of the pseudo-NMI disabling on broken MediaTek firmware: - Move the MediaTek GIC quirk handling from irqchip to core. Before the merging window commit 44bd78dd2b88 ("irqchip/gic-v3: Disable pseudo NMIs on MediaTek devices w/ firmware issues") temporarily addressed this issue. Fixed now at a deeper level in the arch code - Reject events meant for other PMUs in the CoreSight PMU driver, otherwise some of the core PMU events would disappear - Fix the Armv8 PMUv3 driver driver to not truncate 64-bit registers, causing some events to be invisible - Remove duplicate declaration of __arm64_sys##name following the patch to avoid prototype warning for syscalls - Typos in the elf_hwcap documentation" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/syscall: Remove duplicate declaration Revert "arm64: smp: avoid NMI IPIs with broken MediaTek FW" arm64: Move MediaTek GIC quirk handling from irqchip to core arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registers perf: arm_cspmu: Reject events meant for other PMUs Documentation/arm64: Fix typos in elf_hwcaps
2023-11-10kprobes: unify kprobes_exceptions_nofify() prototypesArnd Bergmann1-2/+0
Most architectures that support kprobes declare this function in their own asm/kprobes.h header and provide an override, but some are missing the prototype, which causes a warning for the __weak stub implementation: kernel/kprobes.c:1865:12: error: no previous prototype for 'kprobe_exceptions_notify' [-Werror=missing-prototypes] 1865 | int __weak kprobe_exceptions_notify(struct notifier_block *self, Move the prototype into linux/kprobes.h so it is visible to all the definitions. Link: https://lore.kernel.org/all/20231108125843.3806765-4-arnd@kernel.org/ Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-11-09arm64/syscall: Remove duplicate declarationKevin Brodsky1-1/+0
Commit 6ac19f96515e ("arm64: avoid prototype warnings for syscalls") added missing declarations to various syscall wrapper macros. It however proved a little too zealous in __SYSCALL_DEFINEx(), as a declaration for __arm64_sys##name was already present. A declaration is required before the call to ALLOW_ERROR_INJECTION(), so keep the original one and remove the new one. Fixes: 6ac19f96515e ("arm64: avoid prototype warnings for syscalls") Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20231109141153.250046-1-kevin.brodsky@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-07arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registersIlkka Koskinen1-20/+5
The driver used to truncate several 64-bit registers such as PMCEID[n] registers used to describe whether architectural and microarchitectural events in range 0x4000-0x401f exist. Due to discarding the bits, the driver made the events invisible, even if they existed. Moreover, PMCCFILTR and PMCR registers have additional bits in the upper 32 bits. This patch makes them available although they aren't currently used. Finally, functions handling PMXEVCNTR and PMXEVTYPER registers are removed as they not being used at all. Fixes: df29ddf4f04b ("arm64: perf: Abstract system register accesses away") Reported-by: Carl Worth <carl@os.amperecomputing.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Acked-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/.. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20231102183012.1251410-1-ilkka@os.amperecomputing.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-03Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of ↵Linus Torvalds1-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...
2023-11-03Merge tag 'mm-stable-2023-11-01-14-33' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...
2023-11-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds11-46/+232
Pull kvm updates from Paolo Bonzini: "ARM: - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes LoongArch: - New architecture for kvm. The hardware uses the same model as x86, s390 and RISC-V, where guest/host mode is orthogonal to supervisor/user mode. The virtualization extensions are very similar to MIPS, therefore the code also has some similarities but it's been cleaned up to avoid some of the historical bogosities that are found in arch/mips. The kernel emulates MMU, timer and CSR accesses, while interrupt controllers are only emulated in userspace, at least for now. RISC-V: - Support for the Smstateen and Zicond extensions - Support for virtualizing senvcfg - Support for virtualized SBI debug console (DBCN) S390: - Nested page table management can be monitored through tracepoints and statistics x86: - Fix incorrect handling of VMX posted interrupt descriptor in KVM_SET_LAPIC, which could result in a dropped timer IRQ - Avoid WARN on systems with Intel IPI virtualization - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs without forcing more common use cases to eat the extra memory overhead. - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and SBPB, aka Selective Branch Predictor Barrier). - Fix a bug where restoring a vCPU snapshot that was taken within 1 second of creating the original vCPU would cause KVM to try to synchronize the vCPU's TSC and thus clobber the correct TSC being set by userspace. - Compute guest wall clock using a single TSC read to avoid generating an inaccurate time, e.g. if the vCPU is preempted between multiple TSC reads. - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which complain about a "Firmware Bug" if the bit isn't set for select F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to appease Windows Server 2022. - Don't apply side effects to Hyper-V's synthetic timer on writes from userspace to fix an issue where the auto-enable behavior can trigger spurious interrupts, i.e. do auto-enabling only for guest writes. - Remove an unnecessary kick of all vCPUs when synchronizing the dirty log without PML enabled. - Advertise "support" for non-serializing FS/GS base MSR writes as appropriate. - Harden the fast page fault path to guard against encountering an invalid root when walking SPTEs. - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n. - Use the fast path directly from the timer callback when delivering Xen timer events, instead of waiting for the next iteration of the run loop. This was not done so far because previously proposed code had races, but now care is taken to stop the hrtimer at critical points such as restarting the timer or saving the timer information for userspace. - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future flag. - Optimize injection of PMU interrupts that are simultaneous with NMIs. - Usual handful of fixes for typos and other warts. x86 - MTRR/PAT fixes and optimizations: - Clean up code that deals with honoring guest MTRRs when the VM has non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled. - Zap EPT entries when non-coherent DMA assignment stops/start to prevent using stale entries with the wrong memtype. - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y This was done as a workaround for virtual machine BIOSes that did not bother to clear CR0.CD (because ancient KVM/QEMU did not bother to set it, in turn), and there's zero reason to extend the quirk to also ignore guest PAT. x86 - SEV fixes: - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts SHUTDOWN while running an SEV-ES guest. - Clean up the recognition of emulation failures on SEV guests, when KVM would like to "skip" the instruction but it had already been partially emulated. This makes it possible to drop a hack that second guessed the (insufficient) information provided by the emulator, and just do the right thing. Documentation: - Various updates and fixes, mostly for x86 - MTRR and PAT fixes and optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits) KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: x86: Service NMI requests after PMI requests in VM-Enter path KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} ...
2023-11-02Merge tag 'asm-generic-6.7' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull ia64 removal and asm-generic updates from Arnd Bergmann: - The ia64 architecture gets its well-earned retirement as planned, now that there is one last (mostly) working release that will be maintained as an LTS kernel. - The architecture specific system call tables are updated for the added map_shadow_stack() syscall and to remove references to the long-gone sys_lookup_dcookie() syscall. * tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: hexagon: Remove unusable symbols from the ptrace.h uapi asm-generic: Fix spelling of architecture arch: Reserve map_shadow_stack() syscall number for all architectures syscalls: Cleanup references to sys_lookup_dcookie() Documentation: Drop or replace remaining mentions of IA64 lib/raid6: Drop IA64 support Documentation: Drop IA64 from feature descriptions kernel: Drop IA64 support from sig_fault handlers arch: Remove Itanium (IA-64) architecture
2023-11-01Merge tag 'arm64-upstream' of ↵Linus Torvalds28-141/+229
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "No major architecture features this time around, just some new HWCAP definitions, support for the Ampere SoC PMUs and a few fixes/cleanups. The bulk of the changes is reworking of the CPU capability checking code (cpus_have_cap() etc). - Major refactoring of the CPU capability detection logic resulting in the removal of the cpus_have_const_cap() function and migrating the code to "alternative" branches where possible - Backtrace/kgdb: use IPIs and pseudo-NMI - Perf and PMU: - Add support for Ampere SoC PMUs - Multi-DTC improvements for larger CMN configurations with multiple Debug & Trace Controllers - Rework the Arm CoreSight PMU driver to allow separate registration of vendor backend modules - Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf driver; use device_get_match_data() in the xgene driver; fix NULL pointer dereference in the hisi driver caused by calling cpuhp_state_remove_instance(); use-after-free in the hisi driver - HWCAP updates: - FEAT_SVE_B16B16 (BFloat16) - FEAT_LRCPC3 (release consistency model) - FEAT_LSE128 (128-bit atomic instructions) - SVE: remove a couple of pseudo registers from the cpufeature code. There is logic in place already to detect mismatched SVE features - Miscellaneous: - Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA bouncing is needed. The buffer is still required for small kmalloc() buffers - Fix module PLT counting with !RANDOMIZE_BASE - Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move synchronisation code out of the set_ptes() loop - More compact cpufeature displaying enabled cores - Kselftest updates for the new CPU features" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits) arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper perf: hisi: Fix use-after-free when register pmu fails drivers/perf: hisi_pcie: Initialize event->cpu only on success drivers/perf: hisi_pcie: Check the type first in pmu::event_init() arm64: cpufeature: Change DBM to display enabled cores arm64: cpufeature: Display the set of cores with a feature perf/arm-cmn: Enable per-DTC counter allocation perf/arm-cmn: Rework DTC counters (again) perf/arm-cmn: Fix DTC domain detection drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init() drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process clocksource/drivers/arm_arch_timer: limit XGene-1 workaround arm64: Remove system_uses_lse_atomics() arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused drivers/perf: xgene: Use device_get_match_data() perf/amlogic: add missing MODULE_DEVICE_TABLE arm64/mm: Hoist synchronization out of set_ptes() loop ...
2023-10-31Merge tag 'kvmarm-6.7' of ↵Paolo Bonzini11-46/+232
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.7 - Generalized infrastructure for 'writable' ID registers, effectively allowing userspace to opt-out of certain vCPU features for its guest - Optimization for vSGI injection, opportunistically compressing MPIDR to vCPU mapping into a table - Improvements to KVM's PMU emulation, allowing userspace to select the number of PMCs available to a VM - Guest support for memory operation instructions (FEAT_MOPS) - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing bugs and getting rid of useless code - Changes to the way the SMCCC filter is constructed, avoiding wasted memory allocations when not in use - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing the overhead of errata mitigations - Miscellaneous kernel and selftest fixes
2023-10-31Merge tag 'locking-core-2023-10-28' of ↵Linus Torvalds2-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Info Molnar: "Futex improvements: - Add the 'futex2' syscall ABI, which is an attempt to get away from the multiplex syscall and adds a little room for extentions, while lifting some limitations. - Fix futex PI recursive rt_mutex waiter state bug - Fix inter-process shared futexes on no-MMU systems - Use folios instead of pages Micro-optimizations of locking primitives: - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock architectures, to improve lockref code generation - Improve the x86-32 lockref_get_not_zero() main loop by adding build-time CMPXCHG8B support detection for the relevant lockref code, and by better interfacing the CMPXCHG8B assembly code with the compiler - Introduce arch_sync_try_cmpxchg() on x86 to improve sync_try_cmpxchg() code generation. Convert some sync_cmpxchg() users to sync_try_cmpxchg(). - Micro-optimize rcuref_put_slowpath() Locking debuggability improvements: - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well - Enforce atomicity of sched_submit_work(), which is de-facto atomic but was un-enforced previously. - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check semantics - Fix ww_mutex self-tests - Clean up const-propagation in <linux/seqlock.h> and simplify the API-instantiation macros a bit RT locking improvements: - Provide the rt_mutex_*_schedule() primitives/helpers and use them in the rtmutex code to avoid recursion vs. rtlock on the PI state. - Add nested blocking lockdep asserts to rt_mutex_lock(), rtlock_lock() and rwbase_read_lock() .. plus misc fixes & cleanups" * tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) futex: Don't include process MM in futex key on no-MMU locking/seqlock: Fix grammar in comment alpha: Fix up new futex syscall numbers locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning locking/seqlock: Change __seqprop() to return the function pointer locking/seqlock: Simplify SEQCOUNT_LOCKNAME() locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath() locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg() locking/atomic/x86: Introduce arch_sync_try_cmpxchg() locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback locking/seqlock: Fix typo in comment futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic() locking/local, arch: Rewrite local_add_unless() as a static inline function locking/debug: Fix debugfs API return value checks to use IS_ERR() locking/ww_mutex/test: Make sure we bail out instead of livelock locking/ww_mutex/test: Fix potential workqueue corruption locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup futex: Add sys_futex_requeue() futex: Add flags2 argument to futex_requeue() ...
2023-10-30Merge branch kvm-arm64/pmu_pmcr_n into kvmarm/nextOliver Upton1-0/+3
* kvm-arm64/pmu_pmcr_n: : User-defined PMC limit, courtesy Raghavendra Rao Ananta : : Certain VMMs may want to reserve some PMCs for host use while running a : KVM guest. This was a bit difficult before, as KVM advertised all : supported counters to the guest. Userspace can now limit the number of : advertised PMCs by writing to PMCR_EL0.N, as KVM's sysreg and PMU : emulation enforce the specified limit for handling guest accesses. KVM: selftests: aarch64: vPMU test for validating user accesses KVM: selftests: aarch64: vPMU register test for unimplemented counters KVM: selftests: aarch64: vPMU register test for implemented counters KVM: selftests: aarch64: Introduce vpmu_counter_access test tools: Import arm_pmuv3.h KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0 KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler KVM: arm64: PMU: Introduce helpers to set the guest's PMU Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/mops into kvmarm/nextOliver Upton2-3/+55
* kvm-arm64/mops: : KVM support for MOPS, courtesy of Kristina Martsenko : : MOPS adds new instructions for accelerating memcpy(), memset(), and : memmove() operations in hardware. This series brings virtualization : support for KVM guests, and allows VMs to run on asymmetrict systems : that may have different MOPS implementations. KVM: arm64: Expose MOPS instructions to guests KVM: arm64: Add handler for MOPS exceptions Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/writable-id-regs into kvmarm/nextOliver Upton2-0/+34
* kvm-arm64/writable-id-regs: : Writable ID registers, courtesy of Jing Zhang : : This series significantly expands the architectural feature set that : userspace can manipulate via the ID registers. A new ioctl is defined : that makes the mutable fields in the ID registers discoverable to : userspace. KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: selftests: Test for setting ID register from usersapce tools headers arm64: Update sysreg.h with kernel sources KVM: selftests: Generate sysreg-defs.h and add to include path perf build: Generate arm64's sysreg-defs.h and add to include path tools: arm64: Add a Makefile for generating sysreg-defs.h KVM: arm64: Document vCPU feature selection UAPIs KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1 KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1 KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1 KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1 KVM: arm64: Bump up the default KVM sanitised debug version to v8p8 KVM: arm64: Reject attempts to set invalid debug arch version KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version KVM: arm64: Use guest ID register values for the sake of emulation KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS KVM: arm64: Allow userspace to get the writable masks for feature ID registers Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/sgi-injection into kvmarm/nextOliver Upton2-1/+29
* kvm-arm64/sgi-injection: : vSGI injection improvements + fixes, courtesy Marc Zyngier : : Avoid linearly searching for vSGI targets using a compressed MPIDR to : index a cache. While at it, fix some egregious bugs in KVM's mishandling : of vcpuid (user-controlled value) and vcpu_idx. KVM: arm64: Clarify the ordering requirements for vcpu/RD creation KVM: arm64: vgic-v3: Optimize affinity-based SGI injection KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available KVM: arm64: Build MPIDR to vcpu index cache at runtime KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() KVM: arm64: Use vcpu_idx for invalidation tracking KVM: arm64: vgic: Use vcpu_idx for the debug information KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id KVM: arm64: vgic-v3: Refactor GICv3 SGI generation KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/stage2-vhe-load into kvmarm/nextOliver Upton4-17/+21
* kvm-arm64/stage2-vhe-load: : Setup stage-2 MMU from vcpu_load() for VHE : : Unlike nVHE, there is no need to switch the stage-2 MMU around on guest : entry/exit in VHE mode as the host is running at EL2. Despite this KVM : reloads the stage-2 on every guest entry, which is needless. : : This series moves the setup of the stage-2 MMU context to vcpu_load() : when running in VHE mode. This is likely to be a win across the board, : but also allows us to remove an ISB on the guest entry path for systems : with one of the speculative AT errata. KVM: arm64: Move VTCR_EL2 into struct s2_mmu KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe() KVM: arm64: Rename helpers for VHE vCPU load/put KVM: arm64: Reload stage-2 for VMID change on VHE KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host() KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/nv-trap-fixes into kvmarm/nextOliver Upton1-0/+45
* kvm-arm64/nv-trap-fixes: : NV trap forwarding fixes, courtesy Miguel Luis and Marc Zyngier : : - Explicitly define the effects of HCR_EL2.NV on EL2 sysregs in the : NV trap encoding : : - Make EL2 registers that access AArch32 guest state UNDEF or RAZ/WI : where appropriate for NV guests KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/smccc-filter-cleanups into kvmarm/nextOliver Upton1-3/+1
* kvm-arm64/smccc-filter-cleanups: : Cleanup the management of KVM's SMCCC maple tree : : Avoid the cost of maintaining the SMCCC filter maple tree if userspace : hasn't writen a rule to the filter. While at it, rip out the now : unnecessary VM flag to indicate whether or not the SMCCC filter was : configured. KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured KVM: arm64: Only insert reserved ranges when SMCCC filter is used KVM: arm64: Add a predicate for testing if SMCCC filter is configured Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/feature-flag-refactor into kvmarm/nextOliver Upton3-12/+9
* kvm-arm64/feature-flag-refactor: : vCPU feature flag cleanup : : Clean up KVM's handling of vCPU feature flags to get rid of the : vCPU-scoped bitmaps and remove failure paths from kvm_reset_vcpu(). KVM: arm64: Get rid of vCPU-scoped feature bitmap KVM: arm64: Remove unused return value from kvm_reset_vcpu() KVM: arm64: Hoist NV+SVE check into KVM_ARM_VCPU_INIT ioctl handler KVM: arm64: Prevent NV feature flag on systems w/o nested virt KVM: arm64: Hoist PAuth checks into KVM_ARM_VCPU_INIT ioctl KVM: arm64: Hoist SVE check into KVM_ARM_VCPU_INIT ioctl handler KVM: arm64: Hoist PMUv3 check into KVM_ARM_VCPU_INIT ioctl handler KVM: arm64: Add generic check for system-supported vCPU features Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/misc into kvmarm/nextOliver Upton2-10/+35
* kvm-arm64/misc: : Miscellaneous updates : : - Put an upper bound on the number of I-cache invalidations by : cacheline to avoid soft lockups : : - Get rid of bogus refererence count transfer for THP mappings : : - Do a local TLB invalidation on permission fault race : : - Fixes for page_fault_test KVM selftest : : - Add a tracepoint for detecting MMIO instructions unsupported by KVM KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: arm64: Do not transfer page refcount for THP adjustment KVM: arm64: Avoid soft lockups due to I-cache maintenance arm64: tlbflush: Rename MAX_TLBI_OPS KVM: arm64: Don't use kerneldoc comment for arm64_check_features() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-26Merge branch 'for-next/cpus_have_const_cap' into for-next/coreCatalin Marinas19-108/+190
* for-next/cpus_have_const_cap: (38 commits) : cpus_have_const_cap() removal arm64: Remove cpus_have_const_cap() arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419 arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0 arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64} arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2 arm64: Avoid cpus_have_const_cap() for ARM64_SSBS arm64: Avoid cpus_have_const_cap() for ARM64_MTE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT ...
2023-10-26Merge branch 'for-next/feat_lse128' into for-next/coreCatalin Marinas2-0/+2
* for-next/feat_lse128: : HWCAP for FEAT_LSE128 kselftest/arm64: add FEAT_LSE128 to hwcap test arm64: add FEAT_LSE128 HWCAP
2023-10-26Merge branch 'for-next/feat_lrcpc3' into for-next/coreCatalin Marinas2-0/+2
* for-next/feat_lrcpc3: : HWCAP for FEAT_LRCPC3 selftests/arm64: add HWCAP2_LRCPC3 test arm64: add FEAT_LRCPC3 HWCAP
2023-10-26Merge branch 'for-next/feat_sve_b16b16' into for-next/coreCatalin Marinas2-0/+2
* for-next/feat_sve_b16b16: : Add support for FEAT_SVE_B16B16 (BFloat16) kselftest/arm64: Verify HWCAP2_SVE_B16B16 arm64/sve: Report FEAT_SVE_B16B16 to userspace
2023-10-26Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi', ↵Catalin Marinas9-33/+33
'for-next/kselftest', 'for-next/misc' and 'for-next/cpufeat-display-cores', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: hisi: Fix use-after-free when register pmu fails drivers/perf: hisi_pcie: Initialize event->cpu only on success drivers/perf: hisi_pcie: Check the type first in pmu::event_init() perf/arm-cmn: Enable per-DTC counter allocation perf/arm-cmn: Rework DTC counters (again) perf/arm-cmn: Fix DTC domain detection drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init() drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process drivers/perf: xgene: Use device_get_match_data() perf/amlogic: add missing MODULE_DEVICE_TABLE docs/perf: Add ampere_cspmu to toctree to fix a build warning perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU perf: arm_cspmu: Support implementation specific validation perf: arm_cspmu: Support implementation specific filters perf: arm_cspmu: Split 64-bit write to 32-bit writes perf: arm_cspmu: Separate Arm and vendor module * for-next/sve-remove-pseudo-regs: : arm64/fpsimd: Remove the vector length pseudo registers arm64/sve: Remove SMCR pseudo register from cpufeature code arm64/sve: Remove ZCR pseudo register from cpufeature code * for-next/backtrace-ipi: : Add IPI for backtraces/kgdb, use NMI arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup arm64: smp: avoid NMI IPIs with broken MediaTek FW arm64: smp: Mark IPI globals as __ro_after_init arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI arm64: smp: Add arch support for backtrace using pseudo-NMI arm64: smp: Remove dedicated wakeup IPI arm64: idle: Tag the arm64 idle functions as __cpuidle irqchip/gic-v3: Enable support for SGIs to act as NMIs * for-next/kselftest: : Various arm64 kselftest updates kselftest/arm64: Validate SVCR in streaming SVE stress test * for-next/misc: : Miscellaneous patches arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper clocksource/drivers/arm_arch_timer: limit XGene-1 workaround arm64: Remove system_uses_lse_atomics() arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused arm64/mm: Hoist synchronization out of set_ptes() loop arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed * for-next/cpufeat-display-cores: : arm64 cpufeature display enabled cores arm64: cpufeature: Change DBM to display enabled cores arm64: cpufeature: Display the set of cores with a feature
2023-10-25KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WIMarc Zyngier1-0/+4
When trapping accesses from a NV guest that tries to access SPSR_{irq,abt,und,fiq}, make sure we handle them as RAZ/WI, as if AArch32 wasn't implemented. This involves a bit of repainting to make the visibility handler more generic. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25arm64: Add missing _EL2 encodingsMiguel Luis1-0/+30
Some _EL2 encodings are missing. Add them. Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> [maz: dropped secure encodings] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25arm64: Add missing _EL12 encodingsMiguel Luis1-0/+11
Some _EL12 encodings are missing. Add them. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMURaghavendra Rao Ananta1-0/+3
The number of PMU event counters is indicated in PMCR_EL0.N. For a vCPU with PMUv3 configured, the value is set to the same value as the current PE on every vCPU reset. Unless the vCPU is pinned to PEs that has the PMU associated to the guest from the initial vCPU reset, the value might be different from the PMU's PMCR_EL0.N on heterogeneous PMU systems. Fix this by setting the vCPU's PMCR_EL0.N to the PMU's PMCR_EL0.N value. Track the PMCR_EL0.N per guest, as only one PMU can be set for the guest (PMCR_EL0.N must be the same for all vCPUs of the guest), and it is convenient for updating the value. To achieve this, the patch introduces a helper, kvm_arm_pmu_get_max_counters(), that reads the maximum number of counters from the arm_pmu associated to the VM. Make the function global as upcoming patches will be interested to know the value while setting the PMCR.N of the guest from userspace. KVM does not yet support userspace modifying PMCR_EL0.N. The following patch will add support for that. Reviewed-by: Sebastian Ott <sebott@redhat.com> Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20231020214053.2144305-5-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-23KVM: arm64: Move VTCR_EL2 into struct s2_mmuMarc Zyngier3-9/+16
We currently have a global VTCR_EL2 value for each guest, even if the guest uses NV. This implies that the guest's own S2 must fit in the host's. This is odd, for multiple reasons: - the PARange values and the number of IPA bits don't necessarily match: you can have 33 bits of IPA space, and yet you can only describe 32 or 36 bits of PARange - When userspace set the IPA space, it creates a contract with the kernel saying "this is the IPA space I'm prepared to handle". At no point does it constraint the guest's own IPA space as long as the guest doesn't try to use a [I]PA outside of the IPA space set by userspace - We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange. And then there is the consequence of the above: if a guest tries to create a S2 that has for input address something that is larger than the IPA space defined by the host, we inject a fatal exception. This is no good. For all intent and purposes, a guest should be able to have the S2 it really wants, as long as the *output* address of that S2 isn't outside of the IPA space. For that, we need to have a per-s2_mmu VTCR_EL2 setting, which allows us to represent the full PARange. Move the vctr field into the s2_mmu structure, which has no impact whatsoever, except for NV. Note that once we are able to override ID_AA64MMFR0_EL1.PARange from userspace, we'll also be able to restrict the size of the shadow S2 that NV uses. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-23arm64: cpufeature: Display the set of cores with a featureJeremy Linton1-0/+2
The AMU feature can be enabled on a subset of the cores in a system. Because of that, it prints a message for each core as it is detected. This becomes tedious when there are hundreds of cores. Instead, for CPU features which can be enabled on a subset of the present cores, lets wait until update_cpu_capabilities() and print the subset of cores the feature was enabled on. Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Ionela Voinescu <ionela.voinescu@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com> Tested-by: Punit Agrawal <punit.agrawal@bytedance.com> Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-20KVM: arm64: Rename helpers for VHE vCPU load/putOliver Upton2-7/+4
The names for the helpers we expose to the 'generic' KVM code are a bit imprecise; we switch the EL0 + EL1 sysreg context and setup trap controls that do not need to change for every guest entry/exit. Rename + shuffle things around a bit in preparation for loading the stage-2 MMU context on vcpu_load(). Link: https://lore.kernel.org/r/20231018233212.2888027-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Reload stage-2 for VMID change on VHEMarc Zyngier1-1/+1
Naturally, a change to the VMID for an MMU implies a new value for VTTBR. Reload on VMID change in anticipation of loading stage-2 on vcpu_load() instead of every guest entry. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231018233212.2888027-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18clocksource/drivers/arm_arch_timer: limit XGene-1 workaroundAndre Przywara1-1/+2
The AppliedMicro XGene-1 CPU has an erratum where the timer condition would only consider TVAL, not CVAL. We currently apply a workaround when seeing the PartNum field of MIDR_EL1 being 0x000, under the assumption that this would match only the XGene-1 CPU model. However even the Ampere eMAG (aka XGene-3) uses that same part number, and only differs in the "Variant" and "Revision" fields: XGene-1's MIDR is 0x500f0000, our eMAG reports 0x503f0002. Experiments show the latter doesn't show the faulty behaviour. Increase the specificity of the check to only consider partnum 0x000 and variant 0x00, to exclude the Ampere eMAG. Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations") Reported-by: Ross Burton <ross.burton@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20231016153127.116101-1-andre.przywara@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-18arm64: Remove system_uses_lse_atomics()Gavin Shan1-8/+1
There are two variants of system_uses_lse_atomics(), depending on CONFIG_ARM64_LSE_ATOMICS. The function isn't called anywhere when CONFIG_ARM64_LSE_ATOMICS is disabled. It can be directly replaced by alternative_has_cap_likely(ARM64_HAS_LSE_ATOMICS) when the kernel option is enabled. No need to keep system_uses_lse_atomics() and just remove it. Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231017005036.334067-1-gshan@redhat.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-18arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unusedCatalin Marinas1-4/+5
This argument is not used by the arm64 implementation. Mark it as __always_unused and also remove the unnecessary 'addr' increment in set_ptes(). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310140531.BQQwt3NQ-lkp@intel.com/ Cc: Will Deacon <will@kernel.org> Tested-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/ZS6EvMiJ0QF5INkv@arm.com
2023-10-16arm64/mm: Hoist synchronization out of set_ptes() loopRyan Roberts2-12/+19
set_ptes() sets a physically contiguous block of memory (which all belongs to the same folio) to a contiguous block of ptes. The arm64 implementation of this previously just looped, operating on each individual pte. But the __sync_icache_dcache() and mte_sync_tags() operations can both be hoisted out of the loop so that they are performed once for the contiguous set of pages (which may be less than the whole folio). This should result in minor performance gains. __sync_icache_dcache() already acts on the whole folio, and sets a flag in the folio so that it skips duplicate calls. But by hoisting the call, all the pte testing is done only once. mte_sync_tags() operates on each individual page with its own loop. But by passing the number of pages explicitly, we can rely solely on its loop and do the checks only once. This approach also makes it robust for the future, rather than assuming if a head page of a compound page is being mapped, then the whole compound page is being mapped, instead we explicitly know how many pages are being mapped. The old assumption may not continue to hold once the "anonymous large folios" feature is merged. Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Remove cpus_have_const_cap()Mark Rutland1-36/+2
There are no longer any users of cpus_have_const_cap(), and therefore it can be removed. Remove cpus_have_const_cap(). At the same time, remove __cpus_have_const_cap(), as this is a trivial wrapper of alternative_has_cap_unlikely(), which can be used directly instead. The comment for __system_matches_cap() is updated to no longer refer to cpus_have_const_cap(). As we have a number of ways to check the cpucaps, the specific suggestions are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Kristina Martsenko <kristina.martsenko@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBIMark Rutland2-3/+4
In arch_tlbbatch_should_defer() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_REPEAT_TLBI, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in arch_tlbbatch_should_defer() is an optimization to avoid some redundant work when the ARM64_WORKAROUND_REPEAT_TLBI cpucap is detected and forces the immediate use of TLBI + DSB ISH. In the window between detecting the ARM64_WORKAROUND_REPEAT_TLBI cpucap and patching alternatives this is not a big concern and there's no need to optimize this window at the expsense of subsequent usage at runtime. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_WORKAROUND_REPEAT_TLBI cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible without requiring ifdeffery or IS_ENABLED() checks at each usage. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNPMark Rutland1-0/+2
In has_useable_cnp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. We use has_useable_cnp() to determine whether we have the system-wide ARM64_HAS_CNP cpucap. Due to the structure of the cpufeature code, we call has_useable_cnp() in two distinct cases: 1) When finalizing system capabilities, setup_system_capabilities() will call has_useable_cnp() with SCOPE_SYSTEM to determine whether all CPUs have the feature. This is called after we've detected any local cpucaps including ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but prior to patching alternatives. If the ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not detect ARM64_HAS_CNP. 2) After finalizing system capabilties, verify_local_cpu_capabilities() will call has_useable_cnp() with SCOPE_LOCAL_CPU to verify that CPUs have CNP if we previously detected it. Note that if ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not have detected ARM64_HAS_CNP. For case 1 we must check the system_cpucaps bitmap as this occurs prior to patching the alternatives. For case 2 we'll only call has_useable_cnp() once per subsequent onlining of a CPU, and as this isn't a fast path it's not necessary to optimize for this case. This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. The ARM64_WORKAROUND_NVIDIA_CARMEL_CNP cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154Mark Rutland2-0/+10
In gic_read_iar() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_CAVIUM_23154 but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_CAVIUM_23154 cpucap is detected and patched early on the boot CPU before the GICv3 driver is initialized and hence before gic_read_iar() is ever called. Thus it is not necessary to use cpus_have_const_cap(), and alternative_has_cap() is equivalent. In addition, arm64's gic_read_iar() lives in irq-gic-v3.c purely for historical reasons. It was originally added prior to 32-bit arm support in commit: 6d4e11c5e2e8cd54 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154") When support for 32-bit arm was added, 32-bit arm's gic_read_iar() implementation was placed in <asm/arch_gicv3.h>, but the arm64 version was kept within irq-gic-v3.c as it depended on a static key local to irq-gic-v3.c and it was easier to add ifdeffery, which is what we did in commit: 7936e914f7b0827c ("irqchip/gic-v3: Refactor the arm64 specific parts") Subsequently the static key was replaced with a cpucap in commit: a4023f682739439b ("arm64: Add hypervisor safe helper for checking constant capabilities") Since that commit there has been no need to keep arm64's gic_read_iar() in irq-gic-v3.c. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. For consistency, move the arm64-specific gic_read_iar() implementation over to arm64's <asm/arch_gicv3.h>. The ARM64_WORKAROUND_CAVIUM_23154 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198Mark Rutland1-0/+2
We use cpus_have_const_cap() to check for ARM64_WORKAROUND_2645198 but this is not necessary and alternative_has_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_2645198 cpucap is detected and patched before any userspace translation table exist, and the workaround is only necessary when manipulating usrspace translation tables which are in use. Thus it is not necessary to use cpus_have_const_cap(), and alternative_has_cap() is equivalent. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_WORKAROUND_2645198 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible, and redundant IS_ENABLED() checks are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098Mark Rutland1-0/+2
In elf_hwcap_fixup() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_1742098, but this is not necessary and cpus_have_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_WORKAROUND_1742098 cpucap is detected and patched before elf_hwcap_fixup() can run, and hence it is not necessary to use cpus_have_const_cap(). We run cpus_have_const_cap() at most twice: once after finalizing system cpucaps, and potentially once more after detecting mismatched CPUs which support AArch32 at EL0. Due to this, it's not necessary to optimize for many calls to elf_hwcap_fixup(), and it's fine to use cpus_have_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_cap(), which will only generate the bitmap test and avoid generating an alternative sequence, resulting in slightly simpler annd smaller code being generated. For consistenct with other cpucaps, the ARM64_WORKAROUND_1742098 cpucap is added to cpucap_is_possible() so that code can be elided when this is not possible. However, as we only define compat_elf_hwcap2 when CONFIG_COMPAT=y, some ifdeffery is still required within user_feature_fixup() to avoid build errors when CONFIG_COMPAT=n. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419Mark Rutland2-2/+3
In count_plts() and is_forbidden_offset_for_adrp() we use cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is not necessary and cpus_have_final_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. It's not possible to load a module in the window between detecting the ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA range limits are initialized much later in module_init_limits() which is a subsys_initcall, and module loading cannot happen before this. Hence it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to use cpus_have_const_cap(). This patch replaces the use of cpus_have_const_cap() with cpus_have_final_cap() which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Using cpus_have_final_cap() clearly documents that we do not expect this code to run before cpucaps are finalized, and will make it easier to spot issues if code is changed in future to allow modules to be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible, and redundant IS_ENABLED() checks are removed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0Mark Rutland3-2/+4
In arm64_kernel_unmapped_at_el0() we use cpus_have_const_cap() to check for ARM64_UNMAP_KERNEL_AT_EL0, but this is only necessary so that arm64_get_bp_hardening_vector() and this_cpu_set_vectors() can run prior to alternatives being patched. Otherwise this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is a system-wide feature that is detected and patched before any translation tables are created for userspace. In the window between detecting the ARM64_UNMAP_KERNEL_AT_EL0 cpucap and patching alternatives, most users of arm64_kernel_unmapped_at_el0() do not need to know that the cpucap has been detected: * As KVM is initialized after cpucaps are finalized, no usaef of arm64_kernel_unmapped_at_el0() in the KVM code is reachable during this window. * The arm64_mm_context_get() function in arch/arm64/mm/context.c is only called after the SMMU driver is brought up after alternatives have been patched. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). Similarly the asids_update_limit() function is called after alternatives have been patched as an arch_initcall, and this can safely use cpus_have_final_cap() or alternative_has_cap_*(). Similarly we do not expect an ASID rollover to occur between cpucaps being detected and patching alternatives. Thus set_reserved_asid_bits() can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The __tlbi_user() and __tlbi_user_level() macros are not used during this window, and only need to invalidate additional entries once userspace translation tables have been active on a CPU. Thus these can safely use alternative_has_cap_*(). * The xen_kernel_unmapped_at_usr() function is not used during this window as it is only used in a late_initcall. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The arm64_get_meltdown_state() function is not used during this window. It only used by arm64_get_meltdown_state() and KVM code, both of which are only used after cpucaps have been finalized. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The tls_thread_switch() uses arm64_kernel_unmapped_at_el0() as an optimization to avoid zeroing tpidrro_el0 when KPTI is enabled and this will be trampled by the KPTI trampoline. It doesn't matter if this continues to zero the register during the window between detecting the cpucap and patching alternatives, so this can safely use alternative_has_cap_*(). * The sdei_arch_get_entry_point() and do_sdei_event() functions aren't reachable at this time as the SDEI driver is registered later by acpi_init() -> acpi_ghes_init() -> sdei_init(), where acpi_init is a subsys_initcall. Thus these can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The uses under drivers/ aren't reachable at this time as the drivers are registered later: - TRBE is registered via module_init() - SMMUv3 is registred via module_driver() - SPE is registred via module_init() * The arm64_get_bp_hardening_vector() and this_cpu_set_vectors() functions need to run on boot CPUs prior to patching alternatives. As these are only called during the onlining of a CPU, it's fine to perform a system_cpucaps bitmap test using cpus_have_cap(). This patch modifies this_cpu_set_vectors() to use cpus_have_cap(), and replaced all other use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is added to cpucap_is_possible() so that code can be elided entirely when this is not possible. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}Mark Rutland1-4/+4
In system_supports_{sve,sme,sme2,fa64}() we use cpus_have_const_cap() to check for the relevant cpucaps, but this is only necessary so that sve_setup() and sme_setup() can run prior to alternatives being patched, and otherwise alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. All of system_supports_{sve,sme,sme2,fa64}() will return false prior to system cpucaps being detected. In the window between system cpucaps being detected and patching alternatives, we need system_supports_sve() and system_supports_sme() to run to initialize SVE and SME properties, but all other users of system_supports_{sve,sme,sme2,fa64}() don't depend on the relevant cpucap becoming true until alternatives are patched: * No KVM code runs until after alternatives are patched, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The cpuid_cpu_online() callback in arch/arm64/kernel/cpuinfo.c is registered later from cpuinfo_regs_init() as a device_initcall, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The entry, signal, and ptrace code isn't reachable until userspace has run, and so this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * Currently perf_reg_validate() will un-reserve the PERF_REG_ARM64_VG pseudo-register before alternatives are patched, and before sve_setup() has run. If a sampling event is created early enough, this would allow perf_ext_reg_value() to sample (the as-yet uninitialized) thread_struct::vl[] prior to alternatives being patched. It would be preferable to defer this until alternatives are patched, and this can safely use alternative_has_cap_*(). * The context-switch code will run during this window as part of stop_machine() used during alternatives_patch_all(), and potentially for other work if other kernel threads are created early. No threads require the use of SVE/SME/SME2/FA64 prior to alternatives being patched, and it would be preferable for the related context-switch logic to take effect after alternatives are patched so that ths is guaranteed to see a consistent system-wide state (e.g. anything initialized by sve_setup() and sme_setup(). This can safely ues alternative_has_cap_*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. The sve_setup() and sme_setup() functions are modified to use cpus_have_cap() directly so that they can observe the cpucaps being set prior to alternatives being patched. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2Mark Rutland1-1/+1
In arm64_apply_bp_hardening() we use cpus_have_const_cap() to check for ARM64_SPECTRE_V2 , but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The cpus_have_const_cap() check in arm64_apply_bp_hardening() is intended to avoid the overhead of looking up and invoking a per-cpu function pointer when no branch predictor hardening is required. The arm64_apply_bp_hardening() function itself is called in two distinct flows: 1) When handling certain exceptions taken from EL0, where the PC could be a TTBR1 address and hence might have trained a branch predictor. As cpucaps are detected and alternatives are patched long before it is possible to execute userspace, it is not necessary to use cpus_have_const_cap() for these cases, and cpus_have_final_cap() or alternative_has_cap() would be preferable. 2) When switching between tasks in check_and_switch_context(). This can be called before cpucaps are detected and alternatives are patched, but this is long before the kernel mounts filesystems or accepts any input. At this stage the kernel hasn't loaded any secrets and there is no potential for hostile branch predictor training. Once cpucaps have been finalized and alternatives have been patched, switching tasks will invalidate any prior predictions. Hence it is not necessary to use cpus_have_const_cap() for this case. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_MTEMark Rutland1-1/+1
In system_supports_mte() we use cpus_have_const_cap() to check for ARM64_MTE, but this is not necessary and cpus_have_final_boot_cap() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. The ARM64_MTE cpucap is a boot cpu feature which is detected and patched early on the boot CPU under smp_prepare_boot_cpu(). In the window between detecting the ARM64_MTE cpucap and patching alternatives, nothing depends on the ARM64_MTE cpucap: * The kasan_hw_tags_enabled() helper depends upon the kasan_flag_enabled static key, which is initialized later in kasan_init_hw_tags() after alternatives have been applied. * No KVM code is called during this window, and KVM is not initialized until after system cpucaps have been detected and patched. KVM code can safely use cpus_have_final_cap() or alternative_has_cap_*(). * We don't context-switch prior to patching boot alternatives, and thus mte_thread_switch() is not reachable during this window. Thus, we can safely use cpus_have_final_boot_cap() or alternative_has_cap_*() in the context-switch code. * IRQ and FIQ are masked during this window, and we can only take SError and Debug exceptions. SError exceptions are fatal at this point in time, and we do not expect to take Debug exceptions, thus: - It's fine to lave TCO set for exceptions taken during this window, and mte_disable_tco_entry() doesn't need to do anything. - We don't need to detect and report asynchronous tag cehck faults during this window, and neither mte_check_tfsr_entry() nor mte_check_tfsr_exit() need to do anything. Since we want to report any SErrors taken during thiw window, these cannot safely use cpus_have_final_boot_cap() or cpus_have_final_cap(), but these can safely use alternative_has_cap_*(). * The __set_pte_at() function is not used during this window. It is possible for this to be used on kernel mappings prior to boot cpucaps being finalized, so this cannot safely use cpus_have_final_boot_cap() or cpus_have_final_cap(), but this can safely use alternative_has_cap_*(). * No userspace translation tables have been created yet, and swap has not been initialized yet. Thus swapping is not possible and none of the following are called: - arch_thp_swp_supported() - arch_prepare_to_swap() - arch_swap_invalidate_page() - arch_swap_invalidate_area() - arch_swap_restore() These can safely use system_has_final_cap() or alternative_has_cap_*(). * The elfcore functions are only reachable after userspace is brought up, which happens after system cpucaps have been detected and patched. Thus the elfcore code can safely use cpus_have_final_cap() or alternative_has_cap_*(). * Hibernation is only possible after userspace is brought up, which happens after system cpucaps have been detected and patched. Thus the hibernate code can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The set_tagged_addr_ctrl() function is only reachable after userspace is brought up, which happens after system cpucaps have been detected and patched. Thus this can safely use cpus_have_final_cap() or alternative_has_cap_*(). * The copy_user_highpage() and copy_highpage() functions are not used during this window, and can safely use alternative_has_cap_*(). This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGEMark Rutland1-1/+1
We use cpus_have_const_cap() to check for ARM64_HAS_TLB_RANGE, but this is not necessary and alternative_has_cap_*() would be preferable. For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on when their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. In the window between detecting the ARM64_HAS_TLB_RANGE cpucap and patching alternative branches, we do not perform any TLB invalidation, and even if we were to perform TLB invalidation here it would not be functionally necessary to optimize this by using range invalidation. Hence there's no need to use cpus_have_const_cap(), and alternative_has_cap_unlikely() is sufficient. This patch replaces the use of cpus_have_const_cap() with alternative_has_cap_unlikely(), which will avoid generating code to test the system_cpucaps bitmap and should be better for all subsequent calls at runtime. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>