summaryrefslogtreecommitdiff
path: root/arch/arm64/include
AgeCommit message (Collapse)AuthorFilesLines
2025-08-03Merge tag 'rust-6.17' of ↵Linus Torvalds1-4/+29
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull Rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Enable a set of Clippy lints: 'ptr_as_ptr', 'ptr_cast_constness', 'as_ptr_cast_mut', 'as_underscore', 'cast_lossless' and 'ref_as_ptr' These are intended to avoid type casts with the 'as' operator, which are quite powerful, into restricted variants that are less powerful and thus should help to avoid mistakes - Remove the 'author' key now that most instances were moved to the plural one in the previous cycle 'kernel' crate: - New 'bug' module: add 'warn_on!' macro which reuses the existing 'BUG'/'WARN' infrastructure, i.e. it respects the usual sysctls and kernel parameters: warn_on!(value == 42); To avoid duplicating the assembly code, the same strategy is followed as for the static branch code in order to share the assembly between both C and Rust This required a few rearrangements on C arch headers -- the existing C macros should still generate the same outputs, thus no functional change expected there - 'workqueue' module: add delayed work items, including a 'DelayedWork' struct, a 'impl_has_delayed_work!' macro and an 'enqueue_delayed' method, e.g.: /// Enqueue the struct for execution on the system workqueue, /// where its value will be printed 42 jiffies later. fn print_later(value: Arc<MyStruct>) { let _ = workqueue::system().enqueue_delayed(value, 42); } - New 'bits' module: add support for 'bit' and 'genmask' functions, with runtime- and compile-time variants, e.g.: static_assert!(0b00010000 == bit_u8(4)); static_assert!(0b00011110 == genmask_u8(1..=4)); assert!(checked_bit_u32(u32::BITS).is_none()); - 'uaccess' module: add 'UserSliceReader::strcpy_into_buf', which reads NUL-terminated strings from userspace into a '&CStr' Introduce 'UserPtr' newtype, similar in purpose to '__user' in C, to minimize mistakes handling userspace pointers, including mixing them up with integers and leaking them via the 'Debug' trait. Add it to the prelude, too - Start preparations for the replacement of our custom 'CStr' type with the analogous type in the 'core' standard library. This will take place across several cycles to make it easier. For this one, it includes a new 'fmt' module, using upstream method names and some other cleanups Replace 'fmt!' with a re-export, which helps Clippy lint properly, and clean up the found 'uninlined-format-args' instances - 'dma' module: - Clarify wording and be consistent in 'coherent' nomenclature - Convert the 'read!()' and 'write!()' macros to return a 'Result' - Add 'as_slice()', 'write()' methods in 'CoherentAllocation' - Expose 'count()' and 'size()' in 'CoherentAllocation' and add the corresponding type invariants - Implement 'CoherentAllocation::dma_handle_with_offset()' - 'time' module: - Make 'Instant' generic over clock source. This allows the compiler to assert that arithmetic expressions involving the 'Instant' use 'Instants' based on the same clock source - Make 'HrTimer' generic over the timer mode. 'HrTimer' timers take a 'Duration' or an 'Instant' when setting the expiry time, depending on the timer mode. With this change, the compiler can check the type matches the timer mode - Add an abstraction for 'fsleep'. 'fsleep' is a flexible sleep function that will select an appropriate sleep method depending on the requested sleep time - Avoid 64-bit divisions on 32-bit hardware when calculating timestamps - Seal the 'HrTimerMode' trait. This prevents users of the 'HrTimerMode' from implementing the trait on their own types - Pass the correct timer mode ID to 'hrtimer_start_range_ns()' - 'list' module: remove 'OFFSET' constants, allowing to remove pointer arithmetic; now 'impl_list_item!' invokes 'impl_has_list_links!' or 'impl_has_list_links_self_ptr!'. Other simplifications too - 'types' module: remove 'ForeignOwnable::PointedTo' in favor of a constant, which avoids exposing the type of the opaque pointer, and require 'into_foreign' to return non-null Remove the 'Either<L, R>' type as well. It is unused, and we want to encourage the use of custom enums for concrete use cases - 'sync' module: implement 'Borrow' and 'BorrowMut' for 'Arc' types to allow them to be used in generic APIs - 'alloc' module: implement 'Borrow' and 'BorrowMut' for 'Box<T, A>'; and 'Borrow', 'BorrowMut' and 'Default' for 'Vec<T, A>' - 'Opaque' type: add 'cast_from' method to perform a restricted cast that cannot change the inner type and use it in callers of 'container_of!'. Rename 'raw_get' to 'cast_into' to match it - 'rbtree' module: add 'is_empty' method - 'sync' module: new 'aref' submodule to hold 'AlwaysRefCounted' and 'ARef', which are moved from the too general 'types' module which we want to reduce or eventually remove. Also fix a safety comment in 'static_lock_class' 'pin-init' crate: - Add 'impl<T, E> [Pin]Init<T, E> for Result<T, E>', so results are now (pin-)initializers - Add 'Zeroable::init_zeroed()' that delegates to 'init_zeroed()' - New 'zeroed()', a safe version of 'mem::zeroed()' and also provide it via 'Zeroable::zeroed()' - Implement 'Zeroable' for 'Option<&T>', 'Option<&mut T>' and for 'Option<[unsafe] [extern "abi"] fn(...args...) -> ret>' for '"Rust"' and '"C"' ABIs and up to 20 arguments - Changed blanket impls of 'Init' and 'PinInit' from 'impl<T, E> [Pin]Init<T, E> for T' to 'impl<T> [Pin]Init<T> for T' - Renamed 'zeroed()' to 'init_zeroed()' - Upstream dev news: improve CI more to deny warnings, use '--all-targets'. Check the synchronization status of the two '-next' branches in upstream and the kernel MAINTAINERS: - Add Vlastimil Babka, Liam R. Howlett, Uladzislau Rezki and Lorenzo Stoakes as reviewers (thanks everyone) And a few other cleanups and improvements" * tag 'rust-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (76 commits) rust: Add warn_on macro arm64/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust riscv/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust x86/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust rust: kernel: move ARef and AlwaysRefCounted to sync::aref rust: sync: fix safety comment for `static_lock_class` rust: types: remove `Either<L, R>` rust: kernel: use `core::ffi::CStr` method names rust: str: add `CStr` methods matching `core::ffi::CStr` rust: str: remove unnecessary qualification rust: use `kernel::{fmt,prelude::fmt!}` rust: kernel: add `fmt` module rust: kernel: remove `fmt!`, fix clippy::uninlined-format-args scripts: rust: emit path candidates in panic message scripts: rust: replace length checks with match rust: list: remove nonexistent generic parameter in link rust: bits: add support for bits/genmask macros rust: list: remove OFFSET constants rust: list: add `impl_list_item!` examples rust: list: use fully qualified path ...
2025-08-02Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds1-0/+7
Pull bpf fixes from Alexei Starovoitov: - Fix kCFI failures in JITed BPF code on arm64 (Sami Tolvanen, Puranjay Mohan, Mark Rutland, Maxwell Bland) - Disallow tail calls between BPF programs that use different cgroup local storage maps to prevent out-of-bounds access (Daniel Borkmann) - Fix unaligned access in flow_dissector and netfilter BPF programs (Paul Chaignon) - Avoid possible use of uninitialized mod_len in libbpf (Achill Gilgenast) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test for unaligned flow_dissector ctx access bpf: Improve ctx access verifier error message bpf: Check netfilter ctx accesses are aligned bpf: Check flow_dissector ctx accesses are aligned arm64/cfi,bpf: Support kCFI + BPF on arm64 cfi: Move BPF CFI types and helpers to generic code cfi: add C CFI type macro libbpf: Avoid possible use of uninitialized mod_len bpf: Fix oob access in cgroup local storage bpf: Move cgroup iterator helpers to bpf.h bpf: Move bpf map owner out of common struct bpf: Add cookie object to bpf maps
2025-08-01arm64/cfi,bpf: Support kCFI + BPF on arm64Puranjay Mohan1-0/+7
Currently, bpf_dispatcher_*_func() is marked with `__nocfi` therefore calling BPF programs from this interface doesn't cause CFI warnings. When BPF programs are called directly from C: from BPF helpers or struct_ops, CFI warnings are generated. Implement proper CFI prologues for the BPF programs and callbacks and drop __nocfi for arm64. Fix the trampoline generation code to emit kCFI prologue when a struct_ops trampoline is being prepared. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Co-developed-by: Maxwell Bland <mbland@motorola.com> Signed-off-by: Maxwell Bland <mbland@motorola.com> Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Dao Huang <huangdao1@oppo.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250801001004.1859976-8-samitolvanen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-01Merge tag 'mm-stable-2025-07-30-15-25' of ↵Linus Torvalds4-41/+23
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
2025-07-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds9-10/+242
Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
2025-07-30Merge tag 'arm64-upstream' of ↵Linus Torvalds19-68/+173
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "A quick summary: perf support for Branch Record Buffer Extensions (BRBE), typical PMU hardware updates, small additions to MTE for store-only tag checking and exposing non-address bits to signal handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on. There is also a TLBI optimisation on hardware that does not require break-before-make when changing the user PTEs between contiguous and non-contiguous. More details: Perf and PMU updates: - Add support for new (v3) Hisilicon SLLC and DDRC PMUs - Add support for Arm-NI PMU integrations that share interrupts between clock domains within a given instance - Allow SPE to be configured with a lower sample period than the minimum recommendation advertised by PMSIDR_EL1.Interval - Add suppport for Arm's "Branch Record Buffer Extension" (BRBE) - Adjust the perf watchdog period according to cpu frequency changes - Minor driver fixes and cleanups Hardware features: - Support for MTE store-only checking (FEAT_MTE_STORE_ONLY) - Support for reporting the non-address bits during a synchronous MTE tag check fault (FEAT_MTE_TAGGED_FAR) - Optimise the TLBI when folding/unfolding contiguous PTEs on hardware with FEAT_BBM (break-before-make) level 2 and no TLB conflict aborts Software features: - Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable() and using the text-poke API for late module relocations - Force VMAP_STACK always on and change arm64_efi_rt_init() to use arch_alloc_vmap_stack() in order to avoid KASAN false positives ACPI: - Improve SPCR handling and messaging on systems lacking an SPCR table Debug: - Simplify the debug exception entry path - Drop redundant DBG_MDSCR_* macros Kselftests: - Cleanups and improvements for SME, SVE and FPSIMD tests Miscellaneous: - Optimise loop to reduce redundant operations in contpte_ptep_get() - Remove ISB when resetting POR_EL0 during signal handling - Mark the kernel as tainted on SEA and SError panic - Remove redundant gcs_free() call" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64/gcs: task_gcs_el0_enable() should use passed task arm64: Kconfig: Keep selects somewhat alphabetically ordered arm64: signal: Remove ISB when resetting POR_EL0 kselftest/arm64: Handle attempts to disable SM on SME only systems kselftest/arm64: Fix SVE write data generation for SME only systems kselftest/arm64: Test SME on SME only systems in fp-ptrace kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace kselftest/arm64: Allow sve-ptrace to run on SME only systems arm64/mm: Drop redundant addr increment in set_huge_pte_at() kselftest/arm4: Provide local defines for AT_HWCAP3 arm64: Mark kernel as tainted on SAE and SError panic arm64/gcs: Don't call gcs_free() when releasing task_struct drivers/perf: hisi: Support PMUs with no interrupt drivers/perf: hisi: Relax the event number check of v2 PMUs drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver drivers/perf: hisi: Simplify the probe process for each DDRC version perf/arm-ni: Support sharing IRQs within an NI instance perf/arm-ni: Consolidate CPU affinity handling ...
2025-07-30Merge tag 'timers-ptp-2025-07-27' of ↵Linus Torvalds1-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping and VDSO updates from Thomas Gleixner: - Introduce support for auxiliary timekeepers PTP clocks can be disconnected from the universal CLOCK_TAI reality for various reasons including regularatory requirements for functional safety redundancy. The kernel so far only supports a single notion of time, which means that all clocks are correlated in frequency and only differ by offset to each other. Access to non-correlated PTP clocks has been available so far only through the file descriptor based "POSIX clock IDs", which are subject to locking and have to go all the way out to the hardware. The access is not only horribly slow, as it has to go all the way out to the NIC/PTP hardware, but that also prevents the kernel to read the time of such clocks e.g. from the network stack, where it is required for TSN networking both on the transmit and receive side unless the hardware provides offloading. The auxiliary clocks provide a mechanism to support arbitrary clocks which are not correlated to the system clock. This is not restricted to the PTP use case on purpose as there is no kernel side association of these clocks to a particular PTP device because that's a pure user space configuration decision. Having them independent allows to utilize them for other purposes and also enables them to be tested without hardware dependencies. To avoid pointless overhead these clocks have to be enabled individualy via a new sysfs interface to reduce the overhead to a single compare in the hotpath if they are enabled at the Kconfig level at all. These clocks utilize the existing timekeeping/NTP infrastructures, which has been made possible over the recent releases by incrementaly converting these infrastructures over from a single static instance to a multi-instance pointer based implementation without any performance regression reported. The auxiliary clocks provide the same "emulation" of a "correct" clock as the existing CLOCK_* variants do with an independent instance of data and provide the same steering mechanism through the existing sys_clock_adjtime() interface, which has been confirmed to work by the chronyd(8) maintainer. That allows to provide lockless kernel internal and VDSO support so that applications and kernel internal functionalities can access these clocks without restrictions and at the same performance as the existing system clocks. - Avoid double notifications in the adjtimex() syscall. Not a big issue, but a trivial to avoid latency source. * tag 'timers-ptp-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) vdso/gettimeofday: Add support for auxiliary clocks vdso/vsyscall: Update auxiliary clock data in the datapage vdso: Introduce aux_clock_resolution_ns() vdso/gettimeofday: Introduce vdso_get_timestamp() vdso/gettimeofday: Introduce vdso_set_timespec() vdso/gettimeofday: Introduce vdso_clockid_valid() vdso/gettimeofday: Return bool from clock_gettime() helpers vdso/gettimeofday: Return bool from clock_getres() helpers vdso/helpers: Add helpers for seqlocks of single vdso_clock vdso/vsyscall: Split up __arch_update_vsyscall() into __arch_update_vdso_clock() vdso/vsyscall: Introduce a helper to fill clock configurations timekeeping: Remove the temporary CLOCK_AUX workaround timekeeping: Provide ktime_get_clock_ts64() timekeeping: Provide interface to control auxiliary clocks timekeeping: Provide update for auxiliary timekeepers timekeeping: Provide adjtimex() for auxiliary clocks timekeeping: Prepare do_adtimex() for auxiliary clocks timekeeping: Make do_adjtimex() reusable timekeeping: Add auxiliary clock support to __timekeeping_inject_offset() timekeeping: Make timekeeping_inject_offset() reusable ...
2025-07-29Merge tag 'driver-core-6.17-rc1' of ↵Linus Torvalds1-0/+17
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Remove unneeded debugfs_file_{get,put}() instances - Remove last remnants of debugfs_real_fops() - Allow storing non-const void * in struct debugfs_inode_info::aux sysfs: - Switch back to attribute_group::bin_attrs (treewide) - Switch back to bin_attribute::read()/write() (treewide) - Constify internal references to 'struct bin_attribute' Support cache-ids for device-tree systems: - Add arch hook arch_compact_of_hwid() - Use arch_compact_of_hwid() to compact MPIDR values on arm64 Rust: - Device: - Introduce CoreInternal device context (for bus internal methods) - Provide generic drvdata accessors for bus devices - Provide Driver::unbind() callbacks - Use the infrastructure above for auxiliary, PCI and platform - Implement Device::as_bound() - Rename Device::as_ref() to Device::from_raw() (treewide) - Implement fwnode and device property abstractions - Implement example usage in the Rust platform sample driver - Devres: - Remove the inner reference count (Arc) and use pin-init instead - Replace Devres::new_foreign_owned() with devres::register() - Require T to be Send in Devres<T> - Initialize the data kept inside a Devres last - Provide an accessor for the Devres associated Device - Device ID: - Add support for ACPI device IDs and driver match tables - Split up generic device ID infrastructure - Use generic device ID infrastructure in net::phy - DMA: - Implement the dma::Device trait - Add DMA mask accessors to dma::Device - Implement dma::Device for PCI and platform devices - Use DMA masks from the DMA sample module - I/O: - Implement abstraction for resource regions (struct resource) - Implement resource-based ioremap() abstractions - Provide platform device accessors for I/O (remap) requests - Misc: - Support fallible PinInit types in Revocable - Implement Wrapper<T> for Opaque<T> - Merge pin-init blanket dependencies (for Devres) Misc: - Fix OF node leak in auxiliary_device_create() - Use util macros in device property iterators - Improve kobject sample code - Add device_link_test() for testing device link flags - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits - Hint to prefer container_of_const() over container_of()" * tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits) rust: io: fix broken intra-doc links to `platform::Device` rust: io: fix broken intra-doc link to missing `flags` module rust: io: mem: enable IoRequest doc-tests rust: platform: add resource accessors rust: io: mem: add a generic iomem abstraction rust: io: add resource abstraction rust: samples: dma: set DMA mask rust: platform: implement the `dma::Device` trait rust: pci: implement the `dma::Device` trait rust: dma: add DMA addressing capabilities rust: dma: implement `dma::Device` trait rust: net::phy Change module_phy_driver macro to use module_device_table macro rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id rust: device_id: split out index support into a separate trait device: rust: rename Device::as_ref() to Device::from_raw() arm64: cacheinfo: Provide helper to compress MPIDR value into u32 cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id cacheinfo: Set cache 'id' based on DT data container_of: Document container_of() is not to be used in new code driver core: auxiliary bus: fix OF node leak ...
2025-07-29Merge tag 'kvmarm-6.17' of ↵Paolo Bonzini9-10/+242
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, round #1 - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts. - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface. - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally. - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range. - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor. - Convert more system register sanitization to the config-driven implementation. - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls. - Various cleanups and minor fixes.
2025-07-29Merge tag 'hardening-v6.17-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Introduce and start using TRAILING_OVERLAP() helper for fixing embedded flex array instances (Gustavo A. R. Silva) - mux: Convert mux_control_ops to a flex array member in mux_chip (Thorsten Blum) - string: Group str_has_prefix() and strstarts() (Andy Shevchenko) - Remove KCOV instrumentation from __init and __head (Ritesh Harjani, Kees Cook) - Refactor and rename stackleak feature to support Clang - Add KUnit test for seq_buf API - Fix KUnit fortify test under LTO * tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits) sched/task_stack: Add missing const qualifier to end_of_stack() kstack_erase: Support Clang stack depth tracking kstack_erase: Add -mgeneral-regs-only to silence Clang warnings init.h: Disable sanitizer coverage for __init and __head kstack_erase: Disable kstack_erase for all of arm compressed boot code x86: Handle KCOV __init vs inline mismatches arm64: Handle KCOV __init vs inline mismatches s390: Handle KCOV __init vs inline mismatches arm: Handle KCOV __init vs inline mismatches mips: Handle KCOV __init vs inline mismatch powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON configs/hardening: Enable CONFIG_KSTACK_ERASE stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth stackleak: Rename STACKLEAK to KSTACK_ERASE seq_buf: Introduce KUnit tests string: Group str_has_prefix() and strstarts() kunit/fortify: Add back "volatile" for sizeof() constants acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings ...
2025-07-26Merge tag 'irqchip-gic-v5-host' into kvmarm/nextOliver Upton4-4/+139
GICv5 initial host support Add host kernel support for the new arm64 GICv5 architecture, which is quite a departure from the previous ones. Include support for the full gamut of the architecture (interrupt routing and delivery to CPUs, wired interrupts, MSIs, and interrupt translation). * tag 'irqchip-gic-v5-host': (32 commits) arm64: smp: Fix pNMI setup after GICv5 rework arm64: Kconfig: Enable GICv5 docs: arm64: gic-v5: Document booting requirements for GICv5 irqchip/gic-v5: Add GICv5 IWB support irqchip/gic-v5: Add GICv5 ITS support irqchip/msi-lib: Add IRQ_DOMAIN_FLAG_FWNODE_PARENT handling irqchip/gic-v3: Rename GICv3 ITS MSI parent PCI/MSI: Add pci_msi_map_rid_ctlr_node() helper function of/irq: Add of_msi_xlate() helper function irqchip/gic-v5: Enable GICv5 SMP booting irqchip/gic-v5: Add GICv5 LPI/IPI support irqchip/gic-v5: Add GICv5 IRS/SPI support irqchip/gic-v5: Add GICv5 PPI support arm64: Add support for GICv5 GSB barriers arm64: smp: Support non-SGIs for IPIs arm64: cpucaps: Add GICv5 CPU interface (GCIE) capability arm64: cpucaps: Rename GICv3 CPU interface capability arm64: Disable GICv5 read/write/instruction traps arm64/sysreg: Add ICH_HFGITR_EL2 arm64/sysreg: Add ICH_HFGWTR_EL2 ... Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-26Merge branch 'kvm-arm64/doublefault2' into kvmarm/nextOliver Upton4-6/+85
* kvm-arm64/doublefault2: (33 commits) : NV Support for FEAT_RAS + DoubleFault2 : : Delegate the vSError context to the guest hypervisor when in a nested : state, including registers related to ESR propagation. Additionally, : catch up KVM's external abort infrastructure to the architecture, : implementing the effects of FEAT_DoubleFault2. : : This has some impact on non-nested guests, as SErrors deemed unmasked at : the time they're made pending are now immediately injected with an : emulated exception entry rather than using the VSE bit. KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately KVM: arm64: selftests: Test ESR propagation for vSError injection KVM: arm64: Populate ESR_ELx.EC for emulated SError injection KVM: arm64: selftests: Catch up set_id_regs with the kernel KVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list KVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1 KVM: arm64: selftests: Add basic SError injection test KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError KVM: arm64: Advertise support for FEAT_DoubleFault2 KVM: arm64: Advertise support for FEAT_SCTLR2 KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set KVM: arm64: Route SEAs to the SError vector when EASE is set KVM: arm64: nv: Ensure Address size faults affect correct ESR KVM: arm64: Factor out helper for selecting exception target EL KVM: arm64: Describe SCTLR2_ELx RESx masks ... Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-25arm64: add batched versions of ptep_modify_prot_start/commitDev Jain1-0/+10
Override the generic definition of modify_prot_start_ptes() to use get_and_clear_full_ptes(). This helper does a TLBI only for the starting and ending contpte block of the range, whereas the current implementation will call ptep_get_and_clear() for every contpte block, thus doing a TLBI on every contpte block. Therefore, we have a performance win. The arm64 definition of pte_accessible() allows us to batch in the errata specific case: #define pte_accessible(mm, pte) \ (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte)) All ptes are obviously present in the folio batch, and they are also valid. Override the generic definition of modify_prot_commit_ptes() to simply use set_ptes() to map the new ptes into the pagetable. Link: https://lkml.kernel.org/r/20250718090244.21092-8-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25mm: remove arch_flush_tlb_batched_pending() arch helperRyan Roberts1-11/+0
Since commit 4b634918384c ("arm64/mm: Close theoretical race where stale TLB entry remains valid"), all arches that use tlbbatch for reclaim (arm64, riscv, x86) implement arch_flush_tlb_batched_pending() with a flush_tlb_mm(). So let's simplify by removing the unnecessary abstraction and doing the flush_tlb_mm() directly in flush_tlb_batched_pending(). This effectively reverts commit db6c1f6f236d ("mm/tlbbatch: introduce arch_flush_tlb_batched_pending()"). Link: https://lkml.kernel.org/r/20250609103132.447370-1-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Suggested-by: Will Deacon <will@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25arm64: Handle KCOV __init vs inline mismatchesKees Cook1-1/+1
GCC appears to have kind of fragile inlining heuristics, in the sense that it can change whether or not it inlines something based on optimizations. It looks like the kcov instrumentation being added (or in this case, removed) from a function changes the optimization results, and some functions marked "inline" are _not_ inlined. In that case, we end up with __init code calling a function not marked __init, and we get the build warnings I'm trying to eliminate in the coming patch that adds __no_sanitize_coverage to __init functions: WARNING: modpost: vmlinux: section mismatch in reference: acpi_get_enable_method+0x1c (section: .text.unlikely) -> acpi_psci_present (section: .init.text) This problem is somewhat fragile (though using either __always_inline or __init will deterministically solve it), but we've tripped over this before with GCC and the solution has usually been to just use __always_inline and move on. For arm64 this requires forcing one ACPI function to be inlined with __always_inline. Link: https://lore.kernel.org/r/20250724055029.3623499-1-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-24Merge tag 'arm64-fixes' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Two important arm64 fixes ahead of the 6.16 release. The first fixes a regression introduced during the merge window where the KVM UUID (which is used to advertise KVM-specific hypercalls for things like time synchronisation in the guest) was corrupted thanks to an endianness bug introduced when converting the code to use the UUID_INIT() helper. The second fixes a stack-pointer corruption issue during context-switch which has been observed in the wild when taking a pseudo-NMI with shadow call stack enabled. Summary: - Fix broken UUID value for the KVM/arm64 hypervisor SMCCC interface - Fix stack corruption on context-switch, primarily seen on (but not limited to) configurations with both pNMI and SCS enabled" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack() arm64: kvm, smccc: Fix vendor uuid
2025-07-24Merge branch 'for-next/feat_mte_store_only' into for-next/coreCatalin Marinas3-0/+4
* for-next/feat_mte_store_only: : MTE feature to restrict tag checking to store only operations kselftest/arm64/mte: Add MTE_STORE_ONLY testcases kselftest/arm64/mte: Preparation for mte store only test kselftest/arm64/abi: Add MTE_STORE_ONLY feature hwcap test KVM: arm64: Expose MTE_STORE_ONLY feature to guest arm64/hwcaps: Add MTE_STORE_ONLY hwcaps arm64/kernel: Support store-only mte tag check prctl: Introduce PR_MTE_STORE_ONLY arm64/cpufeature: Add MTE_STORE_ONLY feature
2025-07-24Merge branches 'for-next/livepatch', 'for-next/user-contig-bbml2', ↵Catalin Marinas15-55/+93
'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: (23 commits) drivers/perf: hisi: Support PMUs with no interrupt drivers/perf: hisi: Relax the event number check of v2 PMUs drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver drivers/perf: hisi: Simplify the probe process for each DDRC version perf/arm-ni: Support sharing IRQs within an NI instance perf/arm-ni: Consolidate CPU affinity handling perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation perf/cxlpmu: Remove unintended newline from IRQ name format string perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe() perf: arm_spe: Relax period restriction perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) KVM: arm64: nvhe: Disable branch generation in nVHE guests arm64: Handle BRBE booting requirements arm64/sysreg: Add BRBE registers and fields perf/arm: Add missing .suppress_bind_attrs perf/arm-cmn: Reduce stack usage during discovery perf: imx9_perf: make the read-only array mask static const perf/arm-cmn: Broaden module description for wider interconnect support ... * for-next/livepatch: : Support for HAVE_LIVEPATCH on arm64 arm64: Kconfig: Keep selects somewhat alphabetically ordered arm64: Implement HAVE_LIVEPATCH arm64: stacktrace: Implement arch_stack_walk_reliable() arm64: stacktrace: Check kretprobe_find_ret_addr() return value arm64/module: Use text-poke API for late relocations. * for-next/user-contig-bbml2: : Optimise the TLBI when folding/unfolding contigous PTEs on hardware with BBML2 and no TLB conflict aborts arm64/mm: Elide tlbi in contpte_convert() under BBML2 iommu/arm: Add BBM Level 2 smmu feature arm64: Add BBM Level 2 cpu feature arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type * for-next/misc: : Miscellaneous arm64 patches arm64/gcs: task_gcs_el0_enable() should use passed task arm64: signal: Remove ISB when resetting POR_EL0 arm64/mm: Drop redundant addr increment in set_huge_pte_at() arm64: Mark kernel as tainted on SAE and SError panic arm64/gcs: Don't call gcs_free() when releasing task_struct arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get arm64: pi: use 'targets' instead of extra-y in Makefile * for-next/acpi: : Various ACPI arm64 changes ACPI: Suppress misleading SPCR console message when SPCR table is absent ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled * for-next/debug-entry: : Simplify the debug exception entry path arm64: debug: remove debug exception registration infrastructure arm64: debug: split bkpt32 exception entry arm64: debug: split brk64 exception entry arm64: debug: split hardware watchpoint exception entry arm64: debug: split single stepping exception entry arm64: debug: refactor reinstall_suspended_bps() arm64: debug: split hardware breakpoint exception entry arm64: entry: Add entry and exit functions for debug exceptions arm64: debug: remove break/step handler registration infrastructure arm64: debug: call step handlers statically arm64: debug: call software breakpoint handlers statically arm64: refactor aarch32_break_handler() arm64: debug: clean up single_step_handler logic * for-next/feat_mte_tagged_far: : Support for reporting the non-address bits during a synchronous MTE tag check fault kselftest/arm64/mte: Add mtefar tests on check_mmap_options kselftest/arm64/mte: Refactor check_mmap_option test kselftest/arm64/mte: Add verification for address tag in signal handler kselftest/arm64/mte: Add address tag related macro and function kselftest/arm64/mte: Check MTE_FAR feature is supported kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS kselftest/arm64: Add MTE_FAR hwcap test KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature * for-next/kselftest: : Kselftest updates for arm64 kselftest/arm64: Handle attempts to disable SM on SME only systems kselftest/arm64: Fix SVE write data generation for SME only systems kselftest/arm64: Test SME on SME only systems in fp-ptrace kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace kselftest/arm64: Allow sve-ptrace to run on SME only systems kselftest/arm4: Provide local defines for AT_HWCAP3 kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace kselftest/arm64: Fix check for setting new VLs in sve-ptrace kselftest/arm64: Convert tpidr2 test to use kselftest.h * for-next/mdscr-cleanup: : Drop redundant DBG_MDSCR_* macros KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t arm64/debug: Drop redundant DBG_MDSCR_* macros * for-next/vmap-stack: : Force VMAP_STACK on arm64 arm64: remove CONFIG_VMAP_STACK checks from entry code arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN arm64: efi: Remove CONFIG_VMAP_STACK check arm64: Mandate VMAP_STACK arm64: efi: Fix KASAN false positive for EFI runtime stack arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth() arm64/gcs: Don't call gcs_free() during flush_gcs() arm64: Restrict pagetable teardown to avoid false warning docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
2025-07-23arm64/gcs: task_gcs_el0_enable() should use passed taskJeremy Linton1-1/+1
Mark Rutland noticed that the task parameter is ignored and 'current' is being used instead. Since this is usually what its passed, it hasn't yet been causing problems but likely will as the code gets more testing. But, once this is fixed, it creates a new bug in copy_thread_gcs() since the gcs_el_mode isn't yet set for the task before its being checked. Move gcs_alloc_thread_stack() after the new task's gcs_el0_mode initialization to avoid this. Fixes: fc84bc5378a8 ("arm64/gcs: Context switch GCS state for EL0") Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250719043740.4548-2-jeremy.linton@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-23arm64/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with RustFUJITA Tomonori1-4/+29
Add new ARCH_WARN_ASM macro for BUG/WARN assembly code sharing with Rust to avoid the duplication. No functional changes. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com> Link: https://lore.kernel.org/r/20250502094537.231725-4-fujita.tomonori@gmail.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-07-22arm64/entry: Mask DAIF in cpu_switch_to(), call_on_irq_stack()Ada Couprie Diaz1-0/+5
`cpu_switch_to()` and `call_on_irq_stack()` manipulate SP to change to different stacks along with the Shadow Call Stack if it is enabled. Those two stack changes cannot be done atomically and both functions can be interrupted by SErrors or Debug Exceptions which, though unlikely, is very much broken : if interrupted, we can end up with mismatched stacks and Shadow Call Stack leading to clobbered stacks. In `cpu_switch_to()`, it can happen when SP_EL0 points to the new task, but x18 stills points to the old task's SCS. When the interrupt handler tries to save the task's SCS pointer, it will save the old task SCS pointer (x18) into the new task struct (pointed to by SP_EL0), clobbering it. In `call_on_irq_stack()`, it can happen when switching from the task stack to the IRQ stack and when switching back. In both cases, we can be interrupted when the SCS pointer points to the IRQ SCS, but SP points to the task stack. The nested interrupt handler pushes its return addresses on the IRQ SCS. It then detects that SP points to the task stack, calls `call_on_irq_stack()` and clobbers the task SCS pointer with the IRQ SCS pointer, which it will also use ! This leads to tasks returning to addresses on the wrong SCS, or even on the IRQ SCS, triggering kernel panics via CONFIG_VMAP_STACK or FPAC if enabled. This is possible on a default config, but unlikely. However, when enabling CONFIG_ARM64_PSEUDO_NMI, DAIF is unmasked and instead the GIC is responsible for filtering what interrupts the CPU should receive based on priority. Given the goal of emulating NMIs, pseudo-NMIs can be received by the CPU even in `cpu_switch_to()` and `call_on_irq_stack()`, possibly *very* frequently depending on the system configuration and workload, leading to unpredictable kernel panics. Completely mask DAIF in `cpu_switch_to()` and restore it when returning. Do the same in `call_on_irq_stack()`, but restore and mask around the branch. Mask DAIF even if CONFIG_SHADOW_CALL_STACK is not enabled for consistency of behaviour between all configurations. Introduce and use an assembly macro for saving and masking DAIF, as the existing one saves but only masks IF. Cc: <stable@vger.kernel.org> Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Reported-by: Cristian Prundeanu <cpru@amazon.com> Fixes: 59b37fe52f49 ("arm64: Stash shadow stack pointer in the task struct on interrupt") Tested-by: Cristian Prundeanu <cpru@amazon.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250718142814.133329-1-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-21KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU stateMarc Zyngier1-0/+4
Mark Brown reports that since we commit to making exceptions visible without the vcpu being loaded, the external abort selftest fails. Upon investigation, it turns out that the code that makes registers affected by an exception visible to the guest is completely broken on VHE, as we don't check whether the system registers are loaded on the CPU at this point. We managed to get away with this so far, but that's obviously as bad as it gets, Add the required checksm and document the absolute need to check for the SYSREGS_ON_CPU flag before calling into any of the __vcpu_write_sys_reg_to_cpu()__vcpu_read_sys_reg_from_cpu() helpers. Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk Link: https://lore.kernel.org/r/20250720102229.179114-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-16arm64: cacheinfo: Provide helper to compress MPIDR value into u32James Morse1-0/+17
Filesystems like resctrl use the cache-id exposed via sysfs to identify groups of CPUs. The value is also used for PCIe cache steering tags. On DT platforms cache-id is not something that is described in the device-tree, but instead generated from the smallest MPIDR of the CPUs associated with that cache. The cache-id exposed to user-space has historically been 32 bits. MPIDR values may be larger than 32 bits. MPIDR only has 32 bits worth of affinity data, but the aff3 field lives above 32bits. The corresponding lower bits are masked out by MPIDR_HWID_BITMASK and contain an SMT flag and Uni-Processor flag. Swizzzle the aff3 field into the bottom 32 bits and using that. In case more affinity fields are added in the future, the upper RES0 area should be checked. Returning a value greater than 32 bits from this helper will cause the caller to give up on allocating cache-ids. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20250711182743.30141-4-james.morse@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+0
Pull KVM fixes from Paolo Bonzini: "Many patches, pretty much all of them small, that accumulated while I was on vacation. ARM: - Remove the last leftovers of the ill-fated FPSIMD host state mapping at EL2 stage-1 - Fix unexpected advertisement to the guest of unimplemented S2 base granule sizes - Gracefully fail initialising pKVM if the interrupt controller isn't GICv3 - Also gracefully fail initialising pKVM if the carveout allocation fails - Fix the computing of the minimum MMIO range required for the host on stage-2 fault - Fix the generation of the GICv3 Maintenance Interrupt in nested mode x86: - Reject SEV{-ES} intra-host migration if one or more vCPUs are actively being created, so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM - Use a pre-allocated, per-vCPU buffer for handling de-sparsification of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too large" issue - Allow out-of-range/invalid Xen event channel ports when configuring IRQ routing, to avoid dictating a specific ioctl() ordering to userspace - Conditionally reschedule when setting memory attributes to avoid soft lockups when userspace converts huge swaths of memory to/from private - Add back MWAIT as a required feature for the MONITOR/MWAIT selftest - Add a missing field in struct sev_data_snp_launch_start that resulted in the guest-visible workarounds field being filled at the wrong offset - Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid VM-Fail on INVVPID - Advertise supported TDX TDVMCALLs to userspace - Pass SetupEventNotifyInterrupt arguments to userspace - Fix TSC frequency underflow" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: avoid underflow when scaling TSC frequency KVM: arm64: Remove kvm_arch_vcpu_run_map_fp() KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes KVM: arm64: Don't free hyp pages with pKVM on GICv2 KVM: arm64: Fix error path in init_hyp_mode() KVM: arm64: Adjust range correctly during host stage-2 faults KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi() KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush KVM: SVM: Add missing member in SNP_LAUNCH_START command structure Documentation: KVM: Fix unexpected unindent warnings KVM: selftests: Add back the missing check of MONITOR/MWAIT availability KVM: Allow CPU to reschedule while setting per-page memory attributes KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table. KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
2025-07-10mm: remove devmap related functions and page table bitsAlistair Popple2-25/+0
Now that DAX and all other reference counts to ZONE_DEVICE pages are managed normally there is no need for the special devmap PTE/PMD/PUD page table bits. So drop all references to these, freeing up a software defined page table bit on architectures supporting it. Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: David Hildenbrand <david@redhat.com> Suggested-by: Chunyan Zhang <zhang.lyra@gmail.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Balbir Singh <balbirs@nvidia.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Inki Dae <m.szyprowski@samsung.com> Cc: John Groves <john@groves.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-10mm: update architecture and driver code to use vm_flags_tLorenzo Stoakes1-5/+5
In future we intend to change the vm_flags_t type, so it isn't correct for architecture and driver code to assume it is unsigned long. Correct this assumption across the board. Overall, this patch does not introduce any functional change. Link: https://lkml.kernel.org/r/b6eb1894abc5555ece80bb08af5c022ef780c8bc.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-10mm/filemap: allow arch to request folio size for exec memoryRyan Roberts1-0/+8
Change the readahead config so that if it is being requested for an executable mapping, do a synchronous read into a set of folios with an arch-specified order and in a naturally aligned manner. We no longer center the read on the faulting page but simply align it down to the previous natural boundary. Additionally, we don't bother with an asynchronous part. On arm64 if memory is physically contiguous and naturally aligned to the "contpte" size, we can use contpte mappings, which improves utilization of the TLB. When paired with the "multi-size THP" feature, this works well to reduce dTLB pressure. However iTLB pressure is still high due to executable mappings having a low likelihood of being in the required folio size and mapping alignment, even when the filesystem supports readahead into large folios (e.g. XFS). The reason for the low likelihood is that the current readahead algorithm starts with an order-0 folio and increases the folio order by 2 every time the readahead mark is hit. But most executable memory tends to be accessed randomly and so the readahead mark is rarely hit and most executable folios remain order-0. So let's special-case the read(ahead) logic for executable mappings. The trade-off is performance improvement (due to more efficient storage of the translations in iTLB) vs potential for making reclaim more difficult (due to the folios being larger so if a part of the folio is hot the whole thing is considered hot). But executable memory is a small portion of the overall system memory so I doubt this will even register from a reclaim perspective. I've chosen 64K folio size for arm64 which benefits both the 4K and 16K base page size configs. Crucially the same amount of data is still read (usually 128K) so I'm not expecting any read amplification issues. I don't anticipate any write amplification because text is always RO. Note that the text region of an ELF file could be populated into the page cache for other reasons than taking a fault in a mmapped area. The most common case is due to the loader read()ing the header which can be shared with the beginning of text. So some text will still remain in small folios, but this simple, best effort change provides good performance improvements as is. Confine this special-case approach to the bounds of the VMA. This prevents wasting memory for any padding that might exist in the file between sections. Previously the padding would have been contained in order-0 folios and would be easy to reclaim. But now it would be part of a larger folio so more difficult to reclaim. Solve this by simply not reading it into memory in the first place. Benchmarking ============ The below shows pgbench and redis benchmarks on Graviton3 arm64 system. First, confirmation that this patch causes more text to be contained in 64K folios: +----------------------+---------------+---------------+---------------+ | File-backed folios by| system boot | pgbench | redis | | size as percentage of+-------+-------+-------+-------+-------+-------+ | all mapped text mem |before | after |before | after |before | after | +======================+=======+=======+=======+=======+=======+=======+ | base-page-4kB | 78% | 30% | 78% | 11% | 73% | 14% | | thp-aligned-8kB | 1% | 0% | 0% | 0% | 1% | 0% | | thp-aligned-16kB | 17% | 4% | 17% | 3% | 20% | 4% | | thp-aligned-32kB | 1% | 1% | 1% | 2% | 1% | 1% | | thp-aligned-64kB | 3% | 63% | 3% | 81% | 4% | 77% | | thp-aligned-128kB | 0% | 1% | 1% | 1% | 1% | 2% | | thp-unaligned-64kB | 0% | 0% | 0% | 1% | 0% | 1% | | thp-unaligned-128kB | 0% | 1% | 0% | 0% | 0% | 0% | | thp-partial | 0% | 0% | 0% | 1% | 0% | 1% | +----------------------+-------+-------+-------+-------+-------+-------+ | cont-aligned-64kB | 4% | 65% | 4% | 83% | 6% | 79% | +----------------------+-------+-------+-------+-------+-------+-------+ The above shows that for both workloads (each isolated with cgroups) as well as the general system state after boot, the amount of text backed by 4K and 16K folios reduces and the amount backed by 64K folios increases significantly. And the amount of text that is contpte-mapped significantly increases (see last row). And this is reflected in performance improvement. "(I)" indicates a statistically significant improvement. Note TPS and Reqs/sec are rates so bigger is better, ms is time so smaller is better: +-------------+-------------------------------------------+------------+ | Benchmark | Result Class | Improvemnt | +=============+===========================================+============+ | pts/pgbench | Scale: 1 Clients: 1 RO (TPS) | (I) 3.47% | | | Scale: 1 Clients: 1 RO - Latency (ms) | -2.88% | | | Scale: 1 Clients: 250 RO (TPS) | (I) 5.02% | | | Scale: 1 Clients: 250 RO - Latency (ms) | (I) -4.79% | | | Scale: 1 Clients: 1000 RO (TPS) | (I) 6.16% | | | Scale: 1 Clients: 1000 RO - Latency (ms) | (I) -5.82% | | | Scale: 100 Clients: 1 RO (TPS) | 2.51% | | | Scale: 100 Clients: 1 RO - Latency (ms) | -3.51% | | | Scale: 100 Clients: 250 RO (TPS) | (I) 4.75% | | | Scale: 100 Clients: 250 RO - Latency (ms) | (I) -4.44% | | | Scale: 100 Clients: 1000 RO (TPS) | (I) 6.34% | | | Scale: 100 Clients: 1000 RO - Latency (ms)| (I) -5.95% | +-------------+-------------------------------------------+------------+ | pts/redis | Test: GET Connections: 50 (Reqs/sec) | (I) 3.20% | | | Test: GET Connections: 1000 (Reqs/sec) | (I) 2.55% | | | Test: LPOP Connections: 50 (Reqs/sec) | (I) 4.59% | | | Test: LPOP Connections: 1000 (Reqs/sec) | (I) 4.81% | | | Test: LPUSH Connections: 50 (Reqs/sec) | (I) 5.31% | | | Test: LPUSH Connections: 1000 (Reqs/sec) | (I) 4.36% | | | Test: SADD Connections: 50 (Reqs/sec) | (I) 2.64% | | | Test: SADD Connections: 1000 (Reqs/sec) | (I) 4.15% | | | Test: SET Connections: 50 (Reqs/sec) | (I) 3.11% | | | Test: SET Connections: 1000 (Reqs/sec) | (I) 3.36% | +-------------+-------------------------------------------+------------+ [ryan.roberts@arm.com: fix use-after-free] Link: https://lkml.kernel.org/r/ea7f9da7-9a9f-4b85-9d0a-35b320f5ed25@arm.com [ryan.roberts@arm.com: use the vma_pages() helper instead of open-coding] Link: https://lkml.kernel.org/r/0e0f674b-3b7e-494f-ae7a-fc9dbb98dad4@arm.com Link: https://lkml.kernel.org/r/20250609092729.274960-6-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Will Deacon <will@kernel.org> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09Merge tag 'arm64-fixes' of ↵Linus Torvalds1-12/+7
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix bogus KASAN splat on EFI runtime stack - Select JUMP_LABEL unconditionally to avoid boot failure with pKVM and the legacy implementation of static keys - Avoid touching GCS registers when 'arm64.nogcs' has been passed on the command-line - Move a 'cpumask_t' off the stack in smp_send_stop() - Don't advertise SME-related hwcaps to userspace when ID_AA64PFR1_EL1 indicates that SME is not implemented - Always check the VMA when handling an Overlay fault - Avoid corrupting TCR2_EL1 during boot * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/mm: Drop wrong writes into TCR2_EL1 arm64: poe: Handle spurious Overlay faults arm64: Filter out SME hwcaps when FEAT_SME isn't implemented arm64: move smp_send_stop() cpu mask off stack arm64/gcs: Don't try to access GCS registers if arm64.nogcs is enabled arm64: Unconditionally select CONFIG_JUMP_LABEL arm64: efi: Fix KASAN false positive for EFI runtime stack
2025-07-09vdso/vsyscall: Split up __arch_update_vsyscall() into __arch_update_vdso_clock()Thomas Weißschuh1-4/+3
The upcoming auxiliary clocks need this hook, too. To separate the architecture hooks from the timekeeper internals, refactor the hook to only operate on a single vDSO clock. While at it, use a more robust #define for the hook override. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250701-vdso-auxclock-v1-3-df7d9f87b9b8@linutronix.de
2025-07-08KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is setOliver Upton1-1/+5
Per R_CDCKC, vSErrors are enabled if HCRX_EL2.TMEA is set, regardless of HCR_EL2.AMO. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-21-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08KVM: arm64: Enable SCTLR2 when advertised to the guestOliver Upton1-0/+3
HCRX_EL2.SCTLR2En needs to be set for SCTLR2_EL1 to take effect in hardware (in addition to disabling traps). Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-14-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08KVM: arm64: Wire up SCTLR2_ELx sysreg descriptorsOliver Upton2-0/+8
Set up the sysreg descriptors for SCTLR2_ELx, along with the associated storage and VNCR mapping. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-12-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08KVM: arm64: nv: Use guest hypervisor's vSError stateOliver Upton2-0/+8
When HCR_EL2.AMO is set, physical SErrors are routed to EL2 and virtual SError injection is enabled for EL1. Conceptually treating host-initiated SErrors as 'physical', this means we can delegate control of the vSError injection context to the guest hypervisor when nesting && AMO is set. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08KVM: arm64: nv: Add FEAT_RAS vSError sys regs to tableOliver Upton2-0/+5
Prepare to implement RAS for NV by adding the missing EL2 sysregs for the vSError context. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08KVM: arm64: nv: Honor SError exception routing / maskingOliver Upton3-4/+36
To date KVM has used HCR_EL2.VSE to track the state of a pending SError for the guest. With this bit set, hardware respects the EL1 exception routing / masking rules and injects the vSError when appropriate. This isn't correct for NV guests as hardware is oblivious to vEL2's intentions for SErrors. Better yet, with FEAT_NV2 the guest can change the routing behind our back as HCR_EL2 is redirected to memory. Cope with this mess by: - Using a flag (instead of HCR_EL2.VSE) to track the pending SError state when SErrors are unconditionally masked for the current context - Resampling the routing / masking of a pending SError on every guest entry/exit - Emulating exception entry when SError routing implies a translation regime change Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08KVM: arm64: nv: Respect exception routing rules for SEAsOliver Upton1-2/+12
Synchronous external aborts are taken to EL2 if ELIsInHost() or HCR_EL2.TEA=1. Rework the SEA injection plumbing to respect the imposed routing of the guest hypervisor and opportunistically rephrase things to make their function a bit more obvious. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08KVM: arm64: Add helper to identify a nested contextMarc Zyngier1-0/+5
A common idiom in the KVM code is to check if we are currently dealing with a "nested" context, defined as having NV enabled, but being in the EL1&0 translation regime. This is usually expressed as: if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu) ... ) which is a mouthful and a bit hard to read, specially when followed by additional conditions. Introduce a new helper that encapsulate these two terms, allowing the above to be written as if (is_nested_context(vcpu) ... ) which is both shorter and easier to read, and makes more obvious the potential for simplification on some code paths. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08irqchip/gic-v5: Add GICv5 LPI/IPI supportLorenzo Pieralisi2-0/+23
An IRS supports Logical Peripheral Interrupts (LPIs) and implement Linux IPIs on top of it. LPIs are used for interrupt signals that are translated by a GICv5 ITS (Interrupt Translation Service) but also for software generated IRQs - namely interrupts that are not driven by a HW signal, ie IPIs. LPIs rely on memory storage for interrupt routing and state. LPIs state and routing information is kept in the Interrupt State Table (IST). IRSes provide support for 1- or 2-level IST tables configured to support a maximum number of interrupts that depend on the OS configuration and the HW capabilities. On systems that provide 2-level IST support, always allow the maximum number of LPIs; On systems with only 1-level support, limit the number of LPIs to 2^12 to prevent wasting memory (presumably a system that supports a 1-level only IST is not expecting a large number of interrupts). On a 2-level IST system, L2 entries are allocated on demand. The IST table memory is allocated using the kmalloc() interface; the allocation required may be smaller than a page and must be made up of contiguous physical pages if larger than a page. On systems where the IRS is not cache-coherent with the CPUs, cache mainteinance operations are executed to clean and invalidate the allocated memory to the point of coherency making it visible to the IRS components. On GICv5 systems, IPIs are implemented using LPIs. Add an LPI IRQ domain and implement an IPI-specific IRQ domain created as a child/subdomain of the LPI domain to allocate the required number of LPIs needed to implement the IPIs. IPIs are backed by LPIs, add LPIs allocation/de-allocation functions. The LPI INTID namespace is managed using an IDA to alloc/free LPI INTIDs. Associate an IPI irqchip with IPI IRQ descriptors to provide core code with the irqchip.ipi_send_single() method required to raise an IPI. Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-22-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08irqchip/gic-v5: Add GICv5 IRS/SPI supportLorenzo Pieralisi1-0/+36
The GICv5 Interrupt Routing Service (IRS) component implements interrupt management and routing in the GICv5 architecture. A GICv5 system comprises one or more IRSes, that together handle the interrupt routing and state for the system. An IRS supports Shared Peripheral Interrupts (SPIs), that are interrupt sources directly connected to the IRS; they do not rely on memory for storage. The number of supported SPIs is fixed for a given implementation and can be probed through IRS IDR registers. SPI interrupt state and routing are managed through GICv5 instructions. Each core (PE in GICv5 terms) in a GICv5 system is identified with an Interrupt AFFinity ID (IAFFID). An IRS manages a set of cores that are connected to it. Firmware provides a topology description that the driver uses to detect to which IRS a CPU (ie an IAFFID) is associated with. Use probeable information and firmware description to initialize the IRSes and implement GICv5 IRS SPIs support through an SPI-specific IRQ domain. The GICv5 IRS driver: - Probes IRSes in the system to detect SPI ranges - Associates an IRS with a set of cores connected to it - Adds an IRQchip structure for SPI handling SPIs priority is set to a value corresponding to the lowest permissible priority in the system (taking into account the implemented priority bits of the IRS and CPU interface). Since all IRQs are set to the same priority value, the value itself does not matter as long as it is a valid one. Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-21-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08irqchip/gic-v5: Add GICv5 PPI supportLorenzo Pieralisi1-0/+19
The GICv5 CPU interface implements support for PE-Private Peripheral Interrupts (PPI), that are handled (enabled/prioritized/delivered) entirely within the CPU interface hardware. To enable PPI interrupts, implement the baseline GICv5 host kernel driver infrastructure required to handle interrupts on a GICv5 system. Add the exception handling code path and definitions for GICv5 instructions. Add GICv5 PPI handling code as a specific IRQ domain to: - Set-up PPI priority - Manage PPI configuration and state - Manage IRQ flow handler - IRQs allocation/free - Hook-up a PPI specific IRQchip to provide the relevant methods PPI IRQ priority is chosen as the minimum allowed priority by the system design (after probing the number of priority bits implemented by the CPU interface). Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Co-developed-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-20-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08arm64: Add support for GICv5 GSB barriersLorenzo Pieralisi2-4/+11
The GICv5 architecture introduces two barriers instructions (GSB SYS, GSB ACK) that are used to manage interrupt effects. Rework macro used to emit the SB barrier instruction and implement the GSB barriers on top of it. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-19-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08arm64: smp: Support non-SGIs for IPIsMarc Zyngier1-1/+6
The arm64 arch has relied so far on GIC architectural software generated interrupt (SGIs) to handle IPIs. Those are per-cpu software generated interrupts. arm64 architecture code that allocates the IPIs virtual IRQs and IRQ descriptors was written accordingly. On GICv5 systems, IPIs are implemented using LPIs that are not per-cpu interrupts - they are just normal routable IRQs. Add arch code to set-up IPIs on systems where they are handled using normal routable IRQs. For those systems, force the IRQ affinity (and make it immutable) to the cpu a given IRQ was assigned to. Signed-off-by: Timothy Hayes <timothy.hayes@arm.com> [lpieralisi: changed affinity set-up, log] Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-18-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08arm64: Disable GICv5 read/write/instruction trapsLorenzo Pieralisi1-0/+45
GICv5 trap configuration registers value is UNKNOWN at reset. Initialize GICv5 EL2 trap configuration registers to prevent trapping GICv5 instruction/register access upon entering the kernel. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-15-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08KVM: arm64: nvhe: Disable branch generation in nVHE guestsAnshuman Khandual1-0/+2
While BRBE can record branches within guests, the host recording branches in guests is not supported by perf (though events are). Support for BRBE in guests will supported by providing direct access to BRBE within the guests. That is how x86 LBR works for guests. Therefore, BRBE needs to be disabled on guest entry and restored on exit. For nVHE, this requires explicit handling for guests. Before entering a guest, save the BRBE state and disable the it. When returning to the host, restore the state. For VHE, it is not necessary. We initialize BRBCR_EL1.{E1BRE,E0BRE}=={0,0} at boot time, and HCR_EL2.TGE==1 while running in the host. We configure BRBCR_EL2.{E2BRE,E0HBRE} to enable branch recording in the host. When entering the guest, we set HCR_EL2.TGE==0 which means BRBCR_EL1 is used instead of BRBCR_EL2. Consequently for VHE, BRBE recording is disabled at EL1 and EL0 when running a guest. Should recording in guests (by the host) ever be desired, the perf ABI will need to be extended to distinguish guest addresses (struct perf_branch_entry.priv) for starters. BRBE records would also need to be invalidated on guest entry/exit as guest/host EL1 and EL0 records can't be distinguished. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Co-developed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-3-e7775563036e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: Handle BRBE booting requirementsAnshuman Khandual1-3/+68
To use the Branch Record Buffer Extension (BRBE), some configuration is necessary at EL3 and EL2. This patch documents the requirements and adds the initial EL2 setup code, which largely consists of configuring the fine-grained traps and initializing a couple of BRBE control registers. Before this patch, __init_el2_fgt() would initialize HDFGRTR_EL2 and HDFGWTR_EL2 with the same value, relying on the read/write trap controls for a register occupying the same bit position in either register. The 'nBRBIDR' trap control only exists in bit 59 of HDFGRTR_EL2, while bit 59 of HDFGWTR_EL2 is RES0, and so this assumption no longer holds. To handle HDFGRTR_EL2 and HDFGWTR_EL2 having (slightly) different bit layouts, __init_el2_fgt() is changed to accumulate the HDFGRTR_EL2 and HDFGWTR_EL2 control bits separately. While making this change the open-coded value (1 << 62) is replaced with HDFG{R,W}TR_EL2_nPMSNEVFR_EL1_MASK. The BRBCR_EL1 and BRBCR_EL2 registers are unusual and require special initialisation: even though they are subject to E2H renaming, both have an effect regardless of HCR_EL2.TGE, even when running at EL2. So we must initialize BRBCR_EL2 in case we run in nVHE mode. This is handled in __init_el2_brbe() with a comment to explain the situation. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> [Mark: rewrite commit message, fix typo in comment] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Co-developed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> tested-by: Adam Young <admiyo@os.amperecomputing.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-2-e7775563036e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64/sysreg: Add BRBE registers and fieldsAnshuman Khandual1-10/+6
This patch adds definitions related to the Branch Record Buffer Extension (BRBE) as per ARM DDI 0487K.a. These will be used by KVM and a BRBE driver in subsequent patches. Some existing BRBE definitions in asm/sysreg.h are replaced with equivalent generated definitions. Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Tested-by: James Clark <james.clark@linaro.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> tested-by: Adam Young <admiyo@os.amperecomputing.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250611-arm-brbe-v19-v23-1-e7775563036e@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logicBreno Leitao1-5/+1
With VMAP_STACK now always enabled on arm64, remove all CONFIG_VMAP_STACK conditionals from overflow stack handling in stacktrace code. This change unconditionally defines the per-CPU overflow_stack and stackinfo_get_overflow() helper in arch/arm64/include/asm/stacktrace.h, and always includes the overflow stack in the stack_info array in arch/arm64/kernel/stacktrace.c. Also, drop redundant CONFIG_VMAP_STACK checks from SDEI stack declarations. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707-arm64_vmap-v1-6-8de98ca0f91c@debian.org Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGNBreno Leitao1-5/+1
Now that VMAP_STACK is always enabled on arm64, remove the CONFIG_VMAP_STACK conditional logic from the definitions of THREAD_SHIFT and THREAD_ALIGN in arch/arm64/include/asm/memory.h. This simplifies the code by unconditionally setting THREAD_ALIGN to (2 * THREAD_SIZE) and adjusting the THREAD_SHIFT definition to only depend on MIN_THREAD_SHIFT and PAGE_SHIFT. This change reflects the updated arm64 stack model, where all kernel threads use virtually mapped stacks with guard pages, and ensures alignment and stack sizing are consistently handled. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707-arm64_vmap-v1-3-8de98ca0f91c@debian.org Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: debug: remove debug exception registration infrastructureAda Couprie Diaz3-12/+0
Now that debug exceptions are handled individually and without the need for dynamic registration, remove the unused registration infrastructure. This removes the external caller for `debug_exception_enter()` and `debug_exception_exit()`. Make them static again and remove them from the header. Remove `early_brk64()` as it has been made redundant by (arm64: debug: split brk64 exception entry) and is not used anymore. Note : in `early_brk64()` `bug_brk_handler()` is called unconditionally as a fall-through, but now `call_break_hook()` only calls it if the immediate matches. This does not change the behaviour in early boot, as if `bug_brk_handler()` was called on a non-BUG immediate it would return DBG_HOOK_ERROR anyway, which `call_break_hook()` will do if no immediate matches. Remove `trap_init()`, as it would be empty and a weak definition already exists in `init/main.c`. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-14-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-07-08arm64: debug: split bkpt32 exception entryAda Couprie Diaz1-0/+1
Currently all debug exceptions share common entry code and are routed to `do_debug_exception()`, which calls dynamically-registered handlers for each specific debug exception. This is unfortunate as different debug exceptions have different entry handling requirements, and it would be better to handle these distinct requirements earlier. The BKPT32 exception can only be triggered by a BKPT instruction. Thus, we know that the PC is a legitimate address and isn't being used to train a branch predictor with a bogus address : we don't need to call `arm64_apply_bp_hardening()`. The handler for this exception only pends a signal and doesn't depend on any per-CPU state : we don't need to inhibit preemption, nor do we need to keep the DAIF exceptions masked, so we can unmask them earlier. Split the BKPT32 exception entry and adjust function signatures and its behaviour to match its relaxed constraints compared to other debug exceptions. We can also remove `NOKRPOBE_SYMBOL`, as this cannot lead to a kprobe recursion. This replaces the last usage of `el0_dbg()`, so remove it. Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Reviewed-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20250707114109.35672-13-ada.coupriediaz@arm.com Signed-off-by: Will Deacon <will@kernel.org>