kernel/linux.git/arch/arm64/include, branch v6.18.33

arm64: Reserve an extra page for early kernel mapping

2026-05-23T11:07:14+00:00

[ Upstream commit 4d8e74ad4585672489da6145b3328d415f50db82 ] The final part of [data, end) segment may overflow into the next page of init_pg_end[1] which is the gap page before early_init_stack[2]: [1] crash_arm64_v9.0.1> vtop ffffffed00601000 VIRTUAL PHYSICAL ffffffed00601000 83401000 PAGE DIRECTORY: ffffffecffd62000 PGD: ffffffecffd62da0 => 10000000833fb003 PMD: ffffff80033fb018 => 10000000833fe003 PTE: ffffff80033fe008 => 68000083401f03 PAGE: 83401000 PTE PHYSICAL FLAGS 68000083401f03 83401000 (VALID|SHARED|AF|NG|PXN|UXN) PAGE PHYSICAL MAPPING INDEX CNT FLAGS fffffffec00d0040 83401000 0 0 1 4000 reserved [2] ffffffed002c8000 (r) __pi__data ffffffed0054e000 (d) __pi___bss_start ffffffed005f5000 (b) __pi_init_pg_dir ffffffed005fe000 (b) __pi_init_pg_end ffffffed005ff000 (B) early_init_stack ffffffed00608000 (b) __pi__end For 4K pages, the early kernel mapping may use 2MB block entries but the kernel segments are only 64KB aligned. Segment boundaries that fall within a 2MB block therefore require a PTE table so that different attributes can be applied on either side of the boundary. KERNEL_SEGMENT_COUNT still correctly counts the five permanent kernel VMAs registered by declare_kernel_vmas(). However, since commit 5973a62efa34 ("arm64: map [_text, _stext) virtual address range non-executable+read-only"), the early mapper also maps [_text, _stext) separately from [_stext, _etext). This adds one more early-only split and can require one more page-table page than the existing EARLY_SEGMENT_EXTRA_PAGES allowance reserves. Increase the 4K-page early mapping allowance by one page to cover that additional split. Fixes: 5973a62efa34 ("arm64: map [_text, _stext) virtual address range non-executable+read-only") Assisted-by: TRAE:GLM-5.1 Suggested-by: Ard Biesheuvel Signed-off-by: Zhaoyang Huang [catalin.marinas@arm.com: rewrote part of the commit log] [catalin.marinas@arm.com: expanded the code comment] Signed-off-by: Catalin Marinas Signed-off-by: Sasha Levin

arm64/xor: fix conflicting attributes for xor_block_template

2026-05-23T11:06:48+00:00

[ Upstream commit 675a0dd596e712404557286d0a883b54ee28e4f4 ] Commit 2c54b423cf85 ("arm64/xor: use EOR3 instructions when available") changes the definition to __ro_after_init instead of const, but failed to update the external declaration in xor.h. This was not found because xor-neon.c doesn't include , and can't easily do that due to current architecture of the XOR code. Link: https://lkml.kernel.org/r/20260327061704.3707577-4-hch@lst.de Fixes: 2c54b423cf85 ("arm64/xor: use EOR3 instructions when available") Signed-off-by: Christoph Hellwig Reviewed-by: Eric Biggers Tested-by: Eric Biggers Cc: Albert Ou Cc: Alexander Gordeev Cc: Alexandre Ghiti Cc: Andreas Larsson Cc: Anton Ivanov Cc: Ard Biesheuvel Cc: Arnd Bergmann Cc: "Borislav Petkov (AMD)" Cc: Catalin Marinas Cc: Chris Mason Cc: Christian Borntraeger Cc: Dan Williams Cc: David S. Miller Cc: David Sterba Cc: Heiko Carstens Cc: Herbert Xu Cc: "H. Peter Anvin" Cc: Huacai Chen Cc: Ingo Molnar Cc: Jason A. Donenfeld Cc: Johannes Berg Cc: Li Nan Cc: Madhavan Srinivasan Cc: Magnus Lindholm Cc: Matt Turner Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Palmer Dabbelt Cc: Richard Henderson Cc: Richard Weinberger Cc: Russell King Cc: Song Liu Cc: Sven Schnelle Cc: Ted Ts'o Cc: Vasily Gorbik Cc: WANG Xuerui Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin

arm64: entry: Don't preempt with SError or Debug masked

2026-05-23T11:06:30+00:00

[ Upstream commit 2371bd83b3df9d833191fe58dadb0e69a794a1cd ] On arm64, involuntary kernel preemption has been subtly broken since the move to the generic irqentry code. When preemption occurs, the new task may run with SError and Debug exceptions masked unexpectedly, leading to a loss of RAS events, breakpoints, watchpoints, and single-step exceptions. Prior to moving to the generic irqentry code, involuntary preemption of kernel mode would only occur when returning from regular interrupts, in a state where interrupts were masked and all other arm64-specific exceptions (SError, Debug, and pseudo-NMI) were unmasked. This is the only state in which it is valid to switch tasks. As part of moving to the generic irqentry code, the involuntary preemption logic was moved such that involuntary preemption could occur when returning from any (non-NMI) exception. As most exception handlers mask all arm64-specific exceptions before this point, preemption could occur in a state where arm64-specific exceptions were masked. This is not a valid state to switch tasks, and resulted in the loss of exceptions described above. As a temporary bodge, avoid the loss of exceptions by avoiding involuntary preemption when SError and/or Debug exceptions are masked. Practically speaking this means that involuntary preemption will only occur when returning from regular interrupts, as was the case before moving to the generic irqentry code. Fixes: 99eb057ccd67 ("arm64: entry: Move arm64_preempt_schedule_irq() into __exit_to_kernel_mode()") Reported-by: Ada Couprie Diaz Reported-by: Vladimir Murzin Signed-off-by: Mark Rutland Cc: Andy Lutomirski Cc: Jinjie Ruan Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Will Deacon Reviewed-by: Jinjie Ruan Acked-by: Peter Zijlstra (Intel) Signed-off-by: Catalin Marinas Signed-off-by: Sasha Levin

KVM: arm64: Fix kvm_vcpu_initialized() macro parameter

2026-05-14T13:30:15+00:00

commit d89fdda7dd8a488f922e1175e6782f781ba8a23b upstream. The macro is defined with parameter 'v' but the body references the literal token 'vcpu' instead, causing it to silently operate on whatever 'vcpu' resolves to in the caller's scope rather than the value passed by the caller. All current call sites happen to use a variable named 'vcpu', so the bug is latent. Fixes: e016333745c7 ("KVM: arm64: Only reset vCPU-scoped feature ID regs once") Signed-off-by: Fuad Tabba Link: https://patch.msgid.link/20260424084908.370776-5-tabba@google.com Signed-off-by: Marc Zyngier Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

arm64: mm: Fix rodata=full block mapping support for realm guests

2026-05-07T04:12:00+00:00

[ Upstream commit f12b435de2f2bb09ce406467020181ada528844c ] Commit a166563e7ec37 ("arm64: mm: support large block mapping when rodata=full") enabled the linear map to be mapped by block/cont while still allowing granular permission changes on BBML2_NOABORT systems by lazily splitting the live mappings. This mechanism was intended to be usable by realm guests since they need to dynamically share dma buffers with the host by "decrypting" them - which for Arm CCA, means marking them as shared in the page tables. However, it turns out that the mechanism was failing for realm guests because realms need to share their dma buffers (via __set_memory_enc_dec()) much earlier during boot than split_kernel_leaf_mapping() was able to handle. The report linked below showed that GIC's ITS was one such user. But during the investigation I found other callsites that could not meet the split_kernel_leaf_mapping() constraints. The problem is that we block map the linear map based on the boot CPU supporting BBML2_NOABORT, then check that all the other CPUs support it too when finalizing the caps. If they don't, then we stop_machine() and split to ptes. For safety, split_kernel_leaf_mapping() previously wouldn't permit splitting until after the caps were finalized. That ensured that if any secondary cpus were running that didn't support BBML2_NOABORT, we wouldn't risk breaking them. I've fix this problem by reducing the black-out window where we refuse to split; there are now 2 windows. The first is from T0 until the page allocator is inititialized. Splitting allocates memory for the page allocator so it must be in use. The second covers the period between starting to online the secondary cpus until the system caps are finalized (this is a very small window). All of the problematic callers are calling __set_memory_enc_dec() before the secondary cpus come online, so this solves the problem. However, one of these callers, swiotlb_update_mem_attributes(), was trying to split before the page allocator was initialized. So I have moved this call from arch_mm_preinit() to mem_init(), which solves the ordering issue. I've added warnings and return an error if any attempt is made to split in the black-out windows. Note there are other issues which prevent booting all the way to user space, which will be fixed in subsequent patches. Reported-by: Jinjiang Tu Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/ Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full") Cc: stable@vger.kernel.org Reviewed-by: Kevin Brodsky Signed-off-by: Ryan Roberts Reviewed-by: Suzuki K Poulose Tested-by: Suzuki K Poulose Signed-off-by: Catalin Marinas [ adjusted context to use `__ASSEMBLY__` instead of `__ASSEMBLER__` ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

arm64: errata: Work around early CME DVMSync acknowledgement

2026-04-27T13:27:28+00:00

commit 0baba94a9779c13c857f6efc55807e6a45b1d4e4 upstream. C1-Pro acknowledges DVMSync messages before completing the SME/CME memory accesses. Work around this by issuing an IPI to the affected CPUs if they are running in EL0 with SME enabled. Note that we avoid the local DSB in the IPI handler as the kernel runs with SCTLR_EL1.IESB=1. This is sufficient to complete SME memory accesses at EL0 on taking an exception to EL1. On the return to user path, no barrier is necessary either. See the comment in sme_set_active() and the more detailed explanation in the link below. To avoid a potential IPI flood from malicious applications (e.g. madvise(MADV_PAGEOUT) in a tight loop), track where a process is active via mm_cpumask() and only interrupt those CPUs. Link: https://lore.kernel.org/r/ablEXwhfKyJW1i7l@J2N7QTR9R3 Cc: Will Deacon Cc: Mark Rutland Cc: James Morse Cc: Mark Brown Reviewed-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman

arm64: cputype: Add C1-Pro definitions

2026-04-27T13:27:28+00:00

commit 2c99561016c591f4c3d5ad7d22a61b8726e79735 upstream. Add cputype definitions for C1-Pro. These will be used for errata detection in subsequent patches. These values can be found in "Table A-303: MIDR_EL1 bit descriptions" in issue 07 of the C1-Pro TRM: https://documentation-service.arm.com/static/6930126730f8f55a656570af Acked-by: Mark Rutland Cc: Will Deacon Cc: James Morse Reviewed-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman

arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish()

2026-04-27T13:27:28+00:00

commit d9fb08ba946a6190c371dcd9f9e465d0d52c5021 upstream. The mm structure will be used for workarounds that need limiting to specific tasks. Acked-by: Mark Rutland Cc: Will Deacon Reviewed-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman

arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance

2026-04-27T13:27:28+00:00

commit 6bfbf574a39139da11af9fdf6e8d56fe1989cd3e upstream. Add __tlbi_sync_s1ish_kernel() similar to __tlbi_sync_s1ish() and use it for kernel TLB maintenance. Also use this function in flush_tlb_all() which is only used in relation to kernel mappings. Subsequent patches can differentiate between workarounds that apply to user only or both user and kernel. A subsequent patch will add mm_struct to __tlbi_sync_s1ish(). Since arch_tlbbatch_flush() is not specific to an mm, add a corresponding __tlbi_sync_s1ish_batch() helper. Acked-by: Mark Rutland Cc: Will Deacon Reviewed-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman

arm64: tlb: Optimize ARM64_WORKAROUND_REPEAT_TLBI

2026-04-27T13:27:28+00:00

commit a8f78680ee6bf795086384e8aea159a52814f827 upstream. The ARM64_WORKAROUND_REPEAT_TLBI workaround is used to mitigate several errata where broadcast TLBI;DSB sequences don't provide all the architecturally required synchronization. The workaround performs more work than necessary, and can have significant overhead. This patch optimizes the workaround, as explained below. The workaround was originally added for Qualcomm Falkor erratum 1009 in commit: d9ff80f83ecb ("arm64: Work around Falkor erratum 1009") As noted in the message for that commit, the workaround is applied even in cases where it is not strictly necessary. The workaround was later reused without changes for: * Arm Cortex-A76 erratum #1286807 SDEN v33: https://developer.arm.com/documentation/SDEN-885749/33-0/ * Arm Cortex-A55 erratum #2441007 SDEN v16: https://developer.arm.com/documentation/SDEN-859338/1600/ * Arm Cortex-A510 erratum #2441009 SDEN v19: https://developer.arm.com/documentation/SDEN-1873351/1900/ The important details to note are as follows: 1. All relevant errata only affect the ordering and/or completion of memory accesses which have been translated by an invalidated TLB entry. The actual invalidation of TLB entries is unaffected. 2. The existing workaround is applied to both broadcast and local TLB invalidation, whereas for all relevant errata it is only necessary to apply a workaround for broadcast invalidation. 3. The existing workaround replaces every TLBI with a TLBI;DSB;TLBI sequence, whereas for all relevant errata it is only necessary to execute a single additional TLBI;DSB sequence after any number of TLBIs are completed by a DSB. For example, for a sequence of batched TLBIs: TLBI [, ] TLBI [, ] TLBI [, ] DSB ISH ... the existing workaround will expand this to: TLBI [, ] DSB ISH // additional TLBI [, ] // additional TLBI [, ] DSB ISH // additional TLBI [, ] // additional TLBI [, ] DSB ISH // additional TLBI [, ] // additional DSB ISH ... whereas it is sufficient to have: TLBI [, ] TLBI [, ] TLBI [, ] DSB ISH TLBI [, ] // additional DSB ISH // additional Using a single additional TBLI and DSB at the end of the sequence can have significantly lower overhead as each DSB which completes a TLBI must synchronize with other PEs in the system, with potential performance effects both locally and system-wide. 4. The existing workaround repeats each specific TLBI operation, whereas for all relevant errata it is sufficient for the additional TLBI to use *any* operation which will be broadcast, regardless of which translation regime or stage of translation the operation applies to. For example, for a single TLBI: TLBI ALLE2IS DSB ISH ... the existing workaround will expand this to: TLBI ALLE2IS DSB ISH TLBI ALLE2IS // additional DSB ISH // additional ... whereas it is sufficient to have: TLBI ALLE2IS DSB ISH TLBI VALE1IS, XZR // additional DSB ISH // additional As the additional TLBI doesn't have to match a specific earlier TLBI, the additional TLBI can be implemented in separate code, with no memory of the earlier TLBIs. The additional TLBI can also use a cheaper TLBI operation. 5. The existing workaround is applied to both Stage-1 and Stage-2 TLB invalidation, whereas for all relevant errata it is only necessary to apply a workaround for Stage-1 invalidation. Architecturally, TLBI operations which invalidate only Stage-2 information (e.g. IPAS2E1IS) are not required to invalidate TLB entries which combine information from Stage-1 and Stage-2 translation table entries, and consequently may not complete memory accesses translated by those combined entries. In these cases, completion of memory accesses is only guaranteed after subsequent invalidation of Stage-1 information (e.g. VMALLE1IS). Taking the above points into account, this patch reworks the workaround logic to reduce overhead: * New __tlbi_sync_s1ish() and __tlbi_sync_s1ish_hyp() functions are added and used in place of any dsb(ish) which is used to complete broadcast Stage-1 TLB maintenance. When the ARM64_WORKAROUND_REPEAT_TLBI workaround is enabled, these helpers will execute an additional TLBI;DSB sequence. For consistency, it might make sense to add __tlbi_sync_*() helpers for local and stage 2 maintenance. For now I've left those with open-coded dsb() to keep the diff small. * The duplication of TLBIs in __TLBI_0() and __TLBI_1() is removed. This is no longer needed as the necessary synchronization will happen in __tlbi_sync_s1ish() or __tlbi_sync_s1ish_hyp(). * The additional TLBI operation is chosen to have minimal impact: - __tlbi_sync_s1ish() uses "TLBI VALE1IS, XZR". This is only used at EL1 or at EL2 with {E2H,TGE}=={1,1}, where it will target an unused entry for the reserved ASID in the kernel's own translation regime, and have no adverse affect. - __tlbi_sync_s1ish_hyp() uses "TLBI VALE2IS, XZR". This is only used in hyp code, where it will target an unused entry in the hyp code's TTBR0 mapping, and should have no adverse effect. * As __TLBI_0() and __TLBI_1() no longer replace each TLBI with a TLBI;DSB;TLBI sequence, batching TLBIs is worthwhile, and there's no need for arch_tlbbatch_should_defer() to consider ARM64_WORKAROUND_REPEAT_TLBI. When building defconfig with GCC 15.1.0, compared to v6.19-rc1, this patch saves ~1KiB of text, makes the vmlinux ~42KiB smaller, and makes the resulting Image 64KiB smaller: | [mark@lakrids:~/src/linux]% size vmlinux-* | text data bss dec hex filename | 21179831 19660919 708216 41548966 279fca6 vmlinux-after | 21181075 19660903 708216 41550194 27a0172 vmlinux-before | [mark@lakrids:~/src/linux]% ls -l vmlinux-* | -rwxr-xr-x 1 mark mark 157771472 Feb 4 12:05 vmlinux-after | -rwxr-xr-x 1 mark mark 157815432 Feb 4 12:05 vmlinux-before | [mark@lakrids:~/src/linux]% ls -l Image-* | -rw-r--r-- 1 mark mark 41007616 Feb 4 12:05 Image-after | -rw-r--r-- 1 mark mark 41073152 Feb 4 12:05 Image-before Signed-off-by: Mark Rutland Cc: Catalin Marinas Cc: Marc Zyngier Cc: Oliver Upton Cc: Ryan Roberts Cc: Will Deacon Signed-off-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman