summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2025-05-08arm64: dts: rockchip: Move rk3568 PCIe3 MSI to use GIC ITSChukun Pan1-4/+4
Following commit b956c9de9175 ("arm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI"), change the PCIe3 controller's MSI on rk3568 to use ITS, so that all MSI-X can work properly. Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Link: https://lore.kernel.org/r/20250308093008.568437-2-amadeus@jmu.edu.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-05-08riscv: vDSO: Remove --hash-style=bothXi Ruoyao1-1/+1
When RISC-V borned, DT_GNU_HASH had already became the de-facto standard so DT_HASH is just wasting storage space. Remove the explicit --hash-style=both setting and rely on the distro toolchain default, which is most likely "gnu" (i.e. generating only DT_GNU_HASH, no DT_HASH). Following the logic of commit 48f6430505c0 ("arm64/vdso: Remove --hash-style=sysv"). Signed-off-by: Xi Ruoyao <xry111@xry111.site> Link: https://lore.kernel.org/r/20250224112042.60282-2-xry111@xry111.site Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08arm64: dts: rockchip: Update eMMC for NanoPi R5 seriesPeter Robinson1-1/+4
Add the 3.3v and 1.8v regulators that are connected to the eMMC on the R5 series devices, as well as adding the eMMC data strobe, and enable eMMC HS200 mode as the Foresee FEMDNN0xxG-A3A55 modules support it. Fixes: c8ec73b05a95d ("arm64: dts: rockchip: create common dtsi for NanoPi R5 series") Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Reviewed-by: Diederik de Haas <didi.debian@cknow.org> Link: https://lore.kernel.org/r/20250506222531.625157-1-pbrobinson@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2025-05-08Merge patch series "riscv: uaccess: optimisations"Palmer Dabbelt4-53/+179
Cyril Bur <cyrilbur@tenstorrent.com> says: This series tries to optimize riscv uaccess by allowing the use of user_access_begin() and user_access_end() which permits grouping user accesses and avoiding the CSR write penalty for each access. The error path can also be optimised using asm goto which patches 3 and 4 achieve. This will speed up jumping to labels by avoiding the need of an intermediary error type variable within the uaccess macros I did read the discussion this series generated. It isn't clear to me which direction to take the patches, if any. * b4-shazam-merge: riscv: uaccess: use 'asm_goto_output' for get_user() riscv: uaccess: use 'asm goto' for put_user() riscv: uaccess: use input constraints for ptr of __put_user() riscv: implement user_access_begin() and families riscv: save the SR_SUM status over switches Link: https://lore.kernel.org/r/20250410070526.3160847-1-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: uaccess: use 'asm_goto_output' for get_user()Jisheng Zhang1-27/+68
With 'asm goto' we don't need to test the error etc, the exception just jumps to the error handling directly. Unlike put_user(), get_user() must work around GCC bugs [1] when using output clobbers in an asm goto statement. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1 Signed-off-by: Jisheng Zhang <jszhang@kernel.org> [Cyril Bur: Rewritten commit message] Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-6-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: uaccess: use 'asm goto' for put_user()Jisheng Zhang1-38/+33
With 'asm goto' we don't need to test the error etc, the exception just jumps to the error handling directly. Because there are no output clobbers which could trigger gcc bugs [1] the use of asm_goto_output() macro is not necessary here. Not using asm_goto_output() is desirable as the generated output asm will be cleaner. Use of the volatile keyword is redundant as per gcc 14.2.0 manual section 6.48.2.7 Goto Labels: > Also note that an asm goto statement is always implicitly considered volatile. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1 Signed-off-by: Jisheng Zhang <jszhang@kernel.org> [Cyril Bur: Rewritten commit message] Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-5-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: uaccess: use input constraints for ptr of __put_user()Jisheng Zhang1-9/+9
Putting ptr in the inputs as opposed to output may seem incorrect but this is done for a few reasons: - Not having it in the output permits the use of asm goto in a subsequent patch. There are bugs in gcc [1] which would otherwise prevent it. - Since the output memory is userspace there isn't any real benefit from telling the compiler about the memory clobber. - x86, arm and powerpc all use this technique. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1 Signed-off-by: Jisheng Zhang <jszhang@kernel.org> [Cyril Bur: Rewritten commit message] Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-4-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: implement user_access_begin() and familiesJisheng Zhang1-0/+76
Currently, when a function like strncpy_from_user() is called, the userspace access protection is disabled and enabled for every word read. By implementing user_access_begin() and families, the protection is disabled at the beginning of the copy and enabled at the end. The __inttype macro is borrowed from x86 implementation. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-3-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: save the SR_SUM status over switchesBen Dooks3-0/+14
When threads/tasks are switched we need to ensure the old execution's SR_SUM state is saved and the new thread has the old SR_SUM state restored. The issue was seen under heavy load especially with the syz-stress tool running, with crashes as follows in schedule_tail: Unable to handle kernel access to user memory without uaccess routines at virtual address 000000002749f0d0 Oops [#1] Modules linked in: CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 Hardware name: riscv-virtio,qemu (DT) epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 ra : task_pid_vnr include/linux/sched.h:1421 [inline] ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264 epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0 gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000 t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0 s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003 a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0 s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850 s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8 s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2 t5 : ffffffc4043cafba t6 : 0000000000040000 status: 0000000000000120 badaddr: 000000002749f0d0 cause: 000000000000000f Call Trace: [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 [<ffffffe000005570>] ret_from_exception+0x0/0x14 Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace b5f8f9231dc87dda ]--- The issue comes from the put_user() in schedule_tail (kernel/sched/core.c) doing the following: asmlinkage __visible void schedule_tail(struct task_struct *prev) { ... if (current->set_child_tid) put_user(task_pid_vnr(current), current->set_child_tid); ... } the put_user() macro causes the code sequence to come out as follows: 1: __enable_user_access() 2: reg = task_pid_vnr(current); 3: *current->set_child_tid = reg; 4: __disable_user_access() The problem is that we may have a sleeping function as argument which could clear SR_SUM causing the panic above. This was fixed by evaluating the argument of the put_user() macro outside the user-enabled section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before enabling user access")" In order for riscv to take advantage of unsafe_get/put_XXX() macros and to avoid the same issue we had with put_user() and sleeping functions we must ensure code flow can go through switch_to() from within a region of code with SR_SUM enabled and come back with SR_SUM still enabled. This patch addresses the problem allowing future work to enable full use of unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost on every access. Make switch_to() save and restore SR_SUM. Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Deepak Gupta <debug@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-2-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski26-80/+221
Cross-merge networking fixes after downstream PR (net-6.15-rc6). No conflicts. Adjacent changes: net/core/dev.c: 08e9f2d584c4 ("net: Lock netdevices during dev_shutdown") a82dc19db136 ("net: avoid potential race between netdev_get_by_index_lock() and netns switch") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-08Merge tag 's390-6.15-4' of ↵Linus Torvalds5-18/+40
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens: - Fix potential use-after-free bug and missing error handling in PCI code - Fix dcssblk build error - Fix last breaking event handling in case of stack corruption to allow for better error reporting - Update defconfigs * tag 's390-6.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF has child VFs s390/pci: Fix missing check for zpci_create_device() error return s390: Update defconfigs s390/dcssblk: Fix build error with CONFIG_DAX=m and CONFIG_DCSSBLK=y s390/entry: Fix last breaking event handling in case of stack corruption s390/configs: Enable options required for TC flow offload s390/configs: Enable VDPA on Nvidia ConnectX-6 network card
2025-05-08arm64: proton-pack: Add new CPUs 'k' values for branch mitigationJames Morse2-0/+3
Update the list of 'k' values for the branch mitigation from arm's website. Add the values for Cortex-X1C. The MIDR_EL1 value can be found here: https://developer.arm.com/documentation/101968/0002/Register-descriptions/AArch> Link: https://developer.arm.com/documentation/110280/2-0/?lang=en Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
2025-05-08arm64/fpsimd: Allow CONFIG_ARM64_SME to be selectedMark Rutland1-1/+0
Now that the known issues with SME have been addressed, allow SME to be selected. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Richard Sandiford <richard.sandiford@arm.com> Cc: Sander De Smalen <sander.desmalen@arm.com> Cc: Tamas Petz <tamas.petz@arm.com> Cc: Todd Kjos <tkjos@google.com> Cc: Will Deacon <will@kernel.org> Cc: Yury Khrustalev <yury.khrustalev@arm.com> Tested-By: Luis Machado <luis.machado@arm.com> Link: https://lore.kernel.org/r/20250508132644.1395904-21-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: ptrace: Gracefully handle errorsMark Rutland1-35/+26
Within sve_set_common() we do not handle error conditions correctly: * When writing to NT_ARM_SSVE, if sme_alloc() fails, the task will be left with task->thread.sme_state==NULL, but TIF_SME will be set and task->thread.fp_type==FP_STATE_SVE. This will result in a subsequent null pointer dereference when the task's state is loaded or otherwise manipulated. * When writing to NT_ARM_SSVE, if sve_alloc() fails, the task will be left with task->thread.sve_state==NULL, but TIF_SME will be set, PSTATE.SM will be set, and task->thread.fp_type==FP_STATE_FPSIMD. This is not a legitimate state, and can result in various problems, including a subsequent null pointer dereference and/or the task inheriting stale streaming mode register state the next time its state is loaded into hardware. * When writing to NT_ARM_SSVE, if the VL is changed but the resulting VL differs from that in the header, the task will be left with TIF_SME set, PSTATE.SM set, but task->thread.fp_type==FP_STATE_FPSIMD. This is not a legitimate state, and can result in various problems as described above. Avoid these problems by allocating memory earlier, and by changing the task's saved fp_type to FP_STATE_SVE before skipping register writes due to a change of VL. To make early returns simpler, I've moved the call to fpsimd_flush_task_state() earlier. As the tracee's state has already been saved, and the tracee is known to be blocked for the duration of sve_set_common(), it doesn't matter whether this is called at the start or the end. For consistency I've moved the setting of TIF_SVE earlier. This will be cleared when loading FPSIMD-only state, and so moving this has no resulting functional change. Note that we only allocate the memory for SVE state when SVE register contents are provided, avoiding unnecessary memory allocations for tasks which only use FPSIMD. Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Fixes: baa8515281b3 ("arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE") Fixes: 5d0a8d2fba50 ("arm64/ptrace: Ensure that SME is set up for target when writing SSVE state") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-20-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode stateMark Rutland1-1/+11
When a task has PSTATE.SM==1, reads of NT_ARM_SSVE are required to always present a header with SVE_PT_REGS_SVE, and register data in SVE format. Reads of NT_ARM_SSVE must never present register data in FPSIMD format. Within the kernel, we always expect streaming SVE data to be stored in SVE format. Currently a user can write to NT_ARM_SSVE with a header presenting SVE_PT_REGS_FPSIMD rather than SVE_PT_REGS_SVE, placing the task's FPSIMD/SVE data into an invalid state. To fix this we can either: (a) Forbid such writes. (b) Accept such writes, and immediately convert data into SVE format. Take the simple option and forbid such writes. Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-19-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: ptrace: Do not present register data for inactive modeMark Rutland1-14/+18
The SME ptrace ABI is written around the incorrect assumption that SVE_PT_REGS_FPSIMD and SVE_PT_REGS_SVE are independent bit flags, where it is possible for both to be clear. In reality they are different values for bit 0 of the header flags, where SVE_PT_REGS_FPSIMD is 0 and SVE_PT_REGS_SVE is 1. In cases where code was written expecting that neither bit flag would be set, the value is equivalent to SVE_PT_REGS_FPSIMD. One consequence of this is that reads of the NT_ARM_SVE or NT_ARM_SSVE will erroneously present data from the other mode: * When PSTATE.SM==1, reads of NT_ARM_SVE will present a header with SVE_PT_REGS_FPSIMD, and FPSIMD-formatted data from streaming mode. * When PSTATE.SM==0, reads of NT_ARM_SSVE will present a header with SVE_PT_REGS_FPSIMD, and FPSIMD-formatted data from non-streaming mode. The original intent was that no register data would be provided in these cases, as described in commit: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Luckily, debuggers do not consume the bogus register data. Both GDB and LLDB read the NT_ARM_SSVE regset before the NT_ARM_SVE regset, and assume that when the NT_ARM_SSVE header presents SVE_PT_REGS_FPSIMD, it is necessary to read register contents from the NT_ARM_SVE regset, regardless of whether the NT_ARM_SSVE regset provided bogus register data. Fix the code to stop presenting register data from the inactive mode. At the same time, make the manipulation of the flag clearer, and remove the bogus comment from sve_set_common(). I've given this a quick spin with GDB and LLDB, and both seem happy. Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-18-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: ptrace: Save task state before generating SVE headerMark Rutland1-3/+3
As sve_init_header_from_task() consumes the saved value of PSTATE.SM and the saved fp_type, both must be saved before the header is generated. When generating a coredump for the current task, sve_get_common() calls sve_init_header_from_task() before saving the task's state. Consequently the header may be bogus, and the contents of the regset may be misleading. Fix this by saving the task's state before generting the header. Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Fixes: b017a0cea627 ("arm64/ptrace: Use saved floating point state type to determine SVE layout") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-17-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid stateMark Rutland1-65/+72
Currently, vec_set_vector_length() can manipulate a task into an invalid state as a result of a prctl/ptrace syscall which changes the SVE/SME vector length, resulting in several problems: (1) When changing the SVE vector length, if the task initially has PSTATE.ZA==1, and sve_alloc() fails to allocate memory, the task will be left with PSTATE.ZA==1 and sve_state==NULL. This is not a legitimate state, and could result in a subsequent null pointer dereference. (2) When changing the SVE vector length, if the task initially has PSTATE.SM==1, the task will be left with PSTATE.SM==1 and fp_type==FP_STATE_FPSIMD. Streaming mode state always needs to be saved in SVE format, so this is not a legitimate state. Attempting to restore this state may cause a task to erroneously inherit stale streaming mode predicate registers and FFR contents, behaving non-deterministically and potentially leaving information from another task. While in this state, reads of the NT_ARM_SSVE regset will indicate that the registers are not stored in SVE format. For the NT_ARM_SSVE regset specifically, debuggers interpret this as meaning that PSTATE.SM==0. (3) When changing the SME vector length, if the task initially has PSTATE.SM==1, the lower 128 bits of task's streaming mode vector state will be migrated to non-streaming mode, rather than these bits being zeroed as is usually the case for changes to PSTATE.SM. To fix the first issue, we can eagerly allocate the new sve_state and sme_state before modifying the task. This makes it possible to handle memory allocation failure without modifying the task state at all, and removes the need to clear TIF_SVE and TIF_SME. To fix the second issue, we either need to clear PSTATE.SM or not change the saved fp_type. Given we're going to eagerly allocate sve_state and sme_state, the simplest option is to preserve PSTATE.SM and the saves fp_type, and consistently truncate the SVE state. This ensures that the task always stays in a valid state, and by virtue of not exiting streaming mode, this also sidesteps the third issue. I believe these changes should not be problematic for realistic usage: * When the SVE/SME vector length is changed via prctl(), syscall entry will have cleared PSTATE.SM. Unless the task's state has been manipulated via ptrace after entry, the task will have PSTATE.SM==0. * When the SVE/SME vector length is changed via a write to the NT_ARM_SVE or NT_ARM_SSVE regsets, PSTATE.SM will be forced immediately after the length change, and new vector state will be copied from userspace. * When the SME vector length is changed via a write to the NT_ARM_ZA regset, the (S)SVE state is clobbered today, so anyone who cares about the specific state would need to install this after writing to the NT_ARM_ZA regset. As we need to free the old SVE state while TIF_SVE may still be set, we cannot use sve_free(), and using kfree() directly makes it clear that the free pairs with the subsequent assignment. As this leaves sve_free() unused, I've removed the existing sve_free() and renamed __sve_free() to mirror sme_free(). Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME") Fixes: baa8515281b3 ("arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-16-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale dataMark Rutland1-1/+1
The SVE/SME vector lengths can be changed via prctl/ptrace syscalls. Changes to the SVE/SME vector lengths are documented as preserving the lower 128 bits of the Z registers (i.e. the bits shared with the FPSIMD V registers). To ensure this, vec_set_vector_length() explicitly copies register values from a task's saved SVE state to its saved FPSIMD state when dropping the task to FPSIMD-only. The logic for this was not updated when when FPSIMD/SVE state tracking was changed across commits: baa8515281b3 ("arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE") a0136be443d5 (arm64/fpsimd: Load FP state based on recorded data type") bbc6172eefdb ("arm64/fpsimd: SME no longer requires SVE register state") 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Since the last commit above, a task's FPSIMD/SVE state may be stored in FPSIMD format while TIF_SVE is set, and the stored SVE state is stale. When vec_set_vector_length() encounters this case, it will erroneously clobber the live FPSIMD state with stale SVE state by using sve_to_fpsimd(). Fix this by using fpsimd_sync_from_effective_state() instead. Related issues with streaming mode state will be addressed in subsequent patches. Fixes: 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-15-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: Make clone() compatible with ZA lazy savingMark Rutland1-30/+62
Linux is intended to be compatible with userspace written to Arm's AAPCS64 procedure call standard [1,2]. For the Scalable Matrix Extension (SME), AAPCS64 was extended with a "ZA lazy saving scheme", where SME's ZA tile is lazily callee-saved and caller-restored. In this scheme, TPIDR2_EL0 indicates whether the ZA tile is live or has been saved by pointing to a "TPIDR2 block" in memory, which has a "za_save_buffer" pointer. This scheme has been implemented in GCC and LLVM, with necessary runtime support implemented in glibc and bionic. AAPCS64 does not specify how the ZA lazy saving scheme is expected to interact with thread creation mechanisms such as fork() and pthread_create(), which would be implemented in terms of the Linux clone syscall. The behaviour implemented by Linux and glibc/bionic doesn't always compose safely, as explained below. Currently the clone syscall is implemented such that PSTATE.ZA and the ZA tile are always inherited by the new task, and TPIDR2_EL0 is inherited unless the 'flags' argument includes CLONE_SETTLS, in which case TPIDR2_EL0 is set to 0/NULL. This doesn't make much sense: (a) TPIDR2_EL0 is part of the calling convention, and changes as control is passed between functions. It is *NOT* used for thread local storage, despite superficial similarity to TPIDR_EL0, which is is used as the TLS register. (b) TPIDR2_EL0 and PSTATE.ZA are tightly coupled in the procedure call standard, and some combinations of states are illegal. In general, manipulating the two independently is not guaranteed to be safe. In practice, code which is compliant with the procedure call standard may issue a clone syscall while in the "ZA dormant" state, where PSTATE.ZA==1 and TPIDR2_EL0 is non-null and indicates that ZA needs to be saved. This can cause a variety of problems, including: * If the implementation of pthread_create() passes CLONE_SETTLS, the new thread will start with PSTATE.ZA==1 and TPIDR2==NULL. Per the procedure call standard this is not a legitimate state for most functions. This can cause data corruption (e.g. as code may rely on PSTATE.ZA being 0 to guarantee that an SMSTART ZA instruction will zero the ZA tile contents), and may result in other undefined behaviour. * If the implementation of pthread_create() does not pass CLONE_SETTLS, the new thread will start with PSTATE.ZA==1 and TPIDR2 pointing to a TPIDR2 block on the parent thread's stack. This can result in a variety of problems, e.g. - The child may write back to the parent's za_save_buffer, corrupting its contents. - The child may read from the TPIDR2 block after the parent has reused this memory for something else, and consequently the child may abort or clobber arbitrary memory. Ideally we'd require that userspace ensures that a task is in the "ZA off" state (with PSTATE.ZA==0 and TPIDR2_EL0==NULL) prior to issuing a clone syscall, and have the kernel force this state for new threads. Unfortunately, contemporary C libraries do not do this, and simply forcing this state within the implementation of clone would break fork(). Instead, we can bodge around this by considering the CLONE_VM flag, and manipulate PSTATE.ZA and TPIDR2_EL0 as a pair. CLONE_VM indicates that the new task will run in the same address space as its parent, and in that case it doesn't make sense to inherit a stale pointer to the parent's TPIDR2 block: * For fork(), CLONE_VM will not be set, and it is safe to inherit both PSTATE.ZA and TPIDR2_EL0 as the new task will have its own copy of the address space, and cannot clobber its parent's stack. * For pthread_create() and vfork(), CLONE_VM will be set, and discarding PSTATE.ZA and TPIDR2_EL0 for the new task doesn't break any existing assumptions in userspace. Implement this behaviour for clone(). We currently inherit PSTATE.ZA in arch_dup_task_struct(), but this does not have access to the clone flags, so move this logic under copy_thread(). Documentation is updated to describe the new behaviour. [1] https://github.com/ARM-software/abi-aa/releases/download/2025Q1/aapcs64.pdf [2] https://github.com/ARM-software/abi-aa/blob/c51addc3dc03e73a016a1e4edf25440bcac76431/aapcs64/aapcs64.rst Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Richard Sandiford <richard.sandiford@arm.com> Cc: Sander De Smalen <sander.desmalen@arm.com> Cc: Tamas Petz <tamas.petz@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yury Khrustalev <yury.khrustalev@arm.com> Acked-by: Yury Khrustalev <yury.khrustalev@arm.com> Link: https://lore.kernel.org/r/20250508132644.1395904-14-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: Clear PSTATE.SM during clone()Mark Rutland1-9/+4
Currently arch_dup_task_struct() doesn't handle cases where the parent task has PSTATE.SM==1. Since syscall entry exits streaming mode, the parent will usually have PSTATE.SM==0, but this can be change by ptrace after syscall entry. When this happens, arch_dup_task_struct() will initialise the new task into an invalid state. The new task inherits the parent's configuration of PSTATE.SM, but fp_type is set to FP_STATE_FPSIMD, TIF_SVE and SME may be cleared, and both sve_state and sme_state may be set to NULL. This can result in a variety of problems whenever the new task's state is manipulated, including kernel NULL pointer dereferences and leaking of streaming mode state between tasks. When ptrace is not involved, the parent will have PSTATE.SM==0 as a result of syscall entry, and the documentation in Documentation/arch/arm64/sme.rst says: | On process creation (eg, clone()) the newly created process will have | PSTATE.SM cleared. ... so make this true by using task_smstop_sm() to exit streaming mode in the child task, avoiding the problems above. Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-13-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: Consistently preserve FPSIMD state during clone()Mark Rutland1-1/+8
In arch_dup_task_struct() we try to ensure that the child task inherits the FPSIMD state of its parent, but this depends on the parent task's saved state being in FPSIMD format, which is not always the case. Consequently the child task may inherit stale FPSIMD state in some cases. This can happen when the parent's state has been modified by ptrace since syscall entry, as writes to the NT_ARM_SVE regset may save state in SVE format. This has been possible since commit: bc0ee4760364 ("arm64/sve: Core task context handling") More recently it has been possible for a task's FPSIMD/SVE state to be saved before lazy discarding was guaranteed to occur, in which case preemption could cause the effective FPSIMD state to be saved in SVE format non-deterministically. This has been possible since commit: f130ac0ae441 ("arm64: syscall: unmask DAIF earlier for SVCs") Fix this by saving the parent task's effective FPSIMD state into FPSIMD format before copying the task_struct. As this requires modifying the parent's fpsimd_state, we must save+flush the state to avoid racing with concurrent manipulation. Similar issues exist when the parent has streaming mode state, and will be addressed by subsequent patches. Fixes: bc0ee4760364 ("arm64/sve: Core task context handling") Fixes: f130ac0ae441 ("arm64: syscall: unmask DAIF earlier for SVCs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-12-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: Remove redundant task->mm checkMark Rutland1-2/+1
For historical reasons, arch_dup_task_struct() only calls fpsimd_preserve_current_state() when current->mm is non-NULL, but this is no longer necessary. Historically TIF_FOREIGN_FPSTATE was only managed for user threads, and was never set for kernel threads. At the time, various functions attempted to avoid saving/restoring state for kernel threads by checking task_struct::mm to try to distinguish user threads from kernel threads. We added the current->mm check to arch_dup_task_struct() in commit: 6eb6c80187c5 ("arm64: kernel thread don't need to save fpsimd context.") ... where the intent was to avoid pointlessly saving state for kernel threads, which never had live state (and the saved state would never be consumed). Subsequently we began setting TIF_FOREIGN_FPSTATE for kernel threads, and removed most of the task_struct::mm checks in commit: df3fb9682045 ("arm64: fpsimd: Eliminate task->mm checks") ... but we missed the check in arch_dup_task_struct(), which is similarly redundant. Remove the redundant check from arch_dup_task_struct(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-11-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return()Mark Rutland1-16/+2
Historically the behaviour of setup_return() was nondeterministic, depending on whether the task's FSIMD/SVE/SME state happened to be live. We fixed most of that in commit: 929fa99b1215 ("arm64/fpsimd: signal: Always save+flush state early") ... but we didn't decide on how clearing PSTATE.SM should behave, and left a TODO comment to that effect. Use the new task_smstop_sm() helper to make this behave as if an SMSTOP instruction was used to exit streaming mode. This would have been the most common behaviour prior to the commit above. Fixes: 40a8e87bb328 ("arm64/sme: Disable ZA and streaming mode when handling signals") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-10-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: Add task_smstop_sm()Mark Rutland2-0/+24
In a few places we want to transition a task from streaming mode to non-streaming mode, e.g. signal delivery where we historically tried to use an SMSTOP SM instruction. Add a new helper to manipulate a task's state in the same way as an SMSTOP SM instruction. I have not added a corresponding helper to simulate the effects of SMSTART SM. Only ptrace transitions a task into streaming mode, and ptrace has distinct semantics for such transitions. Per ARM DDI 0487 L.a, section B1.4.6: | RRSWFQ | When the Effective value of PSTATE.SM is changed by any method from 0 | to 1, an entry to Streaming SVE mode is performed, and all implemented | bits of Streaming SVE register state are set to zero. | RKFRQZ | When the Effective value of PSTATE.SM is changed by any method from 1 | to 0, an exit from Streaming SVE mode is performed, and in the | newly-entered mode, all implemented bits of the SVE scalable vector | registers, SVE predicate registers, and FFR, are set to zero. Per ARM DDI 0487 L.a, section C5.2.9: | On entry to or exit from Streaming SVE mode, FPMR is set to 0 Per ARM DDI 0487 L.a, section C5.2.10: | On entry to or exit from Streaming SVE mode, FPSR.{IOC, DZC, OFC, UFC, | IXC, IDC, QC} are set to 1 and the remaining bits are set to 0. This means bits 0, 1, 2, 3, 4, 7, and 27 respectively, i.e. 0x0800009f Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-9-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: Factor out {sve,sme}_state_size() helpersMark Rutland2-26/+37
In subsequent patches we'll need to determine the SVE/SME state size for a given SVE VL and SME VL regardless of whether a task is currently configured with those VLs. Split the sizing logic out of sve_state_size() and sme_state_size() so that we don't need to open-code this logic elsewhere. At the same time, apply minor cleanups: * Move sve_state_size() into fpsimd.h, matching the placement of sme_state_size(). * Remove the feature checks from sve_state_size(). We only call sve_state_size() when at least one of SVE and SME are supported, and when either of the two is not supported, the task's corresponding SVE/SME vector length will be zero. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-8-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: Clarify sve_sync_*() functionsMark Rutland4-26/+20
The sve_sync_{to,from}_fpsimd*() functions are intended to extract/insert the currently effective FPSIMD state of a task regardless of whether the task's state is saved in FPSIMD format or SVE format. Historically they were only used by ptrace, but sve_sync_to_fpsimd() is now used more widely, and sve_sync_from_fpsimd_zeropad() may be used more widely in future. When FPSIMD/SVE state tracking was changed across commits: baa8515281b3 ("arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE") a0136be443d5 (arm64/fpsimd: Load FP state based on recorded data type") bbc6172eefdb ("arm64/fpsimd: SME no longer requires SVE register state") 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") ... sve_sync_to_fpsimd() was updated to consider task->thread.fp_type rather than the task's TIF_SVE and PSTATE.SM, but (apparently due to an oversight) sve_sync_from_fpsimd_zeropad() was left as-is, leaving the two inconsistent. Due to this, sve_sync_from_fpsimd_zeropad() may copy state from task->thread.uw.fpsimd_state into task->thread.sve_state when task->thread.fp_type == FP_STATE_FPSIMD. This is redundant (but benign) as task->thread.uw.fpsimd_state is the effective state that will be restored, and task->thread.sve_state will not be consumed. For consistency, and to avoid the redundant work, it better for sve_sync_from_fpsimd_zeropad() to consider task->thread.fp_type alone, matching sve_sync_to_fpsimd(). The naming of both functions is somehat unfortunate, as it is unclear when and why they copy state. It would be better to describe them in terms of the effective state. Considering all of the above, clean this up: * Adjust sve_sync_from_fpsimd_zeropad() to consider task->thread.fp_type. * Update comments to clarify the intended semantics/usage. I've removed the description that task->thread.sve_state must have been allocated, as this is only necessary when task->thread.fp_type == FP_STATE_SVE, which itself implies that task->thread.sve_state must have been allocated. * Rename the functions to more clearly indicate when/why they copy state: - sve_sync_to_fpsimd() => fpsimd_sync_from_effective_state() - sve_sync_from_fpsimd_zeropad => fpsimd_sync_to_effective_state_zeropad() Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-7-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVEMark Rutland3-33/+9
Partial writes to the NT_ARM_SVE and NT_ARM_SSVE regsets using an payload are handled inconsistently and non-deterministically. A comment within sve_set_common() indicates that we intended that a partial write would preserve any effective FPSIMD/SVE state which was not overwritten, but this has never worked consistently, and during syscalls the FPSIMD vector state may be non-deterministically preserved and may be erroneously migrated between streaming and non-streaming SVE modes. The simplest fix is to handle a partial write by consistently zeroing the remaining state. As detailed below I do not believe this will adversely affect any real usage. Neither GDB nor LLDB attempt partial writes to these regsets, and the documentation (in Documentation/arch/arm64/sve.rst) has always indicated that state preservation was not guaranteed, as is says: | The effect of writing a partial, incomplete payload is unspecified. When the logic was originally introduced in commit: 43d4da2c45b2 ("arm64/sve: ptrace and ELF coredump support") ... there were two potential behaviours, depending on TIF_SVE: * When TIF_SVE was clear, all SVE state would be zeroed, excluding the low 128 bits of vectors shared with FPSIMD, FPSR, and FPCR. * When TIF_SVE was set, all SVE state would be zeroed, including the low 128 bits of vectors shared with FPSIMD, but excluding FPSR and FPCR. Note that as writing to NT_ARM_SVE would set TIF_SVE, partial writes to NT_ARM_SVE would not be idempotent, and if a first write preserved the low 128 bits, a subsequent (potentially identical) partial write would discard the low 128 bits. When support for the NT_ARM_SSVE regset was added in commit: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers") ... the above behaviour was retained for writes to the NT_ARM_SVE regset, though writes to the NT_ARM_SSVE would always zero the SVE registers and would not inherit FPSIMD register state. This happened as fpsimd_sync_to_sve() only copied the FPSIMD regs when TIF_SVE was clear and PSTATE.SM==0. Subsequently, when FPSIMD/SVE state tracking was changed across commits: baa8515281b3 ("arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE") a0136be443d5 (arm64/fpsimd: Load FP state based on recorded data type") bbc6172eefdb ("arm64/fpsimd: SME no longer requires SVE register state") 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") ... there was no corresponding update to the ptrace code, nor to fpsimd_sync_to_sve(), which stil considers TIF_SVE and PSTATE.SM rather than the saved fp_type. The saved state can be in the FPSIMD format regardless of whether TIF_SVE is set or clear, and the saved type can change non-deterministically during syscalls. Consequently a subsequent partial write to the NT_ARM_SVE or NT_ARM_SSVE regsets may non-deterministically preserve the FPSIMD state, and may migrate this state between streaming and non-streaming modes. Clean this up by never attempting to preserve ANY state when writing an SVE payload to the NT_ARM_SVE/NT_ARM_SSVE regsets, zeroing all relevant state including FPSR and FPCR. This simplifies the code, makes the behaviour deterministic, and avoids migrating state between streaming and non-streaming modes. As above, I do not believe this should adversely affect existing userspace applications. At the same time, remove fpsimd_sync_to_sve(). It is no longer used, doesn't do what its documentation implies, and gets in the way of other cleanups and fixes. Fixes: 43d4da2c45b2 ("arm64/sve: ptrace and ELF coredump support") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Spickett <david.spickett@arm.com> Cc: Luis Machado <luis.machado@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-6-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64: bpf: Only mitigate cBPF programs loaded by unprivileged usersJames Morse1-0/+3
Support for eBPF programs loaded by unprivileged users is typically disabled. This means only cBPF programs need to be mitigated for BHB. In addition, only mitigate cBPF programs that were loaded by an unprivileged user. Privileged users can also load the same program via eBPF, making the mitigation pointless. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2025-05-08arm64: bpf: Add BHB mitigation to the epilogue for cBPF programsJames Morse3-5/+52
A malicious BPF program may manipulate the branch history to influence what the hardware speculates will happen next. On exit from a BPF program, emit the BHB mititgation sequence. This is only applied for 'classic' cBPF programs that are loaded by seccomp. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2025-05-08arm64: proton-pack: Expose whether the branchy loop k valueJames Morse2-0/+6
Add a helper to expose the k value of the branchy loop. This is needed by the BPF JIT to generate the mitigation sequence in BPF programs. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
2025-05-08arm64: proton-pack: Expose whether the platform is mitigated by firmwareJames Morse2-0/+6
is_spectre_bhb_fw_affected() allows the caller to determine if the CPU is known to need a firmware mitigation. CPUs are either on the list of CPUs we know about, or firmware has been queried and reported that the platform is affected - and mitigated by firmware. This helper is not useful to determine if the platform is mitigated by firmware. A CPU could be on the know list, but the firmware may not be implemented. Its affected but not mitigated. spectre_bhb_enable_mitigation() handles this distinction by checking the firmware state before enabling the mitigation. Add a helper to expose this state. This will be used by the BPF JIT to determine if calling firmware for a mitigation is necessary and supported. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
2025-05-08arm64: insn: Add support for encoding DSBJames Morse2-23/+38
To generate code in the eBPF epilogue that uses the DSB instruction, insn.c needs a heler to encode the type and domain. Re-use the crm encoding logic from the DMB instruction. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
2025-05-08arm64/fpsimd: signal: Consistently read FPSIMD contextMark Rutland1-28/+29
For historical reasons, restore_sve_fpsimd_context() has an open-coded copy of the logic from read_fpsimd_context(), which is used to either restore an FPSIMD-only context, or to merge FPSIMD state into an SVE state when restoring an SVE+FPSIMD context. The logic is *almost* identical. Refactor the logic to avoid duplication and make this clearer. This comes with two functional changes that I do not believe will be problematic in practice: * The user_fpsimd_state::size field will be checked in all restore paths that consume it user_fpsimd_state. The kernel always populates this field when delivering a signal, and so this should contain the expected value unless it has been corrupted. * If a read of user_fpsimd_state fails, we will return early without modifying TIF_SVE, the saved SVCR, or the save fp_type. This will leave the task in a consistent state, without potentially resurrecting stale FPSIMD state. A read of user_fpsimd_state should never fail unless the structure has been corrupted or the stack has been unmapped. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-5-mark.rutland@arm.com [will: Ensure read_fpsimd_context() returns negative error code or zero] Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: signal: Mandate SVE payload for streaming-mode stateMark Rutland1-2/+13
Non-streaming SVE state may be preserved without an SVE payload, in which case the SVE context only has a header with VL==0, and all state can be restored from the FPSIMD context. Streaming SVE state is always preserved with an SVE payload, where the SVE context header has VL!=0, and the SVE_SIG_FLAG_SM flag is set. The kernel never preserves an SVE context where SVE_SIG_FLAG_SM is set without an SVE payload. However, restore_sve_fpsimd_context() doesn't forbid restoring such a context, and will handle this case by clearing PSTATE.SM and restoring the FPSIMD context into non-streaming mode, which isn't consistent with the SVE_SIG_FLAG_SM flag. Forbid this case, and mandate an SVE payload when the SVE_SIG_FLAG_SM flag is set. This avoids an awkward ABI quirk and reduces the risk that later rework to this code permits configuring a task with PSTATE.SM==1 and fp_type==FP_STATE_FPSIMD. I've marked this as a fix given that we never intended to support this case, and we don't want anyone to start relying upon the old behaviour once we re-enable SME. Fixes: 85ed24dad290 ("arm64/sme: Implement streaming SVE signal handling") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-4-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame onlyMark Rutland1-0/+1
On systems with SVE and/or SME, the kernel will always create SVE and FPSIMD signal frames when delivering a signal, but a user can manipulate signal frames such that a signal return only observes an FPSIMD signal frame. When this happens, restore_fpsimd_context() will restore state such that fp_type==FP_STATE_FPSIMD, but will leave PSTATE.SM as-is. It is possible for a user to set PSTATE.SM between syscall entry and execution of the sigreturn logic (e.g. via ptrace), and consequently the sigreturn may result in the task having PSTATE.SM==1 and fp_type==FP_STATE_FPSIMD. For various reasons it is not legitimate for a task to be in a state where PSTATE.SM==1 and fp_type==FP_STATE_FPSIMD. Portions of the user ABI are written with the requirement that streaming SVE state is always presented in SVE format rather than FPSIMD format, and as there is no mechanism to permit access to only the FPSIMD subset of streaming SVE state, streaming SVE state must always be saved and restored in SVE format. Fix restore_fpsimd_context() to clear PSTATE.SM when restoring an FPSIMD signal frame without an SVE signal frame. This matches the current behaviour when an SVE signal frame is present, but the SVE signal frame has no register payload (e.g. as is the case on SME-only systems which lack SVE). This change should have no effect for applications which do not alter signal frames (i.e. almost all applications). I do not expect non-{malicious,buggy} applications to hide the SVE signal frame, but I've chosen to clear PSTATE.SM rather than mandating the presence of an SVE signal frame in case there is some legacy (non-SME) usage that I am not currently aware of. For context, the SME handling was originally introduced in commit: 85ed24dad290 ("arm64/sme: Implement streaming SVE signal handling") ... and subsequently updated/fixed to handle SME-only systems in commits: 7dde62f0687c ("arm64/signal: Always accept SVE signal frames on SME only systems") f26cd7372160 ("arm64/signal: Always allocate SVE signal frames on SME only systems") Fixes: 85ed24dad290 ("arm64/sme: Implement streaming SVE signal handling") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08arm64/fpsimd: Do not discard modified SVE stateMark Rutland3-17/+47
Historically SVE state was discarded deterministically early in the syscall entry path, before ptrace is notified of syscall entry. This permitted ptrace to modify SVE state before and after the "real" syscall logic was executed, with the modified state being retained. This behaviour was changed by commit: 8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") That commit was intended to speed up workloads that used SVE by opportunistically leaving SVE enabled when returning from a syscall. The syscall entry logic was modified to truncate the SVE state without disabling userspace access to SVE, and fpsimd_save_user_state() was modified to discard userspace SVE state whenever in_syscall(current_pt_regs()) is true, i.e. when current_pt_regs()->syscallno != NO_SYSCALL. Leaving SVE enabled opportunistically resulted in a couple of changes to userspace visible behaviour which weren't described at the time, but are logical consequences of opportunistically leaving SVE enabled: * Signal handlers can observe the type of saved state in the signal's sve_context record. When the kernel only tracks FPSIMD state, the 'vq' field is 0 and there is no space allocated for register contents. When the kernel tracks SVE state, the 'vq' field is non-zero and the register contents are saved into the record. As a result of the above commit, 'vq' (and the presence of SVE register state) is non-deterministically zero or non-zero for a period of time after a syscall. The effective register state is still deterministic. Hopefully no-one relies on this being deterministic. In general, handlers for asynchronous events cannot expect a deterministic state. * Similarly to signal handlers, ptrace requests can observe the type of saved state in the NT_ARM_SVE and NT_ARM_SSVE regsets, as this is exposed in the header flags. As a result of the above commit, this is now in a non-deterministic state after a syscall. The effective register state is still deterministic. Hopefully no-one relies on this being deterministic. In general, debuggers would have to handle this changing at arbitrary points during program flow. Discarding the SVE state within fpsimd_save_user_state() resulted in other changes to userspace visible behaviour which are not desirable: * A ptrace tracer can modify (or create) a tracee's SVE state at syscall entry or syscall exit. As a result of the above commit, the tracee's SVE state can be discarded non-deterministically after modification, rather than being retained as it previously was. Note that for co-operative tracer/tracee pairs, the tracer may (re)initialise the tracee's state arbitrarily after the tracee sends itself an initial SIGSTOP via a syscall, so this affects realistic design patterns. * The current_pt_regs()->syscallno field can be modified via ptrace, and can be altered even when the tracee is not really in a syscall, causing non-deterministic discarding to occur in situations where this was not previously possible. Further, using current_pt_regs()->syscallno in this way is unsound: * There are data races between readers and writers of the current_pt_regs()->syscallno field. The current_pt_regs()->syscallno field is written in interruptible task context using plain C accesses, and is read in irq/softirq context using plain C accesses. These accesses are subject to data races, with the usual concerns with tearing, etc. * Writes to current_pt_regs()->syscallno are subject to compiler reordering. As current_pt_regs()->syscallno is written with plain C accesses, the compiler is free to move those writes arbitrarily relative to anything which doesn't access the same memory location. In theory this could break signal return, where prior to restoring the SVE state, restore_sigframe() calls forget_syscall(). If the write were hoisted after restore of some SVE state, that state could be discarded unexpectedly. In practice that reordering cannot happen in the absence of LTO (as cross compilation-unit function calls happen prevent this reordering), and that reordering appears to be unlikely in the presence of LTO. Additionally, since commit: f130ac0ae4412dbe ("arm64: syscall: unmask DAIF earlier for SVCs") ... DAIF is unmasked before el0_svc_common() sets regs->syscallno to the real syscall number. Consequently state may be saved in SVE format prior to this point. Considering all of the above, current_pt_regs()->syscallno should not be used to infer whether the SVE state can be discarded. Luckily we can instead use cpu_fp_state::to_save to track when it is safe to discard the SVE state: * At syscall entry, after the live SVE register state is truncated, set cpu_fp_state::to_save to FP_STATE_FPSIMD to indicate that only the FPSIMD portion is live and needs to be saved. * At syscall exit, once the task's state is guaranteed to be live, set cpu_fp_state::to_save to FP_STATE_CURRENT to indicate that TIF_SVE must be considered to determine which state needs to be saved. * Whenever state is modified, it must be saved+flushed prior to manipulation. The state will be truncated if necessary when it is saved, and reloading the state will set fp_state::to_save to FP_STATE_CURRENT, preventing subsequent discarding. This permits SVE state to be discarded *only* when it is known to have been truncated (and the non-FPSIMD portions must be zero), and ensures that SVE state is retained after it is explicitly modified. For backporting, note that this fix depends on the following commits: * b2482807fbd4 ("arm64/sme: Optimise SME exit on syscall entry") * f130ac0ae441 ("arm64: syscall: unmask DAIF earlier for SVCs") * 929fa99b1215 ("arm64/fpsimd: signal: Always save+flush state early") Fixes: 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Fixes: f130ac0ae441 ("arm64: syscall: unmask DAIF earlier for SVCs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250508132644.1395904-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-08KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitionsSean Christopherson2-6/+67
Set the magic BP_SPEC_REDUCE bit to mitigate SRSO when running VMs if and only if KVM has at least one active VM. Leaving the bit set at all times unfortunately degrades performance by a wee bit more than expected. Use a dedicated spinlock and counter instead of hooking virtualization enablement, as changing the behavior of kvm.enable_virt_at_load based on SRSO_BP_SPEC_REDUCE is painful, and has its own drawbacks, e.g. could result in performance issues for flows that are sensitive to VM creation latency. Defer setting BP_SPEC_REDUCE until VMRUN is imminent to avoid impacting performance on CPUs that aren't running VMs, e.g. if a setup is using housekeeping CPUs. Setting BP_SPEC_REDUCE in task context, i.e. without blasting IPIs to all CPUs, also helps avoid serializing 1<=>N transitions without incurring a gross amount of complexity (see the Link for details on how ugly coordinating via IPIs gets). Link: https://lore.kernel.org/all/aBOnzNCngyS_pQIW@google.com Fixes: 8442df2b49ed ("x86/bugs: KVM: Add support for SRSO_MSR_FIX") Reported-by: Michael Larabel <Michael@michaellarabel.com> Closes: https://www.phoronix.com/review/linux-615-amd-regression Cc: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20250505180300.973137-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-05-08ARM: dts: am335x: Set wakeup-source for UART0Sukrut Bellary1-1/+1
On am335x evm[1], UART0(UART1-HW) has a wakeup capability. Set wakeup-source, which will be used in the omap serial driver to enable the device wakeup capability. [1] https://www.ti.com/tool/TMDXEVM3358 [2] AM335x TRM - https://www.ti.com/lit/ug/spruh73q/spruh73q.pdf Signed-off-by: Sukrut Bellary <sbellary@baylibre.com> Tested-by: Judith Mendez <jm@ti.com> Link: https://lore.kernel.org/r/20250318230042.3138542-4-sbellary@baylibre.com Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-05-08ARM: OMAP2+: Fix l4ls clk domain handling in STANDBYSukrut Bellary3-2/+15
Don't put the l4ls clk domain to sleep in case of standby. Since CM3 PM FW[1](ti-v4.1.y) doesn't wake-up/enable the l4ls clk domain upon wake-up, CM3 PM FW fails to wake-up the MPU. [1] https://git.ti.com/cgit/processor-firmware/ti-amx3-cm3-pm-firmware/ Signed-off-by: Sukrut Bellary <sbellary@baylibre.com> Tested-by: Judith Mendez <jm@ti.com> Link: https://lore.kernel.org/r/20250318230042.3138542-2-sbellary@baylibre.com Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2025-05-08riscv: Disallow PR_GET_TAGGED_ADDR_CTRL without SupmSamuel Holland1-0/+3
When the prctl() interface for pointer masking was added, it did not check that the pointer masking ISA extension was supported, only the individual submodes. Userspace could still attempt to disable pointer masking and query the pointer masking state. commit 81de1afb2dd1 ("riscv: Fix kernel crash due to PR_SET_TAGGED_ADDR_CTRL") disallowed the former, as the senvcfg write could crash on older systems. PR_GET_TAGGED_ADDR_CTRL state does not crash, because it reads only kernel-internal state and not senvcfg, but it should still be disallowed for consistency. Fixes: 09d6775f503b ("riscv: Add support for userspace pointer masking") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20250507145230.2272871-1-samuel.holland@sifive.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-05-08riscv: Fix kernel crash due to PR_SET_TAGGED_ADDR_CTRLNam Cao1-0/+3
When userspace does PR_SET_TAGGED_ADDR_CTRL, but Supm extension is not available, the kernel crashes: Oops - illegal instruction [#1] [snip] epc : set_tagged_addr_ctrl+0x112/0x15a ra : set_tagged_addr_ctrl+0x74/0x15a epc : ffffffff80011ace ra : ffffffff80011a30 sp : ffffffc60039be10 [snip] status: 0000000200000120 badaddr: 0000000010a79073 cause: 0000000000000002 set_tagged_addr_ctrl+0x112/0x15a __riscv_sys_prctl+0x352/0x73c do_trap_ecall_u+0x17c/0x20c andle_exception+0x150/0x15c Fix it by checking if Supm is available. Fixes: 09d6775f503b ("riscv: Add support for userspace pointer masking") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20250504101920.3393053-1-namcao@linutronix.de Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-05-08riscv: misaligned: use get_user() instead of __get_user()Clément Léger1-1/+1
Now that we can safely handle user memory accesses while in the misaligned access handlers, use get_user() instead of __get_user() to have user memory access checks. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250422162324.956065-4-cleger@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-05-08riscv: misaligned: enable IRQs while handling misaligned accessesClément Léger1-4/+8
We can safely reenable IRQs if coming from userspace. This allows to access user memory that could potentially trigger a page fault. Fixes: b686ecdeacf6 ("riscv: misaligned: Restrict user access to kernel memory") Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250422162324.956065-3-cleger@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-05-08riscv: misaligned: factorize trap handlingClément Léger1-30/+36
Since both load/store and user/kernel should use almost the same path and that we are going to add some code around that, factorize it. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250422162324.956065-2-cleger@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-05-08KVM: arm64: Drop sort_memblock_regions()Gavin Shan1-19/+0
Drop sort_memblock_regions() and avoid sorting the copied memory regions to be ascending order on their base addresses, because the source memory regions should have been sorted correctly when they are added by memblock_add() or its variants. This is generally reverting commit a14307f5310c ("KVM: arm64: Sort the hypervisor memblocks"). No functional changes intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Quentin Perret <qperret@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250311043718.91004-1-gshan@redhat.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-08x86: disable image size check for test buildsGuenter Roeck1-1/+9
64-bit allyesconfig builds fail with x86_64-linux-ld: kernel image bigger than KERNEL_IMAGE_SIZE Bisect points to commit 6f110a5e4f99 ("Disable SLUB_TINY for build testing") as the responsible commit. Reverting that patch does indeed fix the problem. Further analysis shows that disabling SLUB_TINY enables KASAN, and that KASAN is responsible for the image size increase. Solve the build problem by disabling the image size check for test builds. [akpm@linux-foundation.org: add comment, fix nearby typo (sink->sync)] [akpm@linux-foundation.org: fix comment snafu Link: https://lore.kernel.org/oe-kbuild-all/202504191813.4r9H6Glt-lkp@intel.com/ Link: https://lkml.kernel.org/r/20250417010950.2203847-1-linux@roeck-us.net Fixes: 6f110a5e4f99 ("Disable SLUB_TINY for build testing") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-08riscv: dts: thead: Add device tree VO clock controllerMichal Wilczynski1-0/+7
VO clocks reside in a different address space from the AP clocks on the T-HEAD SoC. Add the device tree node of a clock-controller to handle VO address space as well. Reviewed-by: Drew Fustini <drew@pdp7.com> Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com> Signed-off-by: Drew Fustini <drew@pdp7.com>
2025-05-08crypto: arm64/sha256 - fix build when CONFIG_PREEMPT_VOLUNTARY=yEric Biggers1-3/+4
Fix the build of sha256-ce.S when CONFIG_PREEMPT_VOLUNTARY=y by passing the correct label to the cond_yield macro. Also adjust the code to execute only one branch instruction when CONFIG_PREEMPT_VOLUNTARY=n. Fixes: 6e36be511d28 ("crypto: arm64/sha256 - implement library instead of shash") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505071811.yYpLUbav-lkp@intel.com/ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-08crypto: powerpc/poly1305 - Add missing poly1305_emit_archHerbert Xu2-2/+3
Rename poly1305_emit_64 to poly1305_emit_arch to conform with the expectation of the poly1305 library. Reported-by: Thorsten Leemhuis <linux@leemhuis.info> Fixes: 14d31979145d ("crypto: powerpc/poly1305 - Add block-only interface") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>