path: root/arch/riscv/kernel
Age  Commit message  (Author; files changed, lines -/+)
2023-10-12  RISC-V: Fix wrong use of CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK  (Jiexun Wang; 1 file, -2/+2)

If the configuration options SOFTIRQ_ON_OWN_STACK and PREEMPT_RT are enabled simultaneously on the RISC-V architecture, the build fails:

  arch/riscv/kernel/irq.c:64:6: error: redefinition of 'do_softirq_own_stack'
     64 | void do_softirq_own_stack(void)
        |      ^~~~~~~~~~~~~~~~~~~~
  In file included from ./arch/riscv/include/generated/asm/softirq_stack.h:1,
                   from arch/riscv/kernel/irq.c:15:
  ./include/asm-generic/softirq_stack.h:8:20: note: previous definition of 'do_softirq_own_stack' was here
      8 | static inline void do_softirq_own_stack(void)
        |                    ^~~~~~~~~~~~~~~~~~~~

After changing CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK to CONFIG_SOFTIRQ_ON_OWN_STACK, the kernel compiles successfully.

Fixes: dd69d07a5a6c ("riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK")
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Jiexun Wang <wangjiexun@tinylab.org>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20230913052940.374686-1-wangjiexun@tinylab.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

2023-10-12  riscv: kdump: fix crashkernel reserving problem on RISC-V  (Chen Jiahao; 1 file, -13/+0)

When testing in a RISC-V QEMU environment with the "crashkernel=" parameter enabled, a problem occurred with the following message:

  [    0.000000] crashkernel low memory reserved: 0xf8000000 - 0x100000000 (128 MB)
  [    0.000000] crashkernel reserved: 0x0000000177e00000 - 0x0000000277e00000 (4096 MB)
  [    0.000000] ------------[ cut here ]------------
  [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/resource.c:779 __insert_resource+0x8e/0xd0
  [    0.000000] Modules linked in:
  [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc2-next-20230920 #1
  [    0.000000] Hardware name: riscv-virtio,qemu (DT)
  [    0.000000] epc : __insert_resource+0x8e/0xd0
  [    0.000000] ra : insert_resource+0x28/0x4e
  [    0.000000] epc : ffffffff80017344 ra : ffffffff8001742e sp : ffffffff81203db0
  [    0.000000] gp : ffffffff812ece98 tp : ffffffff8120dac0 t0 : ff600001f7ff2b00
  [    0.000000] t1 : 0000000000000000 t2 : 3428203030303030 s0 : ffffffff81203dc0
  [    0.000000] s1 : ffffffff81211e18 a0 : ffffffff81211e18 a1 : ffffffff81289380
  [    0.000000] a2 : 0000000277dfffff a3 : 0000000177e00000 a4 : 0000000177e00000
  [    0.000000] a5 : ffffffff81289380 a6 : 0000000277dfffff a7 : 0000000000000078
  [    0.000000] s2 : ffffffff81289380 s3 : ffffffff80a0bac8 s4 : ff600001f7ff2880
  [    0.000000] s5 : 0000000000000280 s6 : 8000000a00006800 s7 : 000000000000007f
  [    0.000000] s8 : 0000000080017038 s9 : 0000000080038ea0 s10: 0000000000000000
  [    0.000000] s11: 0000000000000000 t3 : ffffffff80a0bc00 t4 : ffffffff80a0bc00
  [    0.000000] t5 : ffffffff80a0bbd0 t6 : ffffffff80a0bc00
  [    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
  [    0.000000] [<ffffffff80017344>] __insert_resource+0x8e/0xd0
  [    0.000000] ---[ end trace 0000000000000000 ]---
  [    0.000000] Failed to add a Crash kernel resource at 177e00000

The crashkernel memory has been allocated successfully, but it failed to be inserted into iomem_resource. This is due to the unique reserving logic in the RISC-V arch-specific code: crashk_res/crashk_low_res are added into iomem_resource later, in init_resources(), which is not aligned with the current unified reserving logic in reserve_crashkernel_{generic,low}() and therefore leads to the failure of the crashkernel reservation.

Remove the arch-specific code within #ifdef CONFIG_KEXEC_CORE in init_resources() to fix the above problem.

Fixes: 31549153088e ("riscv: kdump: use generic interface to simplify crashkernel reservation")
Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com>
Link: https://lore.kernel.org/r/20230925024333.730964-1-chenjiahao16@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

2023-10-12  riscv: signal: fix sigaltstack frame size checking  (Andy Chiu; 1 file, -7/+0)

The alternative stack checking in get_sigframe introduced by the Vector support is not needed and has a problem. It is not needed because we have already validated the stack at the beginning of the function if we are already on an altstack. If we are not, the size of an altstack is always validated at its allocation stage with sigaltstack_size_valid().

Besides, we must only regard the size of an altstack if the handler of a signal is registered with SA_ONSTACK. So, blindly checking for overflow of an altstack whenever sas_ss_size does not equal zero will check against the wrong signal handlers if only a subset of signals are registered with SA_ONSTACK.

Fixes: 8ee0b41898fa ("riscv: signal: Add sigcontext save/restore for vector")
Reported-by: Prashanth Swaminathan <prashanthsw@google.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230822164904.21660-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

2023-09-20  riscv: Only consider swbp/ss handlers for correct privileged mode  (Björn Töpel; 1 file, -10/+18)
RISC-V software breakpoint trap handlers are used for {k,u}probes. When trapping from kernelmode, only the kernelmode handlers should be considered. Vice versa, only usermode handlers for usermode traps. This is not the case on RISC-V, which can trigger a bug if a userspace process uses uprobes, and a WARN() is triggered from kernelmode (which is implemented via {c.,}ebreak). The kernel will trap on the kernelmode {c.,}ebreak, look for uprobes handlers, realize incorrectly that uprobes need to be handled, and exit the trap handler early. The trap returns to re-executing the {c.,}ebreak, and enter an infinite trap-loop. The issue was found running the BPF selftest [1]. Fix this issue by only considering the swbp/ss handlers for kernel/usermode respectively. Also, move CONFIG ifdeffery from traps.c to the asm/{k,u}probes.h headers. Note that linux/uprobes.h only include asm/uprobes.h if CONFIG_UPROBES is defined, which is why asm/uprobes.h needs to be unconditionally included in traps.c Link: https://lore.kernel.org/linux-riscv/87v8d19aun.fsf@all.your.base.are.belong.to.us/ # [1] Fixes: 74784081aac8 ("riscv: Add uprobes supported") Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Nam Cao <namcaov@gmail.com> Tested-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20230912065619.62020-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-12  riscv: kexec: Align the kexeced kernel entry  (Song Shuai; 1 file, -1/+7)

The current RISC-V boot protocol requires 2MB alignment for RV64 and 4MB alignment for RV32. In the KEXEC_FILE path, the elf_find_pbase() function should align the kexeced kernel entry according to this requirement, otherwise the kexeced kernel will silently BUG in setup_vm().

Fixes: 8acea455fafa ("RISC-V: Support for kexec_file on panic")
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Link: https://lore.kernel.org/r/20230906095817.364390-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

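For illustration, a hedged sketch of enforcing that alignment on the chosen physical base; the constant and helper names below are made up and are not the actual elf_find_pbase() change:

    #include <linux/align.h>
    #include <linux/sizes.h>

    #ifdef CONFIG_64BIT
    #define KEXEC_KERNEL_ALIGN	SZ_2M	/* RV64 boot-protocol alignment */
    #else
    #define KEXEC_KERNEL_ALIGN	SZ_4M	/* RV32 boot-protocol alignment */
    #endif

    /* Round the candidate load address up so setup_vm() sees an aligned entry. */
    static unsigned long kexec_align_entry(unsigned long pbase)
    {
    	return ALIGN(pbase, KEXEC_KERNEL_ALIGN);
    }
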
2023-09-08Merge patch series "bpf, riscv: use BPF prog pack allocator in BPF JIT"Palmer Dabbelt1-5/+109
Puranjay Mohan <puranjay12@gmail.com> says: Here is some data to prove the V2 fixes the problem: Without this series: root@rv-selftester:~/src/kselftest/bpf# time ./test_tag test_tag: OK (40945 tests) real 7m47.562s user 0m24.145s sys 6m37.064s With this series applied: root@rv-selftester:~/src/selftest/bpf# time ./test_tag test_tag: OK (40945 tests) real 7m29.472s user 0m25.865s sys 6m18.401s BPF programs currently consume a page each on RISCV. For systems with many BPF programs, this adds significant pressure to instruction TLB. High iTLB pressure usually causes slow down for the whole system. Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue. It packs multiple BPF programs into a single huge page. It is currently only enabled for the x86_64 BPF JIT. I enabled this allocator on the ARM64 BPF JIT[2]. It is being reviewed now. This patch series enables the BPF prog pack allocator for the RISCV BPF JIT. ====================================================== Performance Analysis of prog pack allocator on RISCV64 ====================================================== Test setup: =========== Host machine: Debian GNU/Linux 11 (bullseye) Qemu Version: QEMU emulator version 8.0.3 (Debian 1:8.0.3+dfsg-1) u-boot-qemu Version: 2023.07+dfsg-1 opensbi Version: 1.3-1 To test the performance of the BPF prog pack allocator on RV, a stresser tool[4] linked below was built. This tool loads 8 BPF programs on the system and triggers 5 of them in an infinite loop by doing system calls. The runner script starts 20 instances of the above which loads 8*20=160 BPF programs on the system, 5*20=100 of which are being constantly triggered. The script is passed a command which would be run in the above environment. The script was run with following perf command: ./run.sh "perf stat -a \ -e iTLB-load-misses \ -e dTLB-load-misses \ -e dTLB-store-misses \ -e instructions \ --timeout 60000" The output of the above command is discussed below before and after enabling the BPF prog pack allocator. The tests were run on qemu-system-riscv64 with 8 cpus, 16G memory. The rootfs was created using Bjorn's riscv-cross-builder[5] docker container linked below. Results ======= Before enabling prog pack allocator: ------------------------------------ Performance counter stats for 'system wide': 4939048 iTLB-load-misses 5468689 dTLB-load-misses 465234 dTLB-store-misses 1441082097998 instructions 60.045791200 seconds time elapsed After enabling prog pack allocator: ----------------------------------- Performance counter stats for 'system wide': 3430035 iTLB-load-misses 5008745 dTLB-load-misses 409944 dTLB-store-misses 1441535637988 instructions 60.046296600 seconds time elapsed Improvements in metrics ======================= It was expected that the iTLB-load-misses would decrease as now a single huge page is used to keep all the BPF programs compared to a single page for each program earlier. -------------------------------------------- The improvement in iTLB-load-misses: -30.5 % -------------------------------------------- I repeated this expriment more than 100 times in different setups and the improvement was always greater than 30%. This patch series is boot tested on the Starfive VisionFive 2 board[6]. The performance analysis was not done on the board because it doesn't expose iTLB-load-misses, etc. 
The stresser program was run on the board to test the loading and unloading of BPF programs [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/ [2] https://lore.kernel.org/all/20230626085811.3192402-1-puranjay12@gmail.com/ [3] https://lore.kernel.org/all/20230626085811.3192402-2-puranjay12@gmail.com/ [4] https://github.com/puranjaymohan/BPF-Allocator-Bench [5] https://github.com/bjoto/riscv-cross-builder [6] https://www.starfivetech.com/en/site/boards * b4-shazam-merge: bpf, riscv: use prog pack allocator in the BPF JIT riscv: implement a memset like function for text riscv: extend patch_text_nosync() for multiple pages bpf: make bpf_prog_pack allocator portable Link: https://lore.kernel.org/r/20230831131229.497941-1-puranjay12@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "riscv: Introduce KASLR"Palmer Dabbelt5-1/+70
Alexandre Ghiti <alexghiti@rivosinc.com> says: The following KASLR implementation allows to randomize the kernel mapping: - virtually: we expect the bootloader to provide a seed in the device-tree - physically: only implemented in the EFI stub, it relies on the firmware to provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation hence the patch 3 factorizes KASLR related functions for riscv to take advantage. The new virtual kernel location is limited by the early page table that only has one PUD and with the PMD alignment constraint, the kernel can only take < 512 positions. * b4-shazam-merge: riscv: libstub: Implement KASLR by using generic functions libstub: Fix compilation warning for rv32 arm64: libstub: Move KASLR handling functions to kaslr.c riscv: Dump out kernel offset information on panic riscv: Introduce virtual kernel mapping KASLR Link: https://lore.kernel.org/r/20230722123850.634544-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch "RISC-V: Add ptrace support for vectors"Palmer Dabbelt1-0/+79
This resurrects the vector ptrace() support that was removed for 6.5 due to some bugs cropping up as part of the GDB review process. * b4-shazam-merge: RISC-V: Add ptrace support for vectors Link: https://lore.kernel.org/r/20230825050248.32681-1-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "Add non-coherent DMA support for AX45MP"Palmer Dabbelt1-0/+5
Prabhakar <prabhakar.csengg@gmail.com> says: From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> non-coherent DMA support for AX45MP ==================================== On the Andes AX45MP core, cache coherency is a specification option so it may not be supported. In this case DMA will fail. To get around with this issue this patch series does the below: 1] Andes alternative ports is implemented as errata which checks if the IOCP is missing and only then applies to CMO errata. One vendor specific SBI EXT (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. Below are the configs which Andes port provides (and are selected by RZ/Five): - ERRATA_ANDES - ERRATA_ANDES_CMO OpenSBI patch supporting ANDES_SBI_EXT_IOCP_SW_WORKAROUND SBI is now part v1.3 release. 2] Andes AX45MP core has a Programmable Physical Memory Attributes (PMA) block that allows dynamic adjustment of memory attributes in the runtime. It contains a configurable amount of PMA entries implemented as CSR registers to control the attributes of memory locations in interest. OpenSBI configures the PMA regions as required and creates a reserve memory node and propagates it to the higher boot stack. Currently OpenSBI (upstream) configures the required PMA region and passes this a shared DMA pool to Linux. reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; pma_resv0@58000000 { compatible = "shared-dma-pool"; reg = <0x0 0x58000000 0x0 0x08000000>; no-map; linux,dma-default; }; }; The above shared DMA pool gets appended to Linux DTB so the DMA memory requests go through this region. 3] We provide callbacks to synchronize specific content between memory and cache. 4] RZ/Five SoC selects the below configs - AX45MP_L2_CACHE - DMA_GLOBAL_POOL - ERRATA_ANDES - ERRATA_ANDES_CMO ----------x---------------------x--------------------x---------------x---- * b4-shazam-merge: soc: renesas: Kconfig: Select the required configs for RZ/Five SoC cache: Add L2 cache management for Andes AX45MP RISC-V core dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller riscv: mm: dma-noncoherent: nonstandard cache operations support riscv: errata: Add Andes alternative ports riscv: asm: vendorid_list: Add Andes Technology to the vendors list Link: https://lore.kernel.org/r/20230818135723.80612-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "RISC-V: Probe for misaligned access speed"Palmer Dabbelt6-20/+191
Evan Green <evan@rivosinc.com> says: The current setting for the hwprobe bit indicating misaligned access speed is controlled by a vendor-specific feature probe function. This is essentially a per-SoC table we have to maintain on behalf of each vendor going forward. Let's convert that instead to something we detect at runtime. We have two assembly routines at the heart of our probe: one that does a bunch of word-sized accesses (without aligning its input buffer), and the other that does byte accesses. If we can move a larger number of bytes using misaligned word accesses than we can with the same amount of time doing byte accesses, then we can declare misaligned accesses as "fast". The tradeoff of reducing this maintenance burden is boot time. We spend 4-6 jiffies per core doing this measurement (0-2 on jiffie edge alignment, and 4 on measurement). The timing loop was based on raid6_choose_gen(), which uses (16+1)*N jiffies (where N is the number of algorithms). By taking only the fastest iteration out of all attempts for use in the comparison, variance between runs is very low. On my THead C906, it looks like this: [ 0.047563] cpu0: Ratio of byte access time to unaligned word access is 4.34, unaligned accesses are fast Several others have chimed in with results on slow machines with the older algorithm, which took all runs into account, including noise like interrupts. Even with this variation, results indicate that in all cases (fast, slow, and emulated) the measured numbers are nowhere near each other (always multiple factors away). * b4-shazam-merge: RISC-V: alternative: Remove feature_probe_func RISC-V: Probe for unaligned access speed Link: https://lore.kernel.org/r/20230818194136.4084400-1-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-06  riscv: implement a memset like function for text  (Puranjay Mohan; 1 file, -0/+77)
The BPF JIT needs to write invalid instructions to RX regions of memory to invalidate removed BPF programs. This needs a function like memset() that can work with RX memory. Implement patch_text_set_nosync() which is similar to text_poke_set() of x86. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230831131229.497941-4-puranjay12@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-06  riscv: extend patch_text_nosync() for multiple pages  (Puranjay Mohan; 1 file, -5/+32)
The patch_insn_write() function currently doesn't work for multiple pages of instructions, therefore patch_text_nosync() will fail with a page fault if called with lengths spanning multiple pages. This commit extends the patch_insn_write() function to support multiple pages by copying at max 2 pages at a time in a loop. This implementation is similar to text_poke_copy() function of x86. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230831131229.497941-3-puranjay12@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
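
A rough sketch of the chunking described above (illustrative only; patch_map()/patch_unmap() stand in for the real fixmap helpers and their exact signatures are assumptions here):

    #include <linux/minmax.h>
    #include <linux/mm.h>
    #include <linux/string.h>

    /* Copy instructions in pieces that never span more than the two pages
     * that are mapped writable at any one time. */
    static int patch_insn_write_sketch(void *addr, const void *insn, size_t len)
    {
    	size_t patched = 0;

    	while (patched < len) {
    		void *va = patch_map(addr + patched);	/* maps up to 2 pages */
    		size_t size = min_t(size_t, len - patched,
    				    2 * PAGE_SIZE - offset_in_page(addr + patched));

    		memcpy(va, insn + patched, size);
    		patch_unmap(va);
    		patched += size;
    	}
    	return 0;
    }
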
2023-09-06  riscv: libstub: Implement KASLR by using generic functions  (Alexandre Ghiti; 1 file, -0/+1)
We can now use arm64 functions to handle the move of the kernel physical mapping: if KASLR is enabled, we will try to get a random seed from the firmware, if not possible, the kernel will be moved to a location that suits its alignment constraints. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230722123850.634544-6-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-06  riscv: Dump out kernel offset information on panic  (Alexandre Ghiti; 1 file, -0/+25)

Dump out the KASLR virtual kernel offset on panic to help debug the kernel.

Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230722123850.634544-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

2023-09-06  riscv: Introduce virtual kernel mapping KASLR  (Alexandre Ghiti; 3 files, -1/+44)

The KASLR implementation relies on a relocatable kernel so that we can move the kernel mapping. The seed needed to virtually move the kernel is taken from the device tree, so we rely on the bootloader to provide a correct seed. Zkr could be used unconditionally instead if implemented, but that's for another patch.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Song Shuai <songshuaishuai@tinylab.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230722123850.634544-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

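A minimal sketch of the placement constraint mentioned in the series cover letter (the helper name is hypothetical; the real offset computation lives in the early boot code):

    /* With a single early PUD and PMD-aligned mappings, fewer than 512
     * PMD-sized slots are available, so the seed is reduced modulo that. */
    static unsigned long kaslr_offset_from_seed(u64 seed)
    {
    	return (seed % 512) * PMD_SIZE;
    }
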
2023-09-01  RISC-V: Add ptrace support for vectors  (Andy Chiu; 1 file, -0/+79)

This patch adds back the ptrace support with the following fixes:

- Define NT_RISCV_CSR and re-number NT_RISCV_VECTOR to prevent conflicts with gdb's NT_RISCV_CSR.
- Use struct __riscv_v_regset_state to handle ptrace requests.

Since gdb does not directly include the note description header in Linux and has already defined NT_RISCV_CSR as 0x900, we decided to sync with gdb and renumber NT_RISCV_VECTOR to solve and prevent future conflicts.

Fixes: 0c59922c769a ("riscv: Add ptrace vector support")
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230825050248.32681-1-andy.chiu@sifive.com
[Palmer: Drop the unused "size" variable in riscv_vr_set().]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

2023-09-01  riscv: errata: Add Andes alternative ports  (Lad Prabhakar; 1 file, -0/+5)

Add the required ports of the Alternative scheme for Andes CPU cores.

The I/O Coherence Port (IOCP) provides an AXI interface for connecting external non-caching masters, such as DMA controllers. IOCP is a specification option and is disabled on the Renesas RZ/Five SoC; for this reason, cache management needs a software workaround.

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com> # tyre-kicking on a d1
Link: https://lore.kernel.org/r/20230818135723.80612-3-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

2023-09-01  RISC-V: alternative: Remove feature_probe_func  (Evan Green; 2 files, -20/+0)
Now that we're testing unaligned memory copy and making that determination generically, there are no more users of the vendor feature_probe_func(). While I think it's probably going to need to come back, there are no users right now, so let's remove it until it's needed. Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230818194136.4084400-3-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01  RISC-V: Probe for unaligned access speed  (Evan Green; 5 files, -0/+191)

Rather than deferring unaligned access speed determinations to a vendor function, let's probe them and find out how fast they are. If we determine that an unaligned word access is faster than N byte accesses, mark the hardware's unaligned access as "fast". Otherwise, we mark accesses as slow.

The algorithm itself runs for a fixed amount of jiffies. Within each iteration it attempts to time a single loop, and then keeps only the best (fastest) loop it saw. This algorithm was found to have lower variance from run to run than my first attempt, which counted the total number of iterations that could be done in that fixed amount of jiffies. By taking only the best iteration in the loop, assuming at least one loop wasn't perturbed by an interrupt, we eliminate the effects of interrupts and other "warm up" factors like branch prediction. The only downside is that it depends on having an rdtime granular and accurate enough to measure a single copy. If we ever manage to complete a loop in 0 rdtime ticks, we leave the unaligned setting at UNKNOWN.

There is a slight change in user-visible behavior here. Previously, all boards except the THead C906 reported a misaligned access speed of UNKNOWN. C906 reported FAST. With this change, since we're now measuring misaligned access speed on each hart, all RISC-V systems will have this key set as either FAST or SLOW.

Currently, we don't have a way to confidently measure the difference between SLOW and EMULATED, so we label anything not fast as SLOW. This will mislabel some systems that are actually EMULATED as SLOW. When we get support for delegating misaligned access traps to the kernel (as opposed to the firmware quietly handling it), we can explicitly test in Linux to see if unaligned accesses trap. Those systems will start to report EMULATED, though older (today's) systems without that new SBI mechanism will continue to report SLOW.

I've updated the documentation for those hwprobe values to reflect this, specifically: SLOW may or may not be emulated by software, and FAST means being faster than equivalent byte accesses. The change in documentation is accurate with respect to both the former and current behavior.

Signed-off-by: Evan Green <evan@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230818194136.4084400-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

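A simplified sketch of the "keep only the best iteration" measurement described above (hypothetical helpers; the real code uses dedicated assembly copy routines and per-hart buffers):

    #include <linux/jiffies.h>
    #include <linux/limits.h>
    #include <asm/timex.h>

    /* Time one copy style for a fixed window and keep only the fastest run. */
    static u64 best_copy_cycles(void (*copy_fn)(void *, const void *, size_t),
    			    void *dst, const void *src, size_t size)
    {
    	unsigned long end = jiffies + 2;	/* small, fixed measurement window */
    	u64 best = U64_MAX;

    	while (time_before(jiffies, end)) {
    		u64 start = get_cycles();
    		u64 elapsed;

    		copy_fn(dst, src, size);
    		elapsed = get_cycles() - start;
    		if (elapsed && elapsed < best)
    			best = elapsed;
    	}
    	return best;
    }

    /* "fast" if misaligned word copies beat byte copies over the same buffer. */
    static bool unaligned_access_is_fast(void *dst, const void *src, size_t size)
    {
    	return best_copy_cycles(copy_words_unaligned, dst, src, size) <	/* hypothetical */
    	       best_copy_cycles(copy_bytes, dst, src, size);		/* hypothetical */
    }
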
2023-09-01  Merge tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux  (Linus Torvalds; 15 files, -318/+636)

Pull RISC-V updates from Palmer Dabbelt:

 - Support for the new "riscv,isa-extensions" and "riscv,isa-base" device tree interfaces for probing extensions
 - Support for userspace access to the performance counters
 - Support for more instructions in kprobes
 - Crash kernels can be allocated above 4GiB
 - Support for KCFI
 - Support for ELFs in !MMU configurations
 - ARCH_KMALLOC_MINALIGN has been reduced to 8
 - mmap() defaults to sv48-sized addresses, with longer addresses hidden behind a hint (similar to Arm and Intel)
 - Also various fixes and cleanups

* tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
  lib/Kconfig.debug: Restrict DEBUG_INFO_SPLIT for RISC-V
  riscv: support PREEMPT_DYNAMIC with static keys
  riscv: Move create_tmp_mapping() to init sections
  riscv: Mark KASAN tmp* page tables variables as static
  riscv: mm: use bitmap_zero() API
  riscv: enable DEBUG_FORCE_FUNCTION_ALIGN_64B
  riscv: remove redundant mv instructions
  RISC-V: mm: Document mmap changes
  RISC-V: mm: Update pgtable comment documentation
  RISC-V: mm: Add tests for RISC-V mm
  RISC-V: mm: Restrict address space for sv39,sv48,sv57
  riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent
  riscv: allow kmalloc() caches aligned to the smallest value
  riscv: support the elf-fdpic binfmt loader
  binfmt_elf_fdpic: support 64-bit systems
  riscv: Allow CONFIG_CFI_CLANG to be selected
  riscv/purgatory: Disable CFI
  riscv: Add CFI error handling
  riscv: Add ftrace_stub_graph
  riscv: Add types to indirectly called assembly functions
  ...

2023-08-31Merge patch series "riscv: Reduce ARCH_KMALLOC_MINALIGN to 8"Palmer Dabbelt1-0/+1
Jisheng Zhang <jszhang@kernel.org> says: Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, I.E 64Bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support unified kernel Image, usually we have to enable CONFIG_RISCV_DMA_NONCOHERENT, thus it brings some bad effects to coherent platforms: Firstly, it wastes memory, kmalloc-96, kmalloc-32, kmalloc-16 and kmalloc-8 slab caches don't exist any more, they are replaced with either kmalloc-128 or kmalloc-64. Secondly, larger than necessary kmalloc aligned allocations results in unnecessary cache/TLB pressure. This issue also exists on arm64 platforms. From last year, Catalin tried to solve this issue by decoupling ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN, limiting kmalloc() minimum alignment to dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage in various drivers with ARCH_DMA_MINALIGN etc.[1] One fact we can make use of for riscv: if the CPU doesn't support ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on Catalin's work and above fact, we can easily solve the kmalloc align issue for riscv: we can override dma_get_cache_alignment(), then let it return ARCH_DMA_MINALIGN at the beginning and return 1 once we know the underlying HW neither supports ZICBOM nor supports T-HEAD CMO. So what about if the CPU supports ZICBOM or T-HEAD CMO, but all the devices are dma coherent? Well, we use ARCH_DMA_MINALIGN as the kmalloc minimum alignment, nothing changed in this case. This case can be improved in the future once we see such platforms in mainline. After this patch, a simple test of booting to a small buildroot rootfs on qemu shows: kmalloc-96 5041 5041 96 ... kmalloc-64 9606 9606 64 ... kmalloc-32 5128 5128 32 ... kmalloc-16 7682 7682 16 ... kmalloc-8 10246 10246 8 ... So we save about 1268KB memory. The saving will be much larger in normal OS env on real HW platforms. patch1 allows kmalloc() caches aligned to the smallest value. patch2 enables DMA_BOUNCE_UNALIGNED_KMALLOC. After this series: As for coherent platforms, kmalloc-{8,16,32,96} caches come back on coherent both RV32 and RV64 platforms, I.E !ZICBOM and !THEAD_CMO. As for noncoherent RV32 platforms, nothing changed. As for noncoherent RV64 platforms, I.E either ZICBOM or THEAD_CMO, the above kmalloc caches also come back if > 4GB memory or users pass "swiotlb=mmnn,force" to force swiotlb creation if <= 4GB memory. How much mmnn should be depends on the specific platform, it needs to be tried and tested all possible usage case on the specific hardware. For example, I can use the minimal I/O TLB slabs on Sipeed M1S Dock. * b4-shazam-merge: riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent riscv: allow kmalloc() caches aligned to the smallest value Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1] Link: https://lore.kernel.org/r/20230718152214.2907-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31Merge patch series "riscv: KCFI support"Palmer Dabbelt8-9/+110
Sami Tolvanen <samitolvanen@google.com> says: This series adds KCFI support for RISC-V. KCFI is a fine-grained forward-edge control-flow integrity scheme supported in Clang >=16, which ensures indirect calls in instrumented code can only branch to functions whose type matches the function pointer type, thus making code reuse attacks more difficult. Patch 1 implements a pt_regs based syscall wrapper to address function pointer type mismatches in syscall handling. Patches 2 and 3 annotate indirectly called assembly functions with CFI types. Patch 4 implements error handling for indirect call checks. Patch 5 disables CFI for arch/riscv/purgatory. Patch 6 finally allows CONFIG_CFI_CLANG to be enabled for RISC-V. Note that Clang 16 has a generic architecture-agnostic KCFI implementation, which does work with the kernel, but doesn't produce a stable code sequence for indirect call checks, which means potential failures just trap and won't result in informative error messages. Clang 17 includes a RISC-V specific back-end implementation for KCFI, which emits a predictable code sequence for the checks and a .kcfi_traps section with locations of the traps, which patch 5 uses to produce more useful errors. The type mismatch fixes and annotations in the first three patches also become necessary in future if the kernel decides to support fine-grained CFI implemented using the hardware landing pad feature proposed in the in-progress Zicfisslp extension. Once the specification is ratified and hardware support emerges, implementing runtime patching support that replaces KCFI instrumentation with Zicfisslp landing pads might also be feasible (similarly to KCFI to FineIBT patching on x86_64), allowing distributions to ship a unified kernel binary for all devices. * b4-shazam-merge: riscv: Allow CONFIG_CFI_CLANG to be selected riscv/purgatory: Disable CFI riscv: Add CFI error handling riscv: Add ftrace_stub_graph riscv: Add types to indirectly called assembly functions riscv: Implement syscall wrappers Link: https://lore.kernel.org/r/20230710183544.999540-8-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31Merge patch series "support allocating crashkernel above 4G explicitly on riscv"Palmer Dabbelt1-0/+5
Chen Jiahao <chenjiahao16@huawei.com> says: On riscv, the current crash kernel allocation logic is trying to allocate within 32bit addressible memory region by default, if failed, try to allocate without 4G restriction. In need of saving DMA zone memory while allocating a relatively large crash kernel region, allocating the reserved memory top down in high memory, without overlapping the DMA zone, is a mature solution. Hence this patchset introduces the parameter option crashkernel=X,[high,low]. One can reserve the crash kernel from high memory above DMA zone range by explicitly passing "crashkernel=X,high"; or reserve a memory range below 4G with "crashkernel=X,low". Besides, there are few rules need to take notice: 1. "crashkernel=X,[high,low]" will be ignored if "crashkernel=size" is specified. 2. "crashkernel=X,low" is valid only when "crashkernel=X,high" is passed and there is enough memory to be allocated under 4G. 3. When allocating crashkernel above 4G and no "crashkernel=X,low" is specified, a 128M low memory will be allocated automatically for swiotlb bounce buffer. See Documentation/admin-guide/kernel-parameters.txt for more information. To verify loading the crashkernel, adapted kexec-tools is attached below: https://github.com/chenjh005/kexec-tools/tree/build-test-riscv-v2 Following test cases have been performed as expected: 1) crashkernel=256M //low=256M 2) crashkernel=1G //low=1G 3) crashkernel=4G //high=4G, low=128M(default) 4) crashkernel=4G crashkernel=256M,high //high=4G, low=128M(default), high is ignored 5) crashkernel=4G crashkernel=256M,low //high=4G, low=128M(default), low is ignored 6) crashkernel=4G,high //high=4G, low=128M(default) 7) crashkernel=256M,low //low=0M, invalid 8) crashkernel=4G,high crashkernel=256M,low //high=4G, low=256M 9) crashkernel=4G,high crashkernel=4G,low //high=0M, low=0M, invalid 10) crashkernel=512M@0xd0000000 //low=512M 11) crashkernel=1G,high crashkernel=0M,low //high=1G, low=0M * b4-shazam-merge: docs: kdump: Update the crashkernel description for riscv riscv: kdump: Implement crashkernel=X,[high,low] Link: https://lore.kernel.org/r/20230726175000.2536220-1-chenjiahao16@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31Merge patch series "riscv: kprobes: simulate some instructions"Palmer Dabbelt3-5/+116
Nam Cao <namcaov@gmail.com> says: Simulate some currently rejected instructions. Still to be simulated are: - c.jal - c.ebreak * b4-shazam-merge: riscv: kprobes: simulate c.beqz and c.bnez riscv: kprobes: simulate c.jr and c.jalr instructions riscv: kprobes: simulate c.j instruction Link: https://lore.kernel.org/r/cover.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31  riscv: remove redundant mv instructions  (Nam Cao; 1 file, -5/+1)
Some mv instructions were useful when first introduced to preserve a0 and a1 before function calls. However the code has changed and they are now redundant. Remove them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230725053835.138910-1-namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-31  Merge tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux  (Linus Torvalds; 1 file, -1/+0)

Pull devicetree include cleanups from Rob Herring:
 "These are the remaining few clean-ups of DT related includes which didn't get applied to subsystem trees"

* tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  ipmi: Explicitly include correct DT includes
  tpm: Explicitly include correct DT includes
  lib/genalloc: Explicitly include correct DT includes
  parport: Explicitly include correct DT includes
  sbus: Explicitly include correct DT includes
  mux: Explicitly include correct DT includes
  macintosh: Explicitly include correct DT includes
  hte: Explicitly include correct DT includes
  EDAC: Explicitly include correct DT includes
  clocksource: Explicitly include correct DT includes
  sparc: Explicitly include correct DT includes
  riscv: Explicitly include correct DT includes

2023-08-30  Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  (Linus Torvalds; 1 file, -2/+2)

Pull non-MM updates from Andrew Morton:

 - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options")
 - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h")
 - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands")
 - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions")
 - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug")
 - Many singleton patches to various parts of the tree

* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
  document while_each_thread(), change first_tid() to use for_each_thread()
  drivers/char/mem.c: shrink character device's devlist[] array
  x86/crash: optimize CPU changes
  crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
  crash: hotplug support for kexec_load()
  x86/crash: add x86 crash hotplug support
  crash: memory and CPU hotplug sysfs attributes
  kexec: exclude elfcorehdr from the segment digest
  crash: add generic infrastructure for crash hotplug support
  crash: move a few code bits to setup support of crash hotplug
  kstrtox: consistently use _tolower()
  kill do_each_thread()
  nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
  scripts/bloat-o-meter: count weak symbol sizes
  treewide: drop CONFIG_EMBEDDED
  lockdep: fix static memory detection even more
  lib/vsprintf: declare no_hash_pointers in sprintf.h
  lib/vsprintf: split out sprintf() and friends
  kernel/fork: stop playing lockless games for exe_file replacement
  adfs: delete unused "union adfs_dirtail" definition
  ...

2023-08-28  riscv: Explicitly include correct DT includes  (Rob Herring; 1 file, -1/+0)
The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Link: https://lore.kernel.org/r/20230714174043.4040561-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-08-24  riscv: allow kmalloc() caches aligned to the smallest value  (Jisheng Zhang; 1 file, -0/+1)

Currently, riscv defines ARCH_DMA_MINALIGN as L1_CACHE_BYTES, i.e. 64 bytes, if CONFIG_RISCV_DMA_NONCOHERENT=y. To support a unified kernel Image, we usually have to enable CONFIG_RISCV_DMA_NONCOHERENT, which brings some bad effects to coherent platforms: firstly, it wastes memory, since the kmalloc-96, kmalloc-32, kmalloc-16 and kmalloc-8 slab caches don't exist any more and are replaced with either kmalloc-128 or kmalloc-64; secondly, larger-than-necessary kmalloc aligned allocations result in unnecessary cache/TLB pressure.

This issue also exists on arm64 platforms. Since last year, Catalin has tried to solve this issue by decoupling ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN, limiting the kmalloc() minimum alignment to dma_get_cache_alignment() and replacing ARCH_KMALLOC_MINALIGN usage in various drivers with ARCH_DMA_MINALIGN etc.[1]

One fact we can make use of for riscv: if the CPU doesn't support ZICBOM or T-HEAD CMO, we know the platform is coherent. Based on Catalin's work and the above fact, we can easily solve the kmalloc alignment issue for riscv: we can override dma_get_cache_alignment(), let it return ARCH_DMA_MINALIGN at the beginning, and return 1 once we know the underlying HW neither supports ZICBOM nor supports T-HEAD CMO.

So what if the CPU supports ZICBOM or T-HEAD CMO, but all the devices are DMA coherent? Well, we use ARCH_DMA_MINALIGN as the kmalloc minimum alignment, so nothing changes in this case. This case can be improved in the future.

After this patch, a simple test of booting to a small buildroot rootfs on qemu shows:

  kmalloc-96    5041   5041    96  ...
  kmalloc-64    9606   9606    64  ...
  kmalloc-32    5128   5128    32  ...
  kmalloc-16    7682   7682    16  ...
  kmalloc-8    10246  10246     8  ...

So we save about 1268KB of memory. The saving will be much larger in a normal OS environment on real HW platforms.

Link: https://lore.kernel.org/linux-arm-kernel/20230524171904.3967031-1-catalin.marinas@arm.com/ [1]
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230718152214.2907-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

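A sketch of the override described above (the init-hook and flag names are assumptions; the real patch keys off whether Zicbom/T-HEAD CMO support was detected):

    /* In the arch DMA code: assume non-coherence until proven otherwise. */
    int dma_cache_alignment __ro_after_init = ARCH_DMA_MINALIGN;

    void __init riscv_init_dma_cache_alignment(void)	/* hypothetical init hook */
    {
    	/* Neither Zicbom nor T-HEAD CMO detected: the platform is coherent,
    	 * so kmalloc() no longer needs DMA-safe alignment. */
    	if (!riscv_cmo_supported)			/* hypothetical flag */
    		dma_cache_alignment = 1;
    }

    /* In the arch cache header: the override consulted by the kmalloc()
     * minimum-alignment logic. */
    #define dma_get_cache_alignment dma_get_cache_alignment
    static inline int dma_get_cache_alignment(void)
    {
    	return dma_cache_alignment;
    }
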
2023-08-24  riscv: Add CFI error handling  (Sami Tolvanen; 3 files, -1/+82)

With CONFIG_CFI_CLANG, the compiler injects a type preamble immediately before each function and a check to validate the target function type before indirect calls:

  ; type preamble
  .word <id>
  function:
    ...

  ; indirect call check
    lw      t1, -4(a0)
    lui     t2, <hi20>
    addiw   t2, t2, <lo12>
    beq     t1, t2, .Ltmp0
    ebreak
  .Ltmp0:
    jalr    a0

Implement error handling code for the ebreak traps emitted for the checks. This produces the following oops on a CFI failure (generated using lkdtm):

  [   21.177245] CFI failure at lkdtm_indirect_call+0x22/0x32 [lkdtm] (target: lkdtm_increment_int+0x0/0x18 [lkdtm]; expected type: 0x3ad55aca)
  [   21.178483] Kernel BUG [#1]
  [   21.178671] Modules linked in: lkdtm
  [   21.179037] CPU: 1 PID: 104 Comm: sh Not tainted 6.3.0-rc6-00037-g37d5ec6297ab #1
  [   21.179511] Hardware name: riscv-virtio,qemu (DT)
  [   21.179818] epc : lkdtm_indirect_call+0x22/0x32 [lkdtm]
  [   21.180106] ra : lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm]
  [   21.180426] epc : ffffffff01387092 ra : ffffffff01386f14 sp : ff20000000453cf0
  [   21.180792] gp : ffffffff81308c38 tp : ff6000000243f080 t0 : ff20000000453b78
  [   21.181157] t1 : 000000003ad55aca t2 : 000000007e0c52a5 s0 : ff20000000453d00
  [   21.181506] s1 : 0000000000000001 a0 : ffffffff0138d170 a1 : ffffffff013870bc
  [   21.181819] a2 : b5fea48dd89aa700 a3 : 0000000000000001 a4 : 0000000000000fff
  [   21.182169] a5 : 0000000000000004 a6 : 00000000000000b7 a7 : 0000000000000000
  [   21.182591] s2 : ff20000000453e78 s3 : ffffffffffffffea s4 : 0000000000000012
  [   21.183001] s5 : ff600000023c7000 s6 : 0000000000000006 s7 : ffffffff013882a0
  [   21.183653] s8 : 0000000000000008 s9 : 0000000000000002 s10: ffffffff0138d878
  [   21.184245] s11: ffffffff0138d878 t3 : 0000000000000003 t4 : 0000000000000000
  [   21.184591] t5 : ffffffff8133df08 t6 : ffffffff8133df07
  [   21.184858] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003
  [   21.185415] [<ffffffff01387092>] lkdtm_indirect_call+0x22/0x32 [lkdtm]
  [   21.185772] [<ffffffff01386f14>] lkdtm_CFI_FORWARD_PROTO+0x48/0x7c [lkdtm]
  [   21.186093] [<ffffffff01383552>] lkdtm_do_action+0x22/0x34 [lkdtm]
  [   21.186445] [<ffffffff0138350c>] direct_entry+0x128/0x13a [lkdtm]
  [   21.186817] [<ffffffff8033ed8c>] full_proxy_write+0x58/0xb2
  [   21.187352] [<ffffffff801d4fe8>] vfs_write+0x14c/0x33a
  [   21.187644] [<ffffffff801d5328>] ksys_write+0x64/0xd4
  [   21.187832] [<ffffffff801d53a6>] sys_write+0xe/0x1a
  [   21.188171] [<ffffffff80003996>] ret_from_syscall+0x0/0x2
  [   21.188595] Code: 0513 0f65 a303 ffc5 53b7 7e0c 839b 2a53 0363 0073 (9002) 9582
  [   21.189178] ---[ end trace 0000000000000000 ]---
  [   21.189590] Kernel panic - not syncing: Fatal exception

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com> # ISA bits
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230710183544.999540-12-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

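For reference, a small lkdtm-style example (hypothetical, not kernel code) of the kind of mismatch these checks catch: the type hash stored before the call target does not match the hash expected at the call site, so the injected ebreak fires and the new handler reports it.

    static int increment_int(int *val)	/* real type: int (*)(int *) */
    {
    	(*val)++;
    	return 0;
    }

    static void do_indirect_call(void (*fn)(void))
    {
    	fn();	/* KCFI checks that 'fn' points at a function of type void (*)(void) */
    }

    static void trigger_cfi_failure(void)
    {
    	/* The cast silences the compiler, but the runtime hash check still fails. */
    	do_indirect_call((void (*)(void))increment_int);
    }
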
2023-08-24  riscv: Add ftrace_stub_graph  (Sami Tolvanen; 1 file, -0/+4)
Commit 883bbbffa5a4 ("ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()") added a separate ftrace_stub_graph function for CFI_CLANG. Add the stub to fix FUNCTION_GRAPH_TRACER compatibility with CFI. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-11-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-24  riscv: Add types to indirectly called assembly functions  (Sami Tolvanen; 2 files, -4/+6)
With CONFIG_CFI_CLANG, assembly functions indirectly called from C code must be annotated with type identifiers to pass CFI checking. Use the SYM_TYPED_START macro to add types to the relevant functions. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-10-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-24  riscv: Implement syscall wrappers  (Sami Tolvanen; 3 files, -4/+18)
Commit f0bddf50586d ("riscv: entry: Convert to generic entry") moved syscall handling to C code, which exposed function pointer type mismatches that trip fine-grained forward-edge Control-Flow Integrity (CFI) checks as syscall handlers are all called through the same syscall_t pointer type. To fix the type mismatches, implement pt_regs based syscall wrappers similarly to x86 and arm64. This patch is based on arm64 syscall wrappers added in commit 4378a7d4be30 ("arm64: implement syscall wrappers"), where the main goal was to minimize the risk of userspace-controlled values being used under speculation. This may be a concern for riscv in future as well. Following other architectures, the syscall wrappers generate three functions for each syscall; __riscv_<compat_>sys_<name> takes a pt_regs pointer and extracts arguments from registers, __se_<compat_>sys_<name> is a sign-extension wrapper that casts the long arguments to the correct types for the real syscall implementation, which is named __do_<compat_>sys_<name>. Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230710183544.999540-9-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
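
A hedged sketch of what the three generated functions look like for a hypothetical two-argument syscall (simplified; the real versions are produced by the SYSCALL_DEFINEx macros):

    #include <asm/ptrace.h>

    /* The real implementation, with properly typed arguments. */
    static long __do_sys_example(int fd, size_t count)
    {
    	/* ... */
    	return 0;
    }

    /* Sign-extension wrapper: casts register-sized values to the real types. */
    static long __se_sys_example(unsigned long fd, unsigned long count)
    {
    	return __do_sys_example((int)fd, (size_t)count);
    }

    /* What the syscall table points at: every entry now has the same
     * pt_regs-based type, so indirect calls pass the CFI type check. */
    long __riscv_sys_example(const struct pt_regs *regs)
    {
    	return __se_sys_example(regs->a0, regs->a1);
    }
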
2023-08-23Merge patch series "riscv: fix ptrace and export VLENB"Palmer Dabbelt1-69/+0
Andy Chiu <andy.chiu@sifive.com> says: We add a vlenb field in Vector context and save it with the riscv_vstate_save() macro. It should not cause performance regression as VLENB is a design-time constant and is frequently used by hardware. Also, adding this field into the __sc_riscv_v_state may benifit us on a future compatibility issue becuse a hardware may have writable VLENB. Adding and saving VLENB have an immediate benifit as it gives ptrace a better view of the Vector extension and makes it possible to reconstruct Vector register files from the dump without doing an additional csr read. This patchset also sync the number of note types between us and gdb for riscv to solve a conflicting note. This is not an ABI break given that 6.5 has not been released yet. * b4-shazam-merge: RISC-V: vector: export VLENB csr in __sc_riscv_v_state RISC-V: Remove ptrace support for vectors Link: https://lore.kernel.org/r/20230816155450.26200-1-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-22  RISC-V: Remove ptrace support for vectors  (Palmer Dabbelt; 1 file, -69/+0)

We've found two bugs here: NT_RISCV_VECTOR steps on NT_RISCV_CSR (which is only for embedded), and we don't have vlenb in the core dumps. Given that we've had a pair of bugs crop up as part of the GDB review, we've probably got other issues, so let's just cut this for 6.5 and get it right.

Fixes: 0c59922c769a ("riscv: Add ptrace vector support")
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230816155450.26200-2-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

2023-08-18  kexec: rename ARCH_HAS_KEXEC_PURGATORY  (Eric DeVolder; 1 file, -2/+2)
The Kconfig refactor to consolidate KEXEC and CRASH options utilized option names of the form ARCH_SUPPORTS_<option>. Thus rename the ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow the same. Link: https://lkml.kernel.org/r/20230712161545.87870-15-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-16  riscv: kdump: Implement crashkernel=X,[high,low]  (Chen Jiahao; 1 file, -0/+5)
On riscv, the current crash kernel allocation logic is trying to allocate within 32bit addressible memory region by default, if failed, try to allocate without 4G restriction. In need of saving DMA zone memory while allocating a relatively large crash kernel region, allocating the reserved memory top down in high memory, without overlapping the DMA zone, is a mature solution. Here introduce the parameter option crashkernel=X,[high,low]. One can reserve the crash kernel from high memory above DMA zone range by explicitly passing "crashkernel=X,high"; or reserve a memory range below 4G with "crashkernel=X,low". Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Link: https://lore.kernel.org/r/20230726175000.2536220-2-chenjiahao16@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16  riscv: kprobes: simulate c.beqz and c.bnez  (Nam Cao; 3 files, -2/+48)
kprobes currently rejects instruction c.beqz and c.bnez. Implement them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/1d879dba4e4ee9a82e27625d6483b5c9cfed684f.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16  riscv: kprobes: simulate c.jr and c.jalr instructions  (Nam Cao; 3 files, -2/+41)
kprobes currently rejects c.jr and c.jalr instructions. Implement them. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/db8b7787e9208654cca50484f68334f412be2ea9.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16  riscv: kprobes: simulate c.j instruction  (Nam Cao; 3 files, -1/+27)
kprobes currently rejects c.j instruction. Implement it. Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/6ef76cd9984b8015826649d13f870f8ac45a2d0d.1690704360.git.namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16  riscv: Handle zicsr/zifencei issue between gcc and binutils  (Mingzheng Xing; 1 file, -1/+7)

Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default ISA spec to the newer 20191213 version, which moves some instructions from the I extension to the Zicsr and Zifencei extensions. So if one of binutils and GCC exceeds that version, we should explicitly specify Zicsr and Zifencei via -march to cope with the new changes, but this only occurs when binutils >= 2.36 and GCC >= 11.1.0.

It's a different story when binutils < 2.36. binutils-2.36 supports the Zifencei extension[2] and splits Zifencei and Zicsr from I[3]. GCC-11.1.0 is particular[4] because it adds support for the Zicsr and Zifencei extensions in -march. binutils-2.35 does not support the Zifencei extension, and does not need Zicsr and Zifencei to be specified when working with GCC >= 12.1.0.

To make our lives easier, let's relax the check to binutils >= 2.36 in CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. For the other two cases, where clang < 17 or GCC < 11.1.0, we will deal with them in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC.

For more information, please refer to:
  commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38")
  commit e89c2e815e76 ("riscv: Handle zicsr/zifencei issues between clang and binutils")

Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae8717a77d94b51ac3614e2ca6dc [0]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51b5b52fec2cd [1]
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=5a1b31e1e1cee6e9f1c92abff59cdcfff0dddf30 [2]
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=729a53530e86972d1143553a415db34e6e01d5d2 [3]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a78fa3fb4ef49 [4]
Link: https://lore.kernel.org/all/20230308220842.1231003-1-conor@kernel.org
Link: https://lore.kernel.org/all/20230223220546.52879-1-conor@kernel.org
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Guo Ren <guoren@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mingzheng Xing <xingmingzheng@iscas.ac.cn>
Link: https://lore.kernel.org/r/20230809165648.21071-1-xingmingzheng@iscas.ac.cn
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

2023-08-16  riscv: stack: Fixup independent softirq stack for CONFIG_FRAME_POINTER=n  (Guo Ren; 1 file, -0/+3)
The independent softirq stack uses s0 to save & restore sp, but s0 would be corrupted when CONFIG_FRAME_POINTER=n. So add s0 in the clobber list to fix the problem. Fixes: dd69d07a5a6c ("riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK") Cc: stable@vger.kernel.org Reported-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Tested-by: Drew Fustini <dfustini@baylibre.com> Link: https://lore.kernel.org/r/20230716001506.3506041-3-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-16  riscv: stack: Fixup independent irq stack for CONFIG_FRAME_POINTER=n  (Guo Ren; 1 file, -0/+3)
The independent irq stack uses s0 to save & restore sp, but s0 would be corrupted when CONFIG_FRAME_POINTER=n. So add s0 in the clobber list to fix the problem. Fixes: 163e76cc6ef4 ("riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK") Cc: stable@vger.kernel.org Reported-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Tested-by: Drew Fustini <dfustini@baylibre.com> Link: https://lore.kernel.org/r/20230716001506.3506041-2-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
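
Both this and the preceding softirq-stack fix amount to adding "s0" to an inline-asm clobber list. A minimal sketch of the pattern (not the kernel's exact assembly; the helper name is made up, and the clobber list is simplified to the caller-saved integer registers):

    static void call_on_stack_sketch(unsigned long stack_top, void (*fn)(void))
    {
    	asm volatile (
    		"mv	s0, sp		\n"	/* stash the old stack pointer */
    		"mv	sp, %[top]	\n"	/* switch to the dedicated stack */
    		"jalr	%[fn]		\n"	/* run the handler there */
    		"mv	sp, s0		\n"	/* switch back */
    		:
    		: [top] "r" (stack_top), [fn] "r" (fn)
    		: "ra", "s0", "a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7",
    		  "t0", "t1", "t2", "t3", "t4", "t5", "t6", "memory");
    	/* With CONFIG_FRAME_POINTER=n the compiler may use s0 as an ordinary
    	 * register, so listing it in the clobbers is what makes this safe. */
    }
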
2023-08-16  riscv: entry: set a0 = -ENOSYS only when syscall != -1  (Celeste Liu; 1 file, -3/+3)

When we tested seccomp with the 6.4 kernel, we found that errno had the wrong value. If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d we get ENOSYS instead. We got the same result with commit 9c2598d43510 ("riscv: entry: Save a0 prior syscall_enter_from_user_mode()").

After analysing the code, we think that regs->a0 = -ENOSYS should only be executed when syscall != -1. In __seccomp_filter, when seccomp rejects a syscall with a specified errno, it sets a0 to the return value as the syscall ABI requires, and then returns -1. This return value is finally passed on as the return value of syscall_enter_from_user_mode, and is then compared with NR_syscalls after being converted to ulong (so it will be ULONG_MAX). The condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS is always executed. It clobbers the a0 set by seccomp, so we always get ENOSYS when a seccomp RET_ERRNO rule matches.

Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
Reported-by: Felix Yan <felixonmars@archlinux.org>
Co-developed-by: Ruizhe Pan <c141028@gmail.com>
Signed-off-by: Ruizhe Pan <c141028@gmail.com>
Co-developed-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
Signed-off-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
Tested-by: Felix Yan <felixonmars@archlinux.org>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230801141607.435192-1-CoelacanthusHex@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

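In code form, the corrected flow is roughly the following (a simplified sketch of the entry path; syscall_handler() is the arch helper that dispatches through the syscall table):

    #include <linux/entry-common.h>
    #include <linux/errno.h>

    static void do_ecall_from_user_sketch(struct pt_regs *regs)
    {
    	long syscall = regs->a7;

    	syscall = syscall_enter_from_user_mode(regs, syscall);

    	if (syscall >= 0 && syscall < NR_syscalls)
    		syscall_handler(regs, syscall);
    	else if (syscall != -1)
    		regs->a0 = -ENOSYS;
    	/* syscall == -1: seccomp (or a tracer) already placed the desired
    	 * return value in a0, so leave it untouched. */

    	syscall_exit_to_user_mode(regs);
    }
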
2023-08-09  riscv: Fix CPU feature detection with SMP disabled  (Samuel Holland; 2 files, -5/+5)
commit 914d6f44fc50 ("RISC-V: only iterate over possible CPUs in ISA string parser") changed riscv_fill_hwcap() from iterating over CPU DT nodes to iterating over logical CPU IDs. Since this function runs long before cpu_dev_init() creates CPU devices, it hits the fallback path in of_cpu_device_node_get(), which itself iterates over the DT nodes, searching for a node with the requested CPU ID. (Incidentally, this makes riscv_fill_hwcap() now take quadratic time.) riscv_fill_hwcap() passes a logical CPU ID to of_cpu_device_node_get(), which uses the arch_match_cpu_phys_id() hook to translate the logical ID to a physical ID as found in the DT. arch_match_cpu_phys_id() has a generic weak definition, and RISC-V provides a strong definition using cpuid_to_hartid_map(). However, the RISC-V specific implementation is located in arch/riscv/kernel/smp.c, and that file is only compiled when SMP is enabled. As a result, when SMP is disabled, the generic definition is used, and riscv_isa gets initialized based on the ISA string of hart 0, not the boot hart. On FU740, this means has_fpu() returns false, and userspace crashes when trying to use floating-point instructions. Fix this by moving arch_match_cpu_phys_id() to a file which is always compiled. Fixes: 70114560b285 ("RISC-V: Add RISC-V specific arch_match_cpu_phys_id") Fixes: 914d6f44fc50 ("RISC-V: only iterate over possible CPUs in ISA string parser") Reported-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230803012608.3540081-1-samuel.holland@sifive.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
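
The strong definition being moved is essentially a one-liner; a sketch of it, based on the description above:

    #include <linux/cpu.h>
    #include <asm/smp.h>

    /* Translate a logical CPU id to its DT hart id so of_cpu_device_node_get()
     * finds the right node even when CONFIG_SMP=n. */
    bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
    {
    	return phys_id == cpuid_to_hartid_map(cpu);
    }
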
2023-08-04Merge patch series "RISC-V: Fix a few kexec_file_load(2) failures"Palmer Dabbelt1-1/+2
Petr Tesarik <petrtesarik@huaweicloud.com> says: From: Petr Tesarik <petr.tesarik.ext@huawei.com> The kexec_file_load(2) syscall does not work at least in some kernel builds. For details see the relevant section in this blog post: https://sigillatum.tesarici.cz/2023-07-21-state-of-riscv64-kdump.html This patch series handles an additional relocation types, removes the need to implement a Global Offset Table (GOT) for the purgatory and fixes the placement of initrd. * b4-shazam-merge: riscv/kexec: load initrd high in available memory riscv/kexec: handle R_RISCV_CALL_PLT relocation type Link: https://lore.kernel.org/r/cover.1690365011.git.petr.tesarik.ext@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-08-04  riscv/kexec: load initrd high in available memory  (Torsten Duwe; 1 file, -1/+1)
When initrd is loaded low, the secondary kernel fails like this: INITRD: 0xdc581000+0x00eef000 overlaps in-use memory region This initrd load address corresponds to the _end symbol, but the reservation is aligned on PMD_SIZE, as explained by a comment in setup_bootmem(). It is technically possible to align the initrd load address accordingly, leaving a hole between the end of kernel and the initrd, but it is much simpler to allocate the initrd top-down. Fixes: 838b3e28488f ("RISC-V: Load purgatory in kexec_file") Signed-off-by: Torsten Duwe <duwe@suse.de> Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Cc: stable@vger.kernel.org Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/all/67c8eb9eea25717c2c8208d9bfbfaa39e6e2a1c6.1690365011.git.petr.tesarik.ext@huawei.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
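
A hedged sketch of a top-down initrd placement in the kexec_file path (the wrapper function is made up; field names are from struct kexec_buf):

    #include <linux/kexec.h>

    static int load_initrd_sketch(struct kimage *image, void *initrd,
    			      unsigned long initrd_len)
    {
    	struct kexec_buf kbuf = { .image = image };

    	kbuf.buffer    = initrd;
    	kbuf.bufsz     = initrd_len;
    	kbuf.memsz     = initrd_len;
    	kbuf.buf_align = PAGE_SIZE;
    	kbuf.top_down  = true;	/* allocate high, instead of just above _end */

    	return kexec_add_buffer(&kbuf);
    }
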
2023-08-04  riscv/kexec: handle R_RISCV_CALL_PLT relocation type  (Torsten Duwe; 1 file, -0/+1)
R_RISCV_CALL has been deprecated and replaced by R_RISCV_CALL_PLT. See Enum 18-19 in Table 3. Relocation types here: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc It was deprecated in ("Deprecated R_RISCV_CALL, prefer R_RISCV_CALL_PLT"): https://github.com/riscv-non-isa/riscv-elf-psabi-doc/commit/a0dced85018d7a0ec17023c9389cbd70b1dbc1b0 Recent tools (at least GNU binutils-2.40) already use R_RISCV_CALL_PLT. Kernels built with such binutils fail kexec_load_file(2) with: kexec_image: Unknown rela relocation: 19 kexec_image: Error loading purgatory ret=-8 The binary code at the call site remains the same, so tell arch_kexec_apply_relocations_add() to handle _PLT alike. Fixes: 838b3e28488f ("RISC-V: Load purgatory in kexec_file") Signed-off-by: Torsten Duwe <duwe@suse.de> Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Cc: Li Zhengyu <lizhengyu3@huawei.com> Cc: stable@vger.kernel.org Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/all/b046b164af8efd33bbdb7d4003273bdf9196a5b0.1690365011.git.petr.tesarik.ext@huawei.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
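
The fix amounts to handling the new relocation number in the same arm as the deprecated one; a sketch (the fixup helper is a placeholder, not the real function in elf_kexec.c):

    #include <linux/elf.h>

    static void apply_rela_sketch(u32 *loc, const Elf64_Rela *rela,
    			      unsigned long val, unsigned long addr)
    {
    	switch (ELF64_R_TYPE(rela->r_info)) {
    	case R_RISCV_CALL_PLT:	/* emitted by recent binutils (>= 2.40) */
    	case R_RISCV_CALL:	/* deprecated name, identical code at the call site */
    		riscv_patch_call(loc, val - addr);	/* placeholder helper */
    		break;
    	default:
    		break;
    	}
    }
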
2023-08-02  riscv: Export va_kernel_pa_offset in vmcoreinfo  (Song Shuai; 1 file, -0/+2)
Since RISC-V Linux v6.4, the commit 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping") changes phys_ram_base from the physical start of the kernel to the actual start of the DRAM. The Crash-utility's VTOP() still uses phys_ram_base and kernel_map.virt_addr to translate kernel virtual address, that failed the Crash with Linux v6.4 [1]. Export kernel_map.va_kernel_pa_offset in vmcoreinfo to help Crash translate the kernel virtual address correctly. Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping") Link: https://lore.kernel.org/linux-riscv/20230724040649.220279-1-suagrfillet@gmail.com/ [1] Signed-off-by: Song Shuai <suagrfillet@gmail.com> Reviewed-by: Xianting Tian  <xianting.tian@linux.alibaba.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230724100917.309061-1-suagrfillet@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
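
A sketch of the export, one line in the arch vmcoreinfo hook (exact formatting may differ from the actual patch):

    #include <linux/crash_core.h>
    #include <asm/pgtable.h>

    void arch_crash_save_vmcoreinfo(void)
    {
    	vmcoreinfo_append_str("NUMBER(va_kernel_pa_offset)=0x%lx\n",
    			      kernel_map.va_kernel_pa_offset);
    }
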
2023-08-02  RISC-V: ACPI: Fix acpi_os_ioremap to return iomem address  (Sunil V L; 1 file, -2/+2)
acpi_os_ioremap() currently is a wrapper to memremap() on RISC-V. But the callers of acpi_os_ioremap() expect it to return __iomem address and hence sparse tool reports a new warning. Fix this issue by type casting to __iomem type. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307230357.egcTAefj-lkp@intel.com/ Fixes: a91a9ffbd3a5 ("RISC-V: Add support to build the ACPI core") Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230724100346.1302937-1-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>