summaryrefslogtreecommitdiff
path: root/arch/riscv/include
AgeCommit message (Collapse)AuthorFilesLines
2023-11-01riscv/mm: Fix the comment for swap pte formatXiao Wang1-1/+1
Swap type takes bits 7-11 and swap offset should start from bit 12. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20230921141652.2657054-1-xiao.w.wang@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-01RISC-V: Provide pgtable_l5_enabled on rv32Palmer Dabbelt2-1/+3
A few of the other page table level helpers are defined on rv32, but not pgtable_l5_enabled. This adds the definition as a constant and converts pgtable_l4_enabled to a constant as well. Link: https://lore.kernel.org/r/20230830044129.11481-2-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-31Merge tag 'kvm-riscv-6.7-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini6-1/+63
KVM/riscv changes for 6.7 - Smstateen and Zicond support for Guest/VM - Virtualized senvcfg CSR for Guest/VM - Added Smstateen registers to the get-reg-list selftests - Added Zicond to the get-reg-list selftests - Virtualized SBI debug console (DBCN) for Guest/VM - Added SBI debug console (DBCN) to the get-reg-list selftests
2023-10-28riscv: Use separate IRQ shadow call stacksSami Tolvanen1-0/+7
When both CONFIG_IRQ_STACKS and SCS are enabled, also use a separate per-CPU shadow call stack. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-13-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-28riscv: Implement Shadow Call StackSami Tolvanen3-0/+66
Implement CONFIG_SHADOW_CALL_STACK for RISC-V. When enabled, the compiler injects instructions to all non-leaf C functions to store the return address to the shadow stack and unconditionally load it again before returning, which makes it harder to corrupt the return address through a stack overflow, for example. The active shadow call stack pointer is stored in the gp register, which makes SCS incompatible with gp relaxation. Use --no-relax-gp to ensure gp relaxation is disabled and disable global pointer loading. Add SCS pointers to struct thread_info, implement SCS initialization, and task switching Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-12-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-28riscv: Move global pointer loading to a macroSami Tolvanen1-0/+8
In Clang 17, -fsanitize=shadow-call-stack uses the newly declared platform register gp for storing shadow call stack pointers. As this is obviously incompatible with gp relaxation, in preparation for CONFIG_SHADOW_CALL_STACK support, move global pointer loading to a single macro, which we can cleanly disable when SCS is used instead. Link: https://reviews.llvm.org/rGaa1d2693c256 Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/commit/a484e843e6eeb51f0cb7b8819e50da6d2444d769 Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-11-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-28riscv: Deduplicate IRQ stack switchingSami Tolvanen2-0/+8
With CONFIG_IRQ_STACKS, we switch to a separate per-CPU IRQ stack before calling handle_riscv_irq or __do_softirq. We currently have duplicate inline assembly snippets for stack switching in both code paths. Now that we can access per-CPU variables in assembly, implement call_on_irq_stack in assembly, and use that instead of redundant inline assembly. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-10-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-28riscv: VMAP_STACK overflow detection thread-safeDeepak Gupta3-4/+22
commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to `shadow_stack` temporarily before switching finally to per-cpu `overflow_stack`. If two CPUs/harts are racing and end up in over flowing kernel stack, one or both will end up corrupting each other state because `shadow_stack` is not per-cpu. This patch optimizes per-cpu overflow stack switch by directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`. Following are the changes in this patch - Defines an asm macro to obtain per-cpu symbols in destination register. - In entry.S, when overflow is detected, per-cpu overflow stack is located using per-cpu asm macro. Computing per-cpu symbol requires a temporary register. x31 is saved away into CSR_SCRATCH (CSR_SCRATCH is anyways zero since we're in kernel). Please see Links for additional relevant disccussion and alternative solution. Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT` Kernel crash log below Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT Task stack: [0xff20000010a98000..0xff20000010a9c000] Overflow stack: [0xff600001f7d98370..0xff600001f7d99370] CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) epc : __memset+0x60/0xfc ra : recursive_loop+0x48/0xc6 [lkdtm] epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80 gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88 t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0 s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000 a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000 a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90 s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684 s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10 s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4 t5 : ffffffff815dbab8 t6 : ff20000010a9bb48 status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f Kernel panic - not syncing: Kernel stack overflow CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80006754>] dump_backtrace+0x30/0x38 [<ffffffff808de798>] show_stack+0x40/0x4c [<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c [<ffffffff808ea2d8>] dump_stack+0x18/0x20 [<ffffffff808dec06>] panic+0x126/0x2fe [<ffffffff800065ea>] walk_stackframe+0x0/0xf0 [<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm] SMP: stopping secondary CPUs ---[ end Kernel panic - not syncing: Kernel stack overflow ]--- Cc: Guo Ren <guoren@kernel.org> Cc: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t Link: https://lore.kernel.org/lkml/20221124094845.1907443-1-debug@rivosinc.com/ Signed-off-by: Deepak Gupta <debug@rivosinc.com> Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Guo Ren <guoren@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-9-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-26RISC-V: ACPI: RHCT: Add function to get CBO block sizesSunil V L1-0/+6
Cache Block Operation (CBO) related block size in ACPI is provided by RHCT. Add support to read the CMO node in RHCT to get this information. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231018124007.1306159-4-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-20RISC-V: KVM: Forward SBI DBCN extension to user-spaceAnup Patel2-0/+2
The frozen SBI v2.0 specification defines the SBI debug console (DBCN) extension which replaces the legacy SBI v0.1 console functions namely sbi_console_getchar() and sbi_console_putchar(). The SBI DBCN extension needs to be emulated in the KVM user-space (i.e. QEMU-KVM or KVMTOOL) so we forward SBI DBCN calls from KVM guest to the KVM user-space which can then redirect the console input/output to wherever it wants (e.g. telnet, file, stdio, etc). The SBI debug console is simply a early console available to KVM guest for early prints and it does not intend to replace the proper console devices such as 8250, VirtIO console, etc. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-20RISC-V: KVM: Allow some SBI extensions to be disabled by defaultAnup Patel1-0/+4
Currently, all SBI extensions are enabled by default which is problematic for SBI extensions (such as DBCN) which are forwarded to the KVM user-space because we might have an older KVM user-space which is not aware/ready to handle newer SBI extensions. Ideally, the SBI extensions forwarded to the KVM user-space must be disabled by default. To address above, we allow certain SBI extensions to be disabled by default so that KVM user-space must explicitly enable such SBI extensions to receive forwarded calls from Guest VCPU. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-20RISC-V: KVM: Change the SBI specification version to v2.0Anup Patel1-1/+1
We will be implementing SBI DBCN extension for KVM RISC-V so let us change the KVM RISC-V SBI specification version to v2.0. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-20RISC-V: Add defines for SBI debug console extensionAnup Patel1-0/+7
We add SBI debug console extension related defines/enum to the asm/sbi.h header. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-19mm: delete checks for xor_unlock_is_negative_byte()Matthew Wilcox (Oracle)1-1/+0
Architectures which don't define their own use the one in asm-generic/bitops/lock.h. Get rid of all the ifdefs around "maybe we don't have it". Link: https://lkml.kernel.org/r/20231004165317.1061855-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-19riscv: implement xor_unlock_is_negative_byteMatthew Wilcox (Oracle)1-0/+13
Inspired by the riscv clear_bit_unlock(), this will surely be more efficient than the generic one defined in filemap.c. Link: https://lkml.kernel.org/r/20231004165317.1061855-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-13Merge tag 'riscv-for-linus-6.6-rc6' of ↵Linus Torvalds3-2/+43
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A handful of build fixes - A fix to avoid mixing up user/kernel-mode breakpoints, which can manifest as a hang when mixing k/uprobes with other breakpoint sources - A fix to avoid double-allocting crash kernel memory - A fix for tracefs syscall name mangling, which was causing syscalls not to show up in tracefs - A fix to the perf driver to enable the hw events when selected, which can trigger a BUG on some userspace access patterns * tag 'riscv-for-linus-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: drivers: perf: Fix panic in riscv SBI mmap support riscv: Fix ftrace syscall handling which are now prefixed with __riscv_ RISC-V: Fix wrong use of CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK riscv: kdump: fix crashkernel reserving problem on RISC-V riscv: Remove duplicate objcopy flag riscv: signal: fix sigaltstack frame size checking riscv: errata: andes: Makefile: Fix randconfig build issue riscv: Only consider swbp/ss handlers for correct privileged mode riscv: kselftests: Fix mm build by removing testcases subdirectory
2023-10-12riscv: Fix ftrace syscall handling which are now prefixed with __riscv_Alexandre Ghiti1-0/+21
ftrace creates entries for each syscall in the tracefs but has failed since commit 08d0ce30e0e4 ("riscv: Implement syscall wrappers") which prefixes all riscv syscalls with __riscv_. So fix this by implementing arch_syscall_match_sym_name() which allows us to ignore this prefix. And also ignore compat syscalls like x86/arm64 by implementing arch_trace_is_compat_syscall(). Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20231003182407.32198-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-12RISC-V: KVM: Allow Zicond extension for Guest/VMAnup Patel1-0/+1
We extend the KVM ISA extension ONE_REG interface to allow KVM user space to detect and enable Zicond extension for Guest/VM. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISCV: KVM: Add sstateen0 to ONE_REGMayuresh Chitale1-0/+8
Add support for sstateen0 CSR to the ONE_REG interface to allow its access from user space. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISCV: KVM: Add sstateen0 context save/restoreMayuresh Chitale2-0/+9
Define sstateen0 and add sstateen0 save/restore for guest VCPUs. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISCV: KVM: Add senvcfg context save/restoreMayuresh Chitale3-0/+4
Add senvcfg context save/restore for guest VCPUs and also add it to the ONE_REG interface to allow its access from user space. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISC-V: KVM: Enable Smstateen accessesMayuresh Chitale3-0/+18
Configure hstateen0 register so that the AIA state and envcfg are accessible to the vcpus. This includes registers such as siselect, sireg, siph, sieh and all the IMISC registers. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISC-V: KVM: Add kvm_vcpu_configMayuresh Chitale1-0/+7
Add a placeholder for all registers such as henvcfg, hstateen etc which have 'static' configurations depending on extensions supported by the guest. The values are derived once and are then subsequently written to the corresponding CSRs while switching to the vcpu. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISC-V: Detect Zicond from ISA stringAnup Patel1-0/+1
The RISC-V integer conditional (Zicond) operation extension defines standard conditional arithmetic and conditional-select/move operations which are inspired from the XVentanaCondOps extension. In fact, QEMU RISC-V also has support for emulating Zicond extension. Let us detect Zicond extension from ISA string available through DT or ACPI. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-12RISC-V: Detect Smstateen extensionMayuresh Chitale1-0/+1
Extend the ISA string parsing to detect the Smstateen extension. If the extension is enabled then access to certain 'state' such as AIA CSRs in VS mode is controlled by *stateen0 registers. Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-10-10docs: move riscv under archCosta Shulyupin1-1/+1
and fix all in-tree references. Architecture-specific documentation is being moved into Documentation/arch/ as a way of cleaning up the top-level documentation directory and making the docs hierarchy more closely match the source hierarchy. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230930185354.3034118-1-costa.shul@redhat.com
2023-10-04riscv: kdump: use generic interface to simplify crashkernel reservationBaoquan He2-0/+13
With the help of newly changed function parse_crashkernel() and generic reserve_crashkernel_generic(), crashkernel reservation can be simplified by steps: 1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN, CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>; 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and reserve_crashkernel_generic(); 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in arch/riscv/Kconfig. The old reserve_crashkernel_low() and reserve_crashkernel() can be removed. [chenjiahao16@huawei.com: fix crashkernel reserving problem on RISC-V] Link: https://lkml.kernel.org/r/20230925024333.730964-1-chenjiahao16@huawei.com Link: https://lkml.kernel.org/r/20230914033142.676708-9-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jiahao <chenjiahao16@huawei.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30mm: hugetlb: add huge page size param to set_huge_pte_at()Ryan Roberts1-1/+2
Patch series "Fix set_huge_pte_at() panic on arm64", v2. This series fixes a bug in arm64's implementation of set_huge_pte_at(), which can result in an unprivileged user causing a kernel panic. The problem was triggered when running the new uffd poison mm selftest for HUGETLB memory. This test (and the uffd poison feature) was merged for v6.5-rc7. Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable (correctly this time) to get it backported to v6.5, where the issue first showed up. Description of Bug ================== arm64's huge pte implementation supports multiple huge page sizes, some of which are implemented in the page table with multiple contiguous entries. So set_huge_pte_at() needs to work out how big the logical pte is, so that it can also work out how many physical ptes (or pmds) need to be written. It previously did this by grabbing the folio out of the pte and querying its size. However, there are cases when the pte being set is actually a swap entry. But this also used to work fine, because for huge ptes, we only ever saw migration entries and hwpoison entries. And both of these types of swap entries have a PFN embedded, so the code would grab that and everything still worked out. But over time, more calls to set_huge_pte_at() have been added that set swap entry types that do not embed a PFN. And this causes the code to go bang. The triggering case is for the uffd poison test, commit 99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") - added in v6.5-rc7. Although review shows that there are other call sites that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger on arm64 because arm64 doesn't support UFFD WP. If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise, it will dereference a bad pointer in page_folio(): static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry) { VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry)); return page_folio(pfn_to_page(swp_offset_pfn(entry))); } Fix === The simplest fix would have been to revert the dodgy cleanup commit 18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since things have moved on, this would have required an audit of all the new set_huge_pte_at() call sites to see if they should be converted to set_huge_swap_pte_at(). As per the original intent of the change, it would also leave us open to future bugs when people invariably get it wrong and call the wrong helper. So instead, I've added a huge page size parameter to set_huge_pte_at(). This means that the arm64 code has the size in all cases. It's a bigger change, due to needing to touch the arches that implement the function, but it is entirely mechanical, so in my view, low risk. I've compile-tested all touched arches; arm64, parisc, powerpc, riscv, s390, sparc (and additionally x86_64). I've additionally booted and run mm selftests against arm64, where I observe the uffd poison test is fixed, and there are no other regressions. This patch (of 2): In order to fix a bug, arm64 needs to be told the size of the huge page for which the pte is being set in set_huge_pte_at(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. No behavioral changes intended. Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx] Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> [vmalloc change] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-21RISC-V: hwprobe: Expose Zicboz extension and its block sizeAndrew Jones2-1/+3
Expose Zicboz through hwprobe and also provide a key to extract its respective block size. Opportunistically add a macro and apply it to current extensions in order to avoid duplicating code. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Evan Green <evan@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230918131518.56803-11-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-21RISC-V: Enable cbo.zero in usermodeAndrew Jones3-0/+18
When Zicboz is present, enable its instruction (cbo.zero) in usermode by setting its respective senvcfg bit. We don't bother trying to set this bit per-task, which would also require an interface for tasks to request enabling and/or disabling. Instead, permanently set the bit for each hart which has the extension when bringing it online. This patch also introduces riscv_cpu_has_extension_[un]likely() functions to check a specific hart's ISA bitmap for extensions. Prior to checking the specific hart's bitmap in these functions we try the bitmap which represents the LCD of extensions, but only when we know it will use its optimized, alternatives path by gating its call on CONFIG_RISCV_ALTERNATIVE. When alternatives are used, the compiler ensures that the invocation of the LCD search becomes a constant true or false. When it's true, even the new functions will completely vanish from their callsites. OTOH, when the LCD check is false, we need to do a search of the hart's ISA bitmap. Had we also checked the LCD bitmap without the use of alternatives, then we would have ended up with two bitmap searches instead of one. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230918131518.56803-10-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-20riscv: Only consider swbp/ss handlers for correct privileged modeBjörn Töpel2-2/+22
RISC-V software breakpoint trap handlers are used for {k,u}probes. When trapping from kernelmode, only the kernelmode handlers should be considered. Vice versa, only usermode handlers for usermode traps. This is not the case on RISC-V, which can trigger a bug if a userspace process uses uprobes, and a WARN() is triggered from kernelmode (which is implemented via {c.,}ebreak). The kernel will trap on the kernelmode {c.,}ebreak, look for uprobes handlers, realize incorrectly that uprobes need to be handled, and exit the trap handler early. The trap returns to re-executing the {c.,}ebreak, and enter an infinite trap-loop. The issue was found running the BPF selftest [1]. Fix this issue by only considering the swbp/ss handlers for kernel/usermode respectively. Also, move CONFIG ifdeffery from traps.c to the asm/{k,u}probes.h headers. Note that linux/uprobes.h only include asm/uprobes.h if CONFIG_UPROBES is defined, which is why asm/uprobes.h needs to be unconditionally included in traps.c Link: https://lore.kernel.org/linux-riscv/87v8d19aun.fsf@all.your.base.are.belong.to.us/ # [1] Fixes: 74784081aac8 ("riscv: Add uprobes supported") Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Nam Cao <namcaov@gmail.com> Tested-by: Puranjay Mohan <puranjay12@gmail.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20230912065619.62020-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-12riscv: errata: fix T-Head dcache.cva encodingIcenowy Zheng1-2/+2
The dcache.cva encoding shown in the comments are wrong, it's for dcache.cval1 (which is restricted to L1) instead. Fix this in the comment and in the hardcoded instruction. Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Guo Ren <guoren@kernel.org> Tested-by: Drew Fustini <dfustini@baylibre.com> Link: https://lore.kernel.org/r/20230912072410.2481-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-10Merge tag 'riscv-for-linus-6.6-mw2-2' of ↵Linus Torvalds9-9/+54
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - The kernel now dynamically probes for misaligned access speed, as opposed to relying on a table of known implementations. - Support for non-coherent devices on systems using the Andes AX45MP core, including the RZ/Five SoCs. - Support for the V extension in ptrace(), again. - Support for KASLR. - Support for the BPF prog pack allocator in RISC-V. - A handful of bug fixes and cleanups. * tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (25 commits) soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled bpf, riscv: use prog pack allocator in the BPF JIT riscv: implement a memset like function for text riscv: extend patch_text_nosync() for multiple pages bpf: make bpf_prog_pack allocator portable riscv: libstub: Implement KASLR by using generic functions libstub: Fix compilation warning for rv32 arm64: libstub: Move KASLR handling functions to kaslr.c riscv: Dump out kernel offset information on panic riscv: Introduce virtual kernel mapping KASLR RISC-V: Add ptrace support for vectors soc: renesas: Kconfig: Select the required configs for RZ/Five SoC cache: Add L2 cache management for Andes AX45MP RISC-V core dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller riscv: mm: dma-noncoherent: nonstandard cache operations support riscv: errata: Add Andes alternative ports riscv: asm: vendorid_list: Add Andes Technology to the vendors list ...
2023-09-08Merge patch series "bpf, riscv: use BPF prog pack allocator in BPF JIT"Palmer Dabbelt1-0/+1
Puranjay Mohan <puranjay12@gmail.com> says: Here is some data to prove the V2 fixes the problem: Without this series: root@rv-selftester:~/src/kselftest/bpf# time ./test_tag test_tag: OK (40945 tests) real 7m47.562s user 0m24.145s sys 6m37.064s With this series applied: root@rv-selftester:~/src/selftest/bpf# time ./test_tag test_tag: OK (40945 tests) real 7m29.472s user 0m25.865s sys 6m18.401s BPF programs currently consume a page each on RISCV. For systems with many BPF programs, this adds significant pressure to instruction TLB. High iTLB pressure usually causes slow down for the whole system. Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue. It packs multiple BPF programs into a single huge page. It is currently only enabled for the x86_64 BPF JIT. I enabled this allocator on the ARM64 BPF JIT[2]. It is being reviewed now. This patch series enables the BPF prog pack allocator for the RISCV BPF JIT. ====================================================== Performance Analysis of prog pack allocator on RISCV64 ====================================================== Test setup: =========== Host machine: Debian GNU/Linux 11 (bullseye) Qemu Version: QEMU emulator version 8.0.3 (Debian 1:8.0.3+dfsg-1) u-boot-qemu Version: 2023.07+dfsg-1 opensbi Version: 1.3-1 To test the performance of the BPF prog pack allocator on RV, a stresser tool[4] linked below was built. This tool loads 8 BPF programs on the system and triggers 5 of them in an infinite loop by doing system calls. The runner script starts 20 instances of the above which loads 8*20=160 BPF programs on the system, 5*20=100 of which are being constantly triggered. The script is passed a command which would be run in the above environment. The script was run with following perf command: ./run.sh "perf stat -a \ -e iTLB-load-misses \ -e dTLB-load-misses \ -e dTLB-store-misses \ -e instructions \ --timeout 60000" The output of the above command is discussed below before and after enabling the BPF prog pack allocator. The tests were run on qemu-system-riscv64 with 8 cpus, 16G memory. The rootfs was created using Bjorn's riscv-cross-builder[5] docker container linked below. Results ======= Before enabling prog pack allocator: ------------------------------------ Performance counter stats for 'system wide': 4939048 iTLB-load-misses 5468689 dTLB-load-misses 465234 dTLB-store-misses 1441082097998 instructions 60.045791200 seconds time elapsed After enabling prog pack allocator: ----------------------------------- Performance counter stats for 'system wide': 3430035 iTLB-load-misses 5008745 dTLB-load-misses 409944 dTLB-store-misses 1441535637988 instructions 60.046296600 seconds time elapsed Improvements in metrics ======================= It was expected that the iTLB-load-misses would decrease as now a single huge page is used to keep all the BPF programs compared to a single page for each program earlier. -------------------------------------------- The improvement in iTLB-load-misses: -30.5 % -------------------------------------------- I repeated this expriment more than 100 times in different setups and the improvement was always greater than 30%. This patch series is boot tested on the Starfive VisionFive 2 board[6]. The performance analysis was not done on the board because it doesn't expose iTLB-load-misses, etc. The stresser program was run on the board to test the loading and unloading of BPF programs [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/ [2] https://lore.kernel.org/all/20230626085811.3192402-1-puranjay12@gmail.com/ [3] https://lore.kernel.org/all/20230626085811.3192402-2-puranjay12@gmail.com/ [4] https://github.com/puranjaymohan/BPF-Allocator-Bench [5] https://github.com/bjoto/riscv-cross-builder [6] https://www.starfivetech.com/en/site/boards * b4-shazam-merge: bpf, riscv: use prog pack allocator in the BPF JIT riscv: implement a memset like function for text riscv: extend patch_text_nosync() for multiple pages bpf: make bpf_prog_pack allocator portable Link: https://lore.kernel.org/r/20230831131229.497941-1-puranjay12@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "riscv: Introduce KASLR"Palmer Dabbelt2-0/+5
Alexandre Ghiti <alexghiti@rivosinc.com> says: The following KASLR implementation allows to randomize the kernel mapping: - virtually: we expect the bootloader to provide a seed in the device-tree - physically: only implemented in the EFI stub, it relies on the firmware to provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation hence the patch 3 factorizes KASLR related functions for riscv to take advantage. The new virtual kernel location is limited by the early page table that only has one PUD and with the PMD alignment constraint, the kernel can only take < 512 positions. * b4-shazam-merge: riscv: libstub: Implement KASLR by using generic functions libstub: Fix compilation warning for rv32 arm64: libstub: Move KASLR handling functions to kaslr.c riscv: Dump out kernel offset information on panic riscv: Introduce virtual kernel mapping KASLR Link: https://lore.kernel.org/r/20230722123850.634544-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch "RISC-V: Add ptrace support for vectors"Palmer Dabbelt1-4/+9
This resurrects the vector ptrace() support that was removed for 6.5 due to some bugs cropping up as part of the GDB review process. * b4-shazam-merge: RISC-V: Add ptrace support for vectors Link: https://lore.kernel.org/r/20230825050248.32681-1-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "Add non-coherent DMA support for AX45MP"Palmer Dabbelt4-0/+37
Prabhakar <prabhakar.csengg@gmail.com> says: From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> non-coherent DMA support for AX45MP ==================================== On the Andes AX45MP core, cache coherency is a specification option so it may not be supported. In this case DMA will fail. To get around with this issue this patch series does the below: 1] Andes alternative ports is implemented as errata which checks if the IOCP is missing and only then applies to CMO errata. One vendor specific SBI EXT (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. Below are the configs which Andes port provides (and are selected by RZ/Five): - ERRATA_ANDES - ERRATA_ANDES_CMO OpenSBI patch supporting ANDES_SBI_EXT_IOCP_SW_WORKAROUND SBI is now part v1.3 release. 2] Andes AX45MP core has a Programmable Physical Memory Attributes (PMA) block that allows dynamic adjustment of memory attributes in the runtime. It contains a configurable amount of PMA entries implemented as CSR registers to control the attributes of memory locations in interest. OpenSBI configures the PMA regions as required and creates a reserve memory node and propagates it to the higher boot stack. Currently OpenSBI (upstream) configures the required PMA region and passes this a shared DMA pool to Linux. reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; pma_resv0@58000000 { compatible = "shared-dma-pool"; reg = <0x0 0x58000000 0x0 0x08000000>; no-map; linux,dma-default; }; }; The above shared DMA pool gets appended to Linux DTB so the DMA memory requests go through this region. 3] We provide callbacks to synchronize specific content between memory and cache. 4] RZ/Five SoC selects the below configs - AX45MP_L2_CACHE - DMA_GLOBAL_POOL - ERRATA_ANDES - ERRATA_ANDES_CMO ----------x---------------------x--------------------x---------------x---- * b4-shazam-merge: soc: renesas: Kconfig: Select the required configs for RZ/Five SoC cache: Add L2 cache management for Andes AX45MP RISC-V core dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller riscv: mm: dma-noncoherent: nonstandard cache operations support riscv: errata: Add Andes alternative ports riscv: asm: vendorid_list: Add Andes Technology to the vendors list Link: https://lore.kernel.org/r/20230818135723.80612-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "RISC-V: Probe for misaligned access speed"Palmer Dabbelt2-5/+2
Evan Green <evan@rivosinc.com> says: The current setting for the hwprobe bit indicating misaligned access speed is controlled by a vendor-specific feature probe function. This is essentially a per-SoC table we have to maintain on behalf of each vendor going forward. Let's convert that instead to something we detect at runtime. We have two assembly routines at the heart of our probe: one that does a bunch of word-sized accesses (without aligning its input buffer), and the other that does byte accesses. If we can move a larger number of bytes using misaligned word accesses than we can with the same amount of time doing byte accesses, then we can declare misaligned accesses as "fast". The tradeoff of reducing this maintenance burden is boot time. We spend 4-6 jiffies per core doing this measurement (0-2 on jiffie edge alignment, and 4 on measurement). The timing loop was based on raid6_choose_gen(), which uses (16+1)*N jiffies (where N is the number of algorithms). By taking only the fastest iteration out of all attempts for use in the comparison, variance between runs is very low. On my THead C906, it looks like this: [ 0.047563] cpu0: Ratio of byte access time to unaligned word access is 4.34, unaligned accesses are fast Several others have chimed in with results on slow machines with the older algorithm, which took all runs into account, including noise like interrupts. Even with this variation, results indicate that in all cases (fast, slow, and emulated) the measured numbers are nowhere near each other (always multiple factors away). * b4-shazam-merge: RISC-V: alternative: Remove feature_probe_func RISC-V: Probe for unaligned access speed Link: https://lore.kernel.org/r/20230818194136.4084400-1-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-4/+29
Pull kvm updates from Paolo Bonzini: "ARM: - Clean up vCPU targets, always returning generic v8 as the preferred target - Trap forwarding infrastructure for nested virtualization (used for traps that are taken from an L2 guest and are needed by the L1 hypervisor) - FEAT_TLBIRANGE support to only invalidate specific ranges of addresses when collapsing a table PTE to a block PTE. This avoids that the guest refills the TLBs again for addresses that aren't covered by the table PTE. - Fix vPMU issues related to handling of PMUver. - Don't unnecessary align non-stack allocations in the EL2 VA space - Drop HCR_VIRT_EXCP_MASK, which was never used... - Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu parameter instead - Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort() - Remove prototypes without implementations RISC-V: - Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest - Added ONE_REG interface for SATP mode - Added ONE_REG interface to enable/disable multiple ISA extensions - Improved error codes returned by ONE_REG interfaces - Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V - Added get-reg-list selftest for KVM RISC-V s390: - PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch) Allows a PV guest to use crypto cards. Card access is governed by the firmware and once a crypto queue is "bound" to a PV VM every other entity (PV or not) looses access until it is not bound anymore. Enablement is done via flags when creating the PV VM. - Guest debug fixes (Ilya) x86: - Clean up KVM's handling of Intel architectural events - Intel bugfixes - Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use debug registers and generate/handle #DBs - Clean up LBR virtualization code - Fix a bug where KVM fails to set the target pCPU during an IRTE update - Fix fatal bugs in SEV-ES intrahost migration - Fix a bug where the recent (architecturally correct) change to reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to skip it) - Retry APIC map recalculation if a vCPU is added/enabled - Overhaul emergency reboot code to bring SVM up to par with VMX, tie the "emergency disabling" behavior to KVM actually being loaded, and move all of the logic within KVM - Fix user triggerable WARNs in SVM where KVM incorrectly assumes the TSC ratio MSR cannot diverge from the default when TSC scaling is disabled up related code - Add a framework to allow "caching" feature flags so that KVM can check if the guest can use a feature without needing to search guest CPUID - Rip out the ancient MMU_DEBUG crud and replace the useful bits with CONFIG_KVM_PROVE_MMU - Fix KVM's handling of !visible guest roots to avoid premature triple fault injection - Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the API surface that is needed by external users (currently only KVMGT), and fix a variety of issues in the process Generic: - Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier events to pass action specific data without needing to constantly update the main handlers. - Drop unused function declarations Selftests: - Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs - Add support for printf() in guest code and covert all guest asserts to use printf-based reporting - Clean up the PMU event filter test and add new testcases - Include x86 selftests in the KVM x86 MAINTAINERS entry" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits) KVM: x86/mmu: Include mmu.h in spte.h KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots KVM: x86/mmu: Disallow guest from using !visible slots for page tables KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page KVM: x86/mmu: Harden new PGD against roots without shadow pages KVM: x86/mmu: Add helper to convert root hpa to shadow page drm/i915/gvt: Drop final dependencies on KVM internal details KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers KVM: x86/mmu: Drop @slot param from exported/external page-track APIs KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled KVM: x86/mmu: Assert that correct locks are held for page write-tracking KVM: x86/mmu: Rename page-track APIs to reflect the new reality KVM: x86/mmu: Drop infrastructure for multiple page-track modes KVM: x86/mmu: Use page-track notifiers iff there are external users KVM: x86/mmu: Move KVM-only page-track declarations to internal header KVM: x86: Remove the unused page-track hook track_flush_slot() drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() KVM: x86: Add a new page-track hook to handle memslot deletion drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot KVM: x86: Reject memslot MOVE operations if KVMGT is attached ...
2023-09-06riscv: implement a memset like function for textPuranjay Mohan1-0/+1
The BPF JIT needs to write invalid instructions to RX regions of memory to invalidate removed BPF programs. This needs a function like memset() that can work with RX memory. Implement patch_text_set_nosync() which is similar to text_poke_set() of x86. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Acked-by: Björn Töpel <bjorn@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230831131229.497941-4-puranjay12@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-06riscv: libstub: Implement KASLR by using generic functionsAlexandre Ghiti1-0/+2
We can now use arm64 functions to handle the move of the kernel physical mapping: if KASLR is enabled, we will try to get a random seed from the firmware, if not possible, the kernel will be moved to a location that suits its alignment constraints. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230722123850.634544-6-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-06riscv: Introduce virtual kernel mapping KASLRAlexandre Ghiti1-0/+3
KASLR implementation relies on a relocatable kernel so that we can move the kernel mapping. The seed needed to virtually move the kernel is taken from the device tree, so we rely on the bootloader to provide a correct seed. Zkr could be used unconditionnally instead if implemented, but that's for another patch. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230722123850.634544-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01RISC-V: Add ptrace support for vectorsAndy Chiu1-4/+9
This patch add back the ptrace support with the following fix: - Define NT_RISCV_CSR and re-number NT_RISCV_VECTOR to prevent conflicting with gdb's NT_RISCV_CSR. - Use struct __riscv_v_regset_state to handle ptrace requests Since gdb does not directly include the note description header in Linux and has already defined NT_RISCV_CSR as 0x900, we decide to sync with gdb and renumber NT_RISCV_VECTOR to solve and prevent future conflicts. Fixes: 0c59922c769a ("riscv: Add ptrace vector support") Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20230825050248.32681-1-andy.chiu@sifive.com [Palmer: Drop the unused "size" variable in riscv_vr_set().] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01riscv: mm: dma-noncoherent: nonstandard cache operations supportLad Prabhakar1-0/+28
Introduce support for nonstandard noncoherent systems in the RISC-V architecture. It enables function pointer support to handle cache management in such systems. This patch adds a new configuration option called "RISCV_NONSTANDARD_CACHE_OPS." This option is a boolean flag that depends on "RISCV_DMA_NONCOHERENT" and enables the function pointer support for cache management in nonstandard noncoherent systems. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> # tyre-kicking on a d1 Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> # Link: https://lore.kernel.org/r/20230818135723.80612-4-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01riscv: errata: Add Andes alternative portsLad Prabhakar2-0/+8
Add required ports of the Alternative scheme for Andes CPU cores. I/O Coherence Port (IOCP) provides an AXI interface for connecting external non-caching masters, such as DMA controllers. IOCP is a specification option and is disabled on the Renesas RZ/Five SoC due to this reason cache management needs a software workaround. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> # tyre-kicking on a d1 Link: https://lore.kernel.org/r/20230818135723.80612-3-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01riscv: asm: vendorid_list: Add Andes Technology to the vendors listLad Prabhakar1-0/+1
Add Andes Technology to the vendors list. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Conor Dooley <conor.dooley@microchip.com> # tyre-kicking on a d1 Link: https://lore.kernel.org/r/20230818135723.80612-2-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01RISC-V: alternative: Remove feature_probe_funcEvan Green1-5/+0
Now that we're testing unaligned memory copy and making that determination generically, there are no more users of the vendor feature_probe_func(). While I think it's probably going to need to come back, there are no users right now, so let's remove it until it's needed. Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230818194136.4084400-3-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01RISC-V: Probe for unaligned access speedEvan Green1-0/+2
Rather than deferring unaligned access speed determinations to a vendor function, let's probe them and find out how fast they are. If we determine that an unaligned word access is faster than N byte accesses, mark the hardware's unaligned access as "fast". Otherwise, we mark accesses as slow. The algorithm itself runs for a fixed amount of jiffies. Within each iteration it attempts to time a single loop, and then keeps only the best (fastest) loop it saw. This algorithm was found to have lower variance from run to run than my first attempt, which counted the total number of iterations that could be done in that fixed amount of jiffies. By taking only the best iteration in the loop, assuming at least one loop wasn't perturbed by an interrupt, we eliminate the effects of interrupts and other "warm up" factors like branch prediction. The only downside is it depends on having an rdtime granular and accurate enough to measure a single copy. If we ever manage to complete a loop in 0 rdtime ticks, we leave the unaligned setting at UNKNOWN. There is a slight change in user-visible behavior here. Previously, all boards except the THead C906 reported misaligned access speed of UNKNOWN. C906 reported FAST. With this change, since we're now measuring misaligned access speed on each hart, all RISC-V systems will have this key set as either FAST or SLOW. Currently, we don't have a way to confidently measure the difference between SLOW and EMULATED, so we label anything not fast as SLOW. This will mislabel some systems that are actually EMULATED as SLOW. When we get support for delegating misaligned access traps to the kernel (as opposed to the firmware quietly handling it), we can explicitly test in Linux to see if unaligned accesses trap. Those systems will start to report EMULATED, though older (today's) systems without that new SBI mechanism will continue to report SLOW. I've updated the documentation for those hwprobe values to reflect this, specifically: SLOW may or may not be emulated by software, and FAST represents means being faster than equivalent byte accesses. The change in documentation is accurate with respect to both the former and current behavior. Signed-off-by: Evan Green <evan@rivosinc.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230818194136.4084400-2-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-01Merge tag 'riscv-for-linus-6.6-mw1' of ↵Linus Torvalds14-23/+245
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the new "riscv,isa-extensions" and "riscv,isa-base" device tree interfaces for probing extensions - Support for userspace access to the performance counters - Support for more instructions in kprobes - Crash kernels can be allocated above 4GiB - Support for KCFI - Support for ELFs in !MMU configurations - ARCH_KMALLOC_MINALIGN has been reduced to 8 - mmap() defaults to sv48-sized addresses, with longer addresses hidden behind a hint (similar to Arm and Intel) - Also various fixes and cleanups * tag 'riscv-for-linus-6.6-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits) lib/Kconfig.debug: Restrict DEBUG_INFO_SPLIT for RISC-V riscv: support PREEMPT_DYNAMIC with static keys riscv: Move create_tmp_mapping() to init sections riscv: Mark KASAN tmp* page tables variables as static riscv: mm: use bitmap_zero() API riscv: enable DEBUG_FORCE_FUNCTION_ALIGN_64B riscv: remove redundant mv instructions RISC-V: mm: Document mmap changes RISC-V: mm: Update pgtable comment documentation RISC-V: mm: Add tests for RISC-V mm RISC-V: mm: Restrict address space for sv39,sv48,sv57 riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent riscv: allow kmalloc() caches aligned to the smallest value riscv: support the elf-fdpic binfmt loader binfmt_elf_fdpic: support 64-bit systems riscv: Allow CONFIG_CFI_CLANG to be selected riscv/purgatory: Disable CFI riscv: Add CFI error handling riscv: Add ftrace_stub_graph riscv: Add types to indirectly called assembly functions ...
2023-08-31Merge tag 'x86_shstk_for_6.6-rc1' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 shadow stack support from Dave Hansen: "This is the long awaited x86 shadow stack support, part of Intel's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace. The main use case for shadow stack is providing protection against return oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy. For more information, refer to the links below for the earlier versions of this patch set" Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/ Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/ * tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/shstk: Change order of __user in type x86/ibt: Convert IBT selftest to asm x86/shstk: Don't retry vm_munmap() on -EINTR x86/kbuild: Fix Documentation/ reference x86/shstk: Move arch detail comment out of core mm x86/shstk: Add ARCH_SHSTK_STATUS x86/shstk: Add ARCH_SHSTK_UNLOCK x86: Add PTRACE interface for shadow stack selftests/x86: Add shadow stack test x86/cpufeatures: Enable CET CR4 bit for shadow stack x86/shstk: Wire in shadow stack interface x86: Expose thread features in /proc/$PID/status x86/shstk: Support WRSS for userspace x86/shstk: Introduce map_shadow_stack syscall x86/shstk: Check that signal frame is shadow stack mem x86/shstk: Check that SSP is aligned on sigreturn x86/shstk: Handle signals for shadow stack x86/shstk: Introduce routines modifying shstk x86/shstk: Handle thread shadow stack x86/shstk: Add user-mode shadow stack support ...