summaryrefslogtreecommitdiff
path: root/arch/riscv/include/asm
AgeCommit message (Collapse)AuthorFilesLines
2025-06-05Merge patch series "riscv: ftrace: atmoic patching and preempt improvements"Alexandre Ghiti3-30/+59
Andy Chiu <andybnac@gmail.com> says: This series makes atomic code patching in ftrace possible and eliminates the need of the stop_machine dance. The major difference of this version is that we merge the CALL_OPS support from Puranjay [1] and make direct calls available for practical uses such as BPF. Thanks for the time reviewing the series and suggestions, we hope this version gets a step closer to happening in the upstream. Please reference the link to v3 below for more introductory view of the implementation [2] Added patch: 2, 4, 10, 11, 12 Modified patch: 5, 6 Unchanged patch: 1, 3, 7, 8, 9 (1, 8 has commit msg modified) Special thanks to Björn for his efforts on testing and guiding the series! [1]: https://lore.kernel.org/lkml/20240306165904.108141-1-puranjay12@gmail.com/ [2]: https://lore.kernel.org/linux-riscv/20241127172908.17149-1-andybnac@gmail.com/ * patches from https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com: riscv: Documentation: add a description about dynamic ftrace riscv: ftrace: support direct call using call_ops riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS riscv: ftrace: support PREEMPT riscv: add a data fence for CMODX in the kernel mode riscv: vector: Support calling schedule() for preemptible Vector riscv: ftrace: do not use stop_machine to update code riscv: ftrace: prepare ftrace for atomic code patching kernel: ftrace: export ftrace_sync_ipi riscv: ftrace: align patchable functions to 4 Byte boundary riscv: ftrace factor out code defined by !WITH_ARG riscv: ftrace: support fastcc in Clang for WITH_ARGS Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-06-05riscv: Add support for PUD THPAlexandre Ghiti3-2/+102
Add the necessary page table functions to deal with PUD THP, this enables the use of PUD pfnmap. Link: https://lore.kernel.org/r/20250321123954.225097-1-alexghiti@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: xchg: Prefetch the destination word for sc.wGuo Ren1-1/+3
The cost of changing a cacheline from shared to exclusive state can be significant, especially when this is triggered by an exclusive store, since it may result in having to retry the transaction. This patch makes use of prefetch.w to prefetch cachelines for write prior to lr/sc loops when using the xchg_small atomic routine. This patch is inspired by commit 0ea366f5e1b6 ("arm64: atomics: prefetch the destination word for write prior to stxr"). Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20231231082955.16516-4-guoren@kernel.org Tested-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20250421142441.395849-5-alexghiti@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: Add ARCH_HAS_PREFETCH[W] support with ZicbopGuo Ren1-0/+24
Enable Linux prefetch and prefetchw primitives using Zicbop. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20231231082955.16516-3-guoren@kernel.org Tested-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20250421142441.395849-4-alexghiti@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: Add support for ZicbopAlexandre Ghiti5-6/+8
Zicbop introduces cache blocks prefetching instructions, add the necessary support for the kernel to use it in the coming commits. Co-developed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Guo Ren <guoren@kernel.org> Tested-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20250421142441.395849-3-alexghiti@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: Introduce Zicbop instructionsAlexandre Ghiti1-0/+60
The S-type instructions are first introduced and then used to define the encoding of the Zicbop prefetching instructions. Co-developed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Guo Ren <guoren@kernel.org> Tested-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20250421142441.395849-2-alexghiti@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: kexec_file: Support loading Image binary fileSong Shuai2-0/+3
This patch creates image_kexec_ops to load Image binary file for kexec_file_load() syscall. Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250409193004.643839-3-bjorn@kernel.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: kexec_file: Split the loading of kernel and othersSong Shuai1-0/+5
This is the preparative patch for kexec_file_load Image support. It separates the elf_kexec_load() as two parts: - the first part loads the vmlinux (or Image) - the second part loads other segments (e.g. initrd,fdt,purgatory) And the second part is exported as the load_extra_segments() function which would be used in both kexec-elf.c and kexec-image.c. No functional change intended. Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250409193004.643839-2-bjorn@kernel.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: ftrace: support direct call using call_opsAndy Chiu1-0/+6
jump to FTRACE_ADDR if distance is out of reach Co-developed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andy Chiu <andybnac@gmail.com> Link: https://lore.kernel.org/r/20250407180838.42877-11-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: vector: Support calling schedule() for preemptible VectorAndy Chiu2-3/+24
Each function entry implies a call to ftrace infrastructure. And it may call into schedule in some cases. So, it is possible for preemptible kernel-mode Vector to implicitly call into schedule. Since all V-regs are caller-saved, it is possible to drop all V context when a thread voluntarily call schedule(). Besides, we currently don't pass argument through vector register, so we don't have to save/restore V-regs in ftrace trampoline. Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20250407180838.42877-7-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: ftrace: prepare ftrace for atomic code patchingAndy Chiu1-27/+22
We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since instruction fetch can break down to 4 byte at a time, it is impossible to update two instructions without a race. In order to mitigate it, we initialize the patchable entry to AUIPC + NOP4. Then, the run-time code patching can change NOP4 to JALR to eable/disable ftrcae from a function. This limits the reach of each ftrace entry to +-2KB displacing from ftrace_caller. Starting from the trampoline, we add a level of indirection for it to reach ftrace caller target. Now, it loads the target address from a memory location, then perform the jump. This enable the kernel to update the target atomically. The new don't-stop-the-world text patching on change only one RISC-V instruction: | -8: &ftrace_ops of the associated tracer function. | <ftrace enable>: | 0: auipc t0, hi(ftrace_caller) | 4: jalr t0, lo(ftrace_caller) | | -8: &ftrace_nop_ops | <ftrace disable>: | 0: auipc t0, hi(ftrace_caller) | 4: nop This means that f+0x0 is fixed, and should not be claimed by ftrace, e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the offset and MCOUNT_INSN_SIZE accordingly. [ alex: Fix build errors with !CONFIG_DYNAMIC_FTRACE ] Co-developed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20250407180838.42877-5-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: ftrace: support fastcc in Clang for WITH_ARGSAndy Chiu1-0/+7
Some caller-saved registers which are not defined as function arguments in the ABI can still be passed as arguments when the kernel is compiled with Clang. As a result, we must save and restore those registers to prevent ftrace from clobbering them. - [1]: https://reviews.llvm.org/D68559 Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com> Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/ Fixes: 7caa9765465f ("ftrace: riscv: move from REGS to ARGS") Acked-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: misaligned: add a function to check misalign trap delegabilityClément Léger1-0/+6
Checking for the delegability of the misaligned access trap is needed for the KVM FWFT extension implementation. Add a function to get the delegability of the misaligned trap exception. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20250523101932.1594077-11-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: misaligned: declare misaligned_access_speed under CONFIG_RISCV_MISALIGNEDClément Léger1-1/+4
While misaligned_access_speed was defined in a file compile with CONFIG_RISCV_MISALIGNED, its definition was under CONFIG_RISCV_SCALAR_MISALIGNED. This resulted in compilation problems when using it in a file compiled with CONFIG_RISCV_MISALIGNED. Move the declaration under CONFIG_RISCV_MISALIGNED so that it can be used unconditionnally when compiled with that config and remove the check for that variable in traps_misaligned.c. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250523101932.1594077-9-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: misaligned: request misaligned exception from SBIClément Léger1-1/+2
Now that the kernel can handle misaligned accesses in S-mode, request misaligned access exception delegation from SBI. This uses the FWFT SBI extension defined in SBI version 3.0. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250523101932.1594077-7-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: sbi: add FWFT extension interfaceClément Léger1-0/+17
This SBI extensions enables supervisor mode to control feature that are under M-mode control (For instance, Svadu menvcfg ADUE bit, Ssdbltrp DTE, etc). Add an interface to set local features for a specific cpu mask as well as for the online cpu mask. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250523101932.1594077-5-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: sbi: add new SBI error mappingsClément Léger1-0/+10
A few new errors have been added with SBI V3.0, maps them as close as possible to errno values. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250523101932.1594077-4-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05riscv: sbi: add Firmware Feature (FWFT) SBI extensions definitionsClément Léger1-0/+33
The Firmware Features extension (FWFT) was added as part of the SBI 3.0 specification. Add SBI definitions to use this extension. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Deepak Gupta <debug@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250523101932.1594077-2-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-02Revert "RISC-V: vDSO: Wire up getrandom() vDSO implementation"Palmer Dabbelt1-30/+0
This has been on -next for a bit, but it's broken and there's already a v2. So I'm reverting it to avoid more rebasing. This reverts commit 89079520cef65d6da1e864eab4464effe5396e23. Link: https://lore.kernel.org/r/20250602173315.20228-1-palmer@dabbelt.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-01Merge tag 'mm-stable-2025-05-31-14-50' of ↵Linus Torvalds3-4/+19
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...
2025-05-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-13/+16
Pull kvm updates from Paolo Bonzini: "As far as x86 goes this pull request "only" includes TDX host support. Quotes are appropriate because (at 6k lines and 100+ commits) it is much bigger than the rest, which will come later this week and consists mostly of bugfixes and selftests. s390 changes will also come in the second batch. ARM: - Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance. - Enable nested virtualisation support on systems that support it, though it is disabled by default. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups. LoongArch: - Don't flush tlb if the host supports hardware page table walks. - Add KVM selftests support. RISC-V: - Add vector registers to get-reg-list selftest - VCPU reset related improvements - Remove scounteren initialization from VCPU reset - Support VCPU reset from userspace using set_mpstate() ioctl x86: - Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit 7bcf7246c42a ('Merge branch 'kvm-tdx-finish-initial' into HEAD')" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits) x86/tdx: mark tdh_vp_enter() as __flatten Documentation: virt/kvm: remove unreferenced footnote RISC-V: KVM: lock the correct mp_state during reset KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET RISC-V: KVM: Remove scounteren initialization KVM: RISC-V: remove unnecessary SBI reset state ...
2025-05-21RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESETRadim Krčmář2-0/+4
Add a toggleable VM capability to reset the VCPU from userspace by setting MP_STATE_INIT_RECEIVED through IOCTL. Reset through a mp_state to avoid adding a new IOCTL. Do not reset on a transition from STOPPED to RUNNABLE, because it's better to avoid side effects that would complicate userspace adoption. The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets the VCPU while preserving the original mp_state -- because we wouldn't gain much from having a new state it in the rest of KVM, but it's a very non-standard use of the IOCTL. Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250515143723.2450630-5-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21KVM: RISC-V: remove unnecessary SBI reset stateRadim Krčmář2-9/+8
The SBI reset state has only two variables -- pc and a1. The rest is known, so keep only the necessary information. The reset structures make sense if we want userspace to control the reset state (which we do), but I'd still remove them now and reintroduce with the userspace interface later -- we could probably have just a single reset state per VM, instead of a reset state for each VCPU. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Link: https://lore.kernel.org/r/20250403112522.1566629-6-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21KVM: RISC-V: refactor sbi reset requestRadim Krčmář1-0/+2
The same code is used twice and SBI reset sets only two variables. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Link: https://lore.kernel.org/r/20250403112522.1566629-5-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-21KVM: RISC-V: refactor vector state resetRadim Krčmář1-4/+2
Do not depend on the reset structures. vector.datap is a kernel memory pointer that needs to be preserved as it is not a part of the guest vector data. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com> Link: https://lore.kernel.org/r/20250403112522.1566629-4-rkrcmar@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org>
2025-05-12syscall.h: introduce syscall_set_nr()Dmitry V. Levin1-0/+7
Similar to syscall_set_arguments() that complements syscall_get_arguments(), introduce syscall_set_nr() that complements syscall_get_nr(). syscall_set_nr() is going to be needed along with syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. Link: https://lkml.kernel.org/r/20250303112020.GD24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> # mips Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12syscall.h: add syscall_set_arguments()Dmitry V. Levin1-0/+12
This function is going to be needed on all HAVE_ARCH_TRACEHOOK architectures to implement PTRACE_SET_SYSCALL_INFO API. This partially reverts commit 7962c2eddbfe ("arch: remove unused function syscall_set_arguments()") by reusing some of old syscall_set_arguments() implementations. [nathan@kernel.org: fix compile time fortify checks] Link: https://lkml.kernel.org/r/20250408213131.GA2872426@ax162 Link: https://lkml.kernel.org/r/20250303112009.GC24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Helge Deller <deller@gmx.de> # parisc Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> [mips] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12arch: remove mk_pmd()Matthew Wilcox (Oracle)1-2/+0
There are now no callers of mk_huge_pmd() and mk_pmd(). Remove them. Link: https://lkml.kernel.org/r/20250402181709.2386022-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm: introduce a common definition of mk_pte()Matthew Wilcox (Oracle)1-2/+0
Most architectures simply call pfn_pte(). Centralise that as the normal definition and remove the definition of mk_pte() from the architectures which have either that exact definition or something similar. Link: https://lkml.kernel.org/r/20250402181709.2386022-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Cc: Zi Yan <ziy@nvidia.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Muchun Song <muchun.song@linux.dev> Cc: Richard Weinberger <richard@nod.at> Cc: <x86@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-08Merge patch series "riscv: Add vendor extensions support for SiFive"Palmer Dabbelt3-1/+37
Cyan Yang <cyan.yang@sifive.com> says: This patch set adds four vendor-specific ISA extensions from SiFive: "xsfvqmaccdod", "xsfvqmaccqoq", "xsfvfnrclipxfqf", and "xsfvfwmaccqqq". Additionally, a new hwprobe key, RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0, has been added to query which SiFive vendor extensions are supported on the current platform. Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Link: https://lore.kernel.org/r/20250418053239.4351-1-cyan.yang@sifive.com * b4-shazam-merge: riscv: hwprobe: Add SiFive xsfvfwmaccqqq vendor extension riscv: hwprobe: Document SiFive xsfvfwmaccqqq vendor extension riscv: Add SiFive xsfvfwmaccqqq vendor extension dt-bindings: riscv: Add xsfvfwmaccqqq ISA extension description riscv: hwprobe: Add SiFive xsfvfnrclipxfqf vendor extension riscv: hwprobe: Document SiFive xsfvfnrclipxfqf vendor extension riscv: Add SiFive xsfvfnrclipxfqf vendor extension dt-bindings: riscv: Add xsfvfnrclipxfqf ISA extension description riscv: hwprobe: Add SiFive vendor extension support and probe for xsfqmaccdod and xsfqmaccqoq riscv: hwprobe: Document SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensions riscv: Add SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensions dt-bindings: riscv: Add xsfvqmaccdod and xsfvqmaccqoq ISA extension description Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: Add SiFive xsfvfwmaccqqq vendor extensionCyan Yang1-0/+1
Add SiFive vendor extension "xsfvfwmaccqqq" support to the kernel. Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Link: https://lore.kernel.org/r/20250418053239.4351-11-cyan.yang@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: Add SiFive xsfvfnrclipxfqf vendor extensionCyan Yang1-0/+1
Add SiFive vendor extension "xsfvfnrclipxfqf" support to the kernel. Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Link: https://lore.kernel.org/r/20250418053239.4351-7-cyan.yang@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: hwprobe: Add SiFive vendor extension support and probe for ↵Cyan Yang2-0/+20
xsfqmaccdod and xsfqmaccqoq Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0" which allows userspace to probe for the new vendor extensions from SiFive. Also, add new hwprobe for SiFive "xsfvqmaccdod" and "xsfvqmaccqoq" vendor extensions. Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Link: https://lore.kernel.org/r/20250418053239.4351-5-cyan.yang@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: hwprobe: Document SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensionsCyan Yang1-1/+1
Document the support for sifive vendor extensions using the key RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0 and two vendor extensions for SiFive Int8 Matrix Multiplication Instructions using RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCDOD and RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCQOQ. Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Link: https://lore.kernel.org/r/20250418053239.4351-4-cyan.yang@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: Add SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensionsCyan Yang1-0/+14
Add SiFive vendor extension support to the kernel with the target of "xsfvqmaccdod" and "xsfvqmaccqoq". Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Link: https://lore.kernel.org/r/20250418053239.4351-3-cyan.yang@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08Merge patch series "riscv: uaccess: optimisations"Palmer Dabbelt2-53/+166
Cyril Bur <cyrilbur@tenstorrent.com> says: This series tries to optimize riscv uaccess by allowing the use of user_access_begin() and user_access_end() which permits grouping user accesses and avoiding the CSR write penalty for each access. The error path can also be optimised using asm goto which patches 3 and 4 achieve. This will speed up jumping to labels by avoiding the need of an intermediary error type variable within the uaccess macros I did read the discussion this series generated. It isn't clear to me which direction to take the patches, if any. * b4-shazam-merge: riscv: uaccess: use 'asm_goto_output' for get_user() riscv: uaccess: use 'asm goto' for put_user() riscv: uaccess: use input constraints for ptr of __put_user() riscv: implement user_access_begin() and families riscv: save the SR_SUM status over switches Link: https://lore.kernel.org/r/20250410070526.3160847-1-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: uaccess: use 'asm_goto_output' for get_user()Jisheng Zhang1-27/+68
With 'asm goto' we don't need to test the error etc, the exception just jumps to the error handling directly. Unlike put_user(), get_user() must work around GCC bugs [1] when using output clobbers in an asm goto statement. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1 Signed-off-by: Jisheng Zhang <jszhang@kernel.org> [Cyril Bur: Rewritten commit message] Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-6-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: uaccess: use 'asm goto' for put_user()Jisheng Zhang1-38/+33
With 'asm goto' we don't need to test the error etc, the exception just jumps to the error handling directly. Because there are no output clobbers which could trigger gcc bugs [1] the use of asm_goto_output() macro is not necessary here. Not using asm_goto_output() is desirable as the generated output asm will be cleaner. Use of the volatile keyword is redundant as per gcc 14.2.0 manual section 6.48.2.7 Goto Labels: > Also note that an asm goto statement is always implicitly considered volatile. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1 Signed-off-by: Jisheng Zhang <jszhang@kernel.org> [Cyril Bur: Rewritten commit message] Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-5-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: uaccess: use input constraints for ptr of __put_user()Jisheng Zhang1-9/+9
Putting ptr in the inputs as opposed to output may seem incorrect but this is done for a few reasons: - Not having it in the output permits the use of asm goto in a subsequent patch. There are bugs in gcc [1] which would otherwise prevent it. - Since the output memory is userspace there isn't any real benefit from telling the compiler about the memory clobber. - x86, arm and powerpc all use this technique. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 # 1 Signed-off-by: Jisheng Zhang <jszhang@kernel.org> [Cyril Bur: Rewritten commit message] Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-4-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: implement user_access_begin() and familiesJisheng Zhang1-0/+76
Currently, when a function like strncpy_from_user() is called, the userspace access protection is disabled and enabled for every word read. By implementing user_access_begin() and families, the protection is disabled at the beginning of the copy and enabled at the end. The __inttype macro is borrowed from x86 implementation. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-3-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-05-08riscv: save the SR_SUM status over switchesBen Dooks1-0/+1
When threads/tasks are switched we need to ensure the old execution's SR_SUM state is saved and the new thread has the old SR_SUM state restored. The issue was seen under heavy load especially with the syz-stress tool running, with crashes as follows in schedule_tail: Unable to handle kernel access to user memory without uaccess routines at virtual address 000000002749f0d0 Oops [#1] Modules linked in: CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0 Hardware name: riscv-virtio,qemu (DT) epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 ra : task_pid_vnr include/linux/sched.h:1421 [inline] ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264 epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0 gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000 t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0 s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003 a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0 s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850 s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8 s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2 t5 : ffffffc4043cafba t6 : 0000000000040000 status: 0000000000000120 badaddr: 000000002749f0d0 cause: 000000000000000f Call Trace: [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264 [<ffffffe000005570>] ret_from_exception+0x0/0x14 Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace b5f8f9231dc87dda ]--- The issue comes from the put_user() in schedule_tail (kernel/sched/core.c) doing the following: asmlinkage __visible void schedule_tail(struct task_struct *prev) { ... if (current->set_child_tid) put_user(task_pid_vnr(current), current->set_child_tid); ... } the put_user() macro causes the code sequence to come out as follows: 1: __enable_user_access() 2: reg = task_pid_vnr(current); 3: *current->set_child_tid = reg; 4: __disable_user_access() The problem is that we may have a sleeping function as argument which could clear SR_SUM causing the panic above. This was fixed by evaluating the argument of the put_user() macro outside the user-enabled section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before enabling user access")" In order for riscv to take advantage of unsafe_get/put_XXX() macros and to avoid the same issue we had with put_user() and sleeping functions we must ensure code flow can go through switch_to() from within a region of code with SR_SUM enabled and come back with SR_SUM still enabled. This patch addresses the problem allowing future work to enable full use of unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost on every access. Make switch_to() save and restore SR_SUM. Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Deepak Gupta <debug@rivosinc.com> Link: https://lore.kernel.org/r/20250410070526.3160847-2-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-04-29riscv: entry: Split ret_from_fork() into user and kernelCharlie Jenkins1-1/+2
This function was unified into a single function in commit ab9164dae273 ("riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork"). However that imposed a performance degradation. Partially reverting this commit to have ret_from_fork() split again, results in a 1% increase on the number of times fork is able to be called per second. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/all/20250320-riscv_optimize_entry-v6-2-63e187e26041@rivosinc.com
2025-04-29riscv: entry: Convert ret_from_fork() to CCharlie Jenkins1-0/+1
Move the main section of ret_from_fork() to C to allow inlining of syscall_exit_to_user_mode(). Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/all/20250320-riscv_optimize_entry-v6-1-63e187e26041@rivosinc.com
2025-04-24riscv: Replace function-like macro by static inline functionBjörn Töpel1-5/+10
The flush_icache_range() function is implemented as a "function-like macro with unused parameters", which can result in "unused variables" warnings. Replace the macro with a static inline function, as advised by Documentation/process/coding-style.rst. Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250419111402.1660267-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-04-16Merge tag 'riscv-fixes-6.15-rc3' of ↵Palmer Dabbelt1-12/+7
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into fixes riscv fixes for 6.15-rc3 - A couple of fixes regarding module relocations - Fix a build error by implementing missing alternative macros - Another fix for kexec by fixing /proc/iomem * tag 'riscv-fixes-6.15-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux: riscv: Avoid fortify warning in syscall_get_arguments() riscv: Provide all alternative macros all the time riscv: module: Allocate PLT entries for R_RISCV_PLT32 riscv: module: Fix out-of-bounds relocation access riscv: Properly export reserved regions in /proc/iomem riscv: Fix unaligned access info messages
2025-04-16Merge patch series "riscv: Rework the arch_kgdb_breakpoint() implementation"Palmer Dabbelt1-8/+1
WangYuli <wangyuli@uniontech.com> says: 1. The arch_kgdb_breakpoint() function defines the kgdb_compiled_break symbol using inline assembly. There's a potential issue where the compiler might inline arch_kgdb_breakpoint(), which would then define the kgdb_compiled_break symbol multiple times, leading to fail to link vmlinux.o. This isn't merely a potential compilation problem. The intent here is to determine the global symbol address of kgdb_compiled_break, and if this function is inlined multiple times, it would logically be a grave error. 2. Remove ".option norvc/.option rvc" to fix a bug that the C extension would unconditionally enable even if the kernel is being built with CONFIG_RISCV_ISA_C=n. * b4-shazam-merge: riscv: KGDB: Remove ".option norvc/.option rvc" for kgdb_compiled_break riscv: KGDB: Do not inline arch_kgdb_breakpoint() Link: https://lore.kernel.org/r/D5A83DF3A06E1DF9+20250411072905.55134-1-wangyuli@uniontech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-04-16riscv: KGDB: Do not inline arch_kgdb_breakpoint()WangYuli1-8/+1
The arch_kgdb_breakpoint() function defines the kgdb_compiled_break symbol using inline assembly. There's a potential issue where the compiler might inline arch_kgdb_breakpoint(), which would then define the kgdb_compiled_break symbol multiple times, leading to fail to link vmlinux.o. This isn't merely a potential compilation problem. The intent here is to determine the global symbol address of kgdb_compiled_break, and if this function is inlined multiple times, it would logically be a grave error. Link: https://lore.kernel.org/all/4b4187c1-77e5-44b7-885f-d6826723dd9a@sifive.com/ Link: https://lore.kernel.org/all/5b0adf9b-2b22-43fe-ab74-68df94115b9a@ghiti.fr/ Link: https://lore.kernel.org/all/23693e7f-4fff-40f3-a437-e06d827278a5@ghiti.fr/ Fixes: fe89bd2be866 ("riscv: Add KGDB support") Co-developed-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: WangYuli <wangyuli@uniontech.com> Link: https://lore.kernel.org/r/F22359AFB6FF9FD8+20250411073222.56820-1-wangyuli@uniontech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-04-16RISC-V: vDSO: Wire up getrandom() vDSO implementationXi Ruoyao1-0/+30
Hook up the generic vDSO implementation to the generic vDSO getrandom implementation by providing the required __arch_chacha20_blocks_nostack and getrandom_syscall implementations. Also wire up the selftests. The benchmark result: vdso: 25000000 times in 2.466341333 seconds libc: 25000000 times in 41.447720005 seconds syscall: 25000000 times in 41.043926672 seconds vdso: 25000000 x 256 times in 162.286219353 seconds libc: 25000000 x 256 times in 2953.855018685 seconds syscall: 25000000 x 256 times in 2796.268546000 seconds Signed-off-by: Xi Ruoyao <xry111@xry111.site> Link: https://lore.kernel.org/r/20250411024600.16045-1-xry111@xry111.site Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-04-16riscv: Avoid fortify warning in syscall_get_arguments()Nathan Chancellor1-2/+5
When building with CONFIG_FORTIFY_SOURCE=y and W=1, there is a warning because of the memcpy() in syscall_get_arguments(): In file included from include/linux/string.h:392, from include/linux/bitmap.h:13, from include/linux/cpumask.h:12, from arch/riscv/include/asm/processor.h:55, from include/linux/sched.h:13, from kernel/ptrace.c:13: In function 'fortify_memcpy_chk', inlined from 'syscall_get_arguments.isra' at arch/riscv/include/asm/syscall.h:66:2: include/linux/fortify-string.h:580:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 580 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors The fortified memcpy() routine enforces that the source is not overread and the destination is not overwritten if the size of either field and the size of the copy are known at compile time. The memcpy() in syscall_get_arguments() intentionally overreads from a1 to a5 in 'struct pt_regs' but this is bigger than the size of a1. Normally, this could be solved by wrapping a1 through a5 with struct_group() but there was already a struct_group() applied to these members in commit bba547810c66 ("riscv: tracing: Fix __write_overflow_field in ftrace_partial_regs()"). Just avoid memcpy() altogether and write the copying of args from regs manually, which clears up the warning at the expense of three extra lines of code. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Dmitry V. Levin <ldv@strace.io> Link: https://lore.kernel.org/r/20250409-riscv-avoid-fortify-warning-syscall_get_arguments-v1-1-7853436d4755@kernel.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
2025-04-14riscv: Provide all alternative macros all the timeAndrew Jones1-12/+7
We need to provide all six forms of the alternative macros (ALTERNATIVE, ALTERNATIVE_2, _ALTERNATIVE_CFG, _ALTERNATIVE_CFG_2, __ALTERNATIVE_CFG, __ALTERNATIVE_CFG_2) for all four cases derived from the two ifdefs (RISCV_ALTERNATIVE, __ASSEMBLY__) in order to ensure all configs can compile. Define this missing ones and ensure all are defined to consume all parameters passed. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504130710.3IKz6Ibs-lkp@intel.com/ Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250414120947.135173-2-ajones@ventanamicro.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>