starfive-tech/linux.git - StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2024-01-11	Merge patch series "riscv: mm: Fixup & Optimize COMPAT code"	Palmer Dabbelt	2	-2/+2
	guoren@kernel.org <guoren@kernel.org> says: From: Guo Ren <guoren@linux.alibaba.com> When the task is in COMPAT mode, the TASK_SIZE should be 2GB, so STACK_TOP_MAX and arch_get_mmap_end must be limited to 2 GB. This series fixes the problem made by commit: add2cc6b6515 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57") and optimizes the related coding convention of TASK_SIZE. * b4-shazam-merge: riscv: mm: Fixup compat arch_get_mmap_end riscv: mm: Fixup compat mode boot failure Link: https://lore.kernel.org/r/20231222115703.2404036-1-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11	riscv: mm: Fixup compat arch_get_mmap_end	Guo Ren	1	-1/+1
	When the task is in COMPAT mode, the arch_get_mmap_end should be 2GB, not TASK_SIZE_64. The TASK_SIZE has contained is_compat_mode() detection, so change the definition of STACK_TOP_MAX to TASK_SIZE directly. Cc: stable@vger.kernel.org Fixes: add2cc6b6515 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20231222115703.2404036-3-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11	riscv: mm: Fixup compat mode boot failure	Guo Ren	1	-1/+1
	In COMPAT mode, the STACK_TOP is DEFAULT_MAP_WINDOW (0x80000000), but the TASK_SIZE is 0x7fff000. When the user stack is upon 0x7fff000, it will cause a user segment fault. Sometimes, it would cause boot failure when the whole rootfs is rv32. Freeing unused kernel image (initmem) memory: 2236K Run /sbin/init as init process Starting init: /sbin/init exists but couldn't execute it (error -14) Run /etc/init as init process ... Increase the TASK_SIZE to cover STACK_TOP. Cc: stable@vger.kernel.org Fixes: add2cc6b6515 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20231222115703.2404036-2-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11	riscv: Add support for BATCHED_UNMAP_TLB_FLUSH	Alexandre Ghiti	2	-0/+23
	Allow to defer the flushing of the TLB when unmapping pages, which allows to reduce the numbers of IPI and the number of sfence.vma. The ubenchmarch used in commit 43b3dfdd0455 ("arm64: support batched/deferred tlb shootdown during page reclamation/migration") that was multithreaded to force the usage of IPI shows good performance improvement on all platforms: * Unmatched: ~34% * TH1520 : ~78% * Qemu : ~81% In addition, perf on qemu reports an important decrease in time spent dealing with IPIs: Before: 68.17% main [kernel.kallsyms] [k] __sbi_rfence_v02_call After : 8.64% main [kernel.kallsyms] [k] __sbi_rfence_v02_call * Benchmark: int stick_this_thread_to_core(int core_id) { int num_cores = sysconf(_SC_NPROCESSORS_ONLN); if (core_id < 0 \|\| core_id >= num_cores) return EINVAL; cpu_set_t cpuset; CPU_ZERO(&cpuset); CPU_SET(core_id, &cpuset); pthread_t current_thread = pthread_self(); return pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset); } static void fn_thread (void p_data) { int ret; pthread_t thread; stick_this_thread_to_core((int)p_data); while (1) { sleep(1); } return NULL; } int main() { volatile unsigned char p = mmap(NULL, SIZE, PROT_READ \| PROT_WRITE, MAP_SHARED \| MAP_ANONYMOUS, -1, 0); pthread_t threads[4]; int ret; for (int i = 0; i < 4; ++i) { ret = pthread_create(&threads[i], NULL, fn_thread, (void )i); if (ret) { printf("%s", strerror (ret)); } } memset(p, 0x88, SIZE); for (int k = 0; k < 10000; k++) { /* swap in / for (int i = 0; i < SIZE; i += 4096) { (void)p[i]; } / swap out */ madvise(p, SIZE, MADV_PAGEOUT); } for (int i = 0; i < 4; i++) { pthread_cancel(threads[i]); } for (int i = 0; i < 4; i++) { pthread_join(threads[i], NULL); } return 0; } Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Tested-by: Jisheng Zhang <jszhang@kernel.org> # Tested on TH1520 Tested-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20240108193640.344929-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11	Merge patch series "riscv: errata: thead: use riscv_nonstd_cache_ops for CMO"	Palmer Dabbelt	1	-44/+6
	Jisheng Zhang <jszhang@kernel.org> says: Previously, we use alternative mechanism to dynamically patch the CMO operations for THEAD C906/C910 during boot for performance reason. But as pointed out by Arnd, "there is already a significant cost in accessing the invalidated cache lines afterwards, which is likely going to be much higher than the cost of an indirect branch". And indeed, there's no performance difference with GMAC and EMMC per my test on Sipeed Lichee Pi 4A board. Use riscv_nonstd_cache_ops for THEAD C906/C910 CMO to simplify the alternative code, and to acchieve Arnd's goal -- "I think moving the THEAD ops at the same level as all nonstandard operations makes sense, but I'd still leave CMO as an explicit fast path that avoids the indirect branch. This seems like the right thing to do both for readability and for platforms on which the indirect branch has a noticeable overhead." To make bisect easy, I use two patches here: patch1 does the conversion which just mimics current CMO behavior via. riscv_nonstd_cache_ops, I assume no functionalities changes. patch2 uses T-HEAD PA based CMO instructions so that we don't need to covert PA to VA. * b4-shazam-merge: riscv: errata: thead: use pa based instructions for CMO riscv: errata: thead: use riscv_nonstd_cache_ops for CMO Link: https://lore.kernel.org/r/20231114143338.2406-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11	Merge patch series "RISC-V SBI debug console extension support"	Palmer Dabbelt	1	-0/+10
	Anup Patel <apatel@ventanamicro.com> says: The SBI v2.0 specification is now frozen. The SBI v2.0 specification defines SBI debug console (DBCN) extension which replaces the legacy SBI v0.1 functions sbi_console_putchar() and sbi_console_getchar(). (Refer v2.0-rc5 at https://github.com/riscv-non-isa/riscv-sbi-doc/releases) This series adds support for SBI debug console (DBCN) extension in Linux RISC-V. To try these patches with KVM RISC-V, use KVMTOOL from the riscv_zbx_zicntr_smstateen_condops_v1 branch at: https://github.com/avpatel/kvmtool.git * b4-shazam-merge: RISC-V: Enable SBI based earlycon support tty: Add SBI debug console support to HVC SBI driver tty/serial: Add RISC-V SBI debug console based earlycon RISC-V: Add SBI debug console helper routines RISC-V: Add stubs for sbi_console_putchar/getchar() Link: https://lore.kernel.org/r/20231124070905.1043092-1-apatel@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11	riscv: sbi: Introduce system suspend support	Andrew Jones	1	-0/+9
	When the SUSP SBI extension is present it implies that the standard "suspend to RAM" type is available. Wire it up to the generic platform suspend support, also applying the already present support for non-retentive CPU suspend. When the kernel is built with CONFIG_SUSPEND, one can do 'echo mem > /sys/power/state' to suspend. Resumption will occur when a platform-specific wake-up event arrives. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20231206110807.35882-4-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11	Merge patch series "riscv: enable EFFICIENT_UNALIGNED_ACCESS and ↵	Palmer Dabbelt	2	-0/+42
	DCACHE_WORD_ACCESS" Jisheng Zhang <jszhang@kernel.org> says: Some riscv implementations such as T-HEAD's C906, C908, C910 and C920 support efficient unaligned access, for performance reason we want to enable HAVE_EFFICIENT_UNALIGNED_ACCESS on these platforms. To avoid performance regressions on non efficient unaligned access platforms, HAVE_EFFICIENT_UNALIGNED_ACCESS can't be globally selected. To solve this problem, runtime code patching based on the detected speed is a good solution. But that's not easy, it involves lots of work to modify vairous subsystems such as net, mm, lib and so on. This can be done step by step. So let's take an easier solution: add support to efficient unaligned access and hide the support under NONPORTABLE. patch1 introduces RISCV_EFFICIENT_UNALIGNED_ACCESS which depends on NONPORTABLE, if users know during config time that the kernel will be only run on those efficient unaligned access hw platforms, they can enable it. Obviously, generic unified kernel Image shouldn't enable it. patch2 adds support DCACHE_WORD_ACCESS when MMU and RISCV_EFFICIENT_UNALIGNED_ACCESS. Below test program and step shows how much performance can be improved: $ cat tt.c #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #define ITERATIONS 1000000 #define PATH "123456781234567812345678123456781" int main(void) { unsigned long i; struct stat buf; for (i = 0; i < ITERATIONS; i++) stat(PATH, &buf); return 0; } $ gcc -O2 tt.c $ touch 123456781234567812345678123456781 $ time ./a.out Per my test on T-HEAD C910 platforms, the above test performance is improved by about 7.5%. * b4-shazam-merge: riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS Link: https://lore.kernel.org/r/20231225044207.3821-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11	Merge tag 'asm-generic-6.8' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanups from Arnd Bergmann: "A series from Baoquan He cleans up the asm-generic/io.h to remove the ioremap_uc() definition from everything except x86, which still needs it for pre-PAT systems. This series notably contains a patch from Jiaxun Yang that converts MIPS to use asm-generic/io.h like every other architecture does, enabling future cleanups. Some of my own patches fix -Wmissing-prototype warnings in architecture specific code across several architectures. This is now needed as the warning is enabled by default. There are still some remaining warnings in minor platforms, but the series should catch most of the widely used ones make them more consistent with one another. David McKay fixes a bug in __generic_cmpxchg_local() when this is used on 64-bit architectures. This could currently only affect parisc64 and sparc64. Additional cleanups address from Linus Walleij, Uwe Kleine-König, Thomas Huth, and Kefeng Wang help reduce unnecessary inconsistencies between architectures" * tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: Fix 32 bit __generic_cmpxchg_local Hexagon: Make pfn accessors statics inlines ARC: mm: Make virt_to_pfn() a static inline mips: remove extraneous asm-generic/iomap.h include sparc: Use $(kecho) to announce kernel images being ready arm64: vdso32: Define BUILD_VDSO32_64 to correct prototypes csky: fix arch_jump_label_transform_static override arch: add do_page_fault prototypes arch: add missing prepare_ftrace_return() prototypes arch: vdso: consolidate gettime prototypes arch: include linux/cpu.h for trap_init() prototype arch: fix asm-offsets.c building with -Wmissing-prototypes arch: consolidate arch_irq_work_raise prototypes hexagon: Remove CONFIG_HEXAGON_ARCH_VERSION from uapi header asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() mips: io: remove duplicated codes arch/*/io.h: remove ioremap_uc in some architectures mips: add <asm-generic/io.h> including
2024-01-10	riscv: errata: thead: use riscv_nonstd_cache_ops for CMO	Jisheng Zhang	1	-44/+6
	Previously, we use alternative mechanism to dynamically patch the CMO operations for THEAD C906/C910 during boot for performance reason. But as pointed out by Arnd, "there is already a significant cost in accessing the invalidated cache lines afterwards, which is likely going to be much higher than the cost of an indirect branch". And indeed, there's no performance difference with GMAC and EMMC per my test on Sipeed Lichee Pi 4A board. Use riscv_nonstd_cache_ops for THEAD C906/C910 CMO to simplify the alternative code, and to acchieve Arnd's goal -- "I think moving the THEAD ops at the same level as all nonstandard operations makes sense, but I'd still leave CMO as an explicit fast path that avoids the indirect branch. This seems like the right thing to do both for readability and for platforms on which the indirect branch has a noticeable overhead." Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20231114143338.2406-2-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	RISC-V: Add SBI debug console helper routines	Anup Patel	1	-0/+5
	Let us provide SBI debug console helper routines which can be shared by serial/earlycon-riscv-sbi.c and hvc/hvc_riscv_sbi.c. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231124070905.1043092-3-apatel@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	RISC-V: Add stubs for sbi_console_putchar/getchar()	Anup Patel	1	-0/+5
	The functions sbi_console_putchar() and sbi_console_getchar() are not defined when CONFIG_RISCV_SBI_V01 is disabled so let us add stub of these functions to avoid "#ifdef" on user side. Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231124070905.1043092-2-apatel@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW	Jisheng Zhang	2	-0/+42
	DCACHE_WORD_ACCESS uses the word-at-a-time API for optimised string comparisons in the vfs layer. This patch implements support for load_unaligned_zeropad in much the same way as has been done for arm64. Here is the test program and step: $ cat tt.c #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #define ITERATIONS 1000000 #define PATH "123456781234567812345678123456781" int main(void) { unsigned long i; struct stat buf; for (i = 0; i < ITERATIONS; i++) stat(PATH, &buf); return 0; } $ gcc -O2 tt.c $ touch 123456781234567812345678123456781 $ time ./a.out Per my test on T-HEAD C910 platforms, the above test performance is improved by about 7.5%. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/r/20231225044207.3821-3-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	Merge patch series "riscv: hwprobe: add Zicond, Zacas and Ztso support"	Palmer Dabbelt	2	-0/+5
	Clément Léger <cleger@rivosinc.com> says: This series add support for a few more extensions that are present in the RVA22U64/RVA23U64 (either mandatory or optional) and that are useful for userspace: - Zicond - Zacas - Ztso Series currently based on riscv/for-next. * b4-shazam-lts: riscv: hwprobe: export Zicond extension riscv: hwprobe: export Zacas ISA extension riscv: add ISA extension parsing for Zacas dt-bindings: riscv: add Zacas ISA extension description riscv: hwprobe: export Ztso ISA extension riscv: add ISA extension parsing for Ztso Link: https://lore.kernel.org/r/20231220155723.684081-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	riscv: hwprobe: export Zicond extension	Clément Léger	1	-0/+1
	Export the zicond extension to userspace using hwprobe. Signed-off-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20231220155723.684081-7-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	riscv: hwprobe: export Zacas ISA extension	Clément Léger	1	-0/+1
	Export Zacas ISA extension through hwprobe. Signed-off-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20231220155723.684081-6-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	riscv: add ISA extension parsing for Zacas	Clément Léger	1	-0/+1
	Add parsing for Zacas ISA extension which was ratified recently in the riscv-zacas manual. Signed-off-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20231220155723.684081-5-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	riscv: hwprobe: export Ztso ISA extension	Clément Léger	1	-0/+1
	Export the Ztso extension to userspace. Signed-off-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20231220155723.684081-3-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	riscv: add ISA extension parsing for Ztso	Clément Léger	1	-0/+1
	Add support to parse the Ztso string in the riscv,isa string. The bindings already supports it but not the ISA parsing code. Signed-off-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20231220155723.684081-2-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	Merge patch series "Fix XIP boot and make XIP testable in QEMU"	Palmer Dabbelt	1	-1/+1
	Frederik Haxel <haxel@fzi.de> says: XIP boot seems to be broken for some time now. A likely reason why no one seems to have noticed this is that XIP is more difficult to test, as it is currently not easily testable with QEMU. These patches fix the XIP boot and allow an XIP build without BUILTIN_DTB, which in turn makes it easier to test an image with the QEMU virt machine. * b4-shazam-merge: riscv: Allow disabling of BUILTIN_DTB for XIP riscv: Fixed wrong register in XIP_FIXUP_FLASH_OFFSET macro riscv: Make XIP bootable again Link: https://lore.kernel.org/r/20231212130116.848530-1-haxel@fzi.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	riscv: Remove SHADOW_OVERFLOW_STACK_SIZE macro	Song Shuai	1	-1/+0
	The commit be97d0db5f44 ("riscv: VMAP_STACK overflow detection thread-safe") got rid of `shadow_stack`, so SHADOW_OVERFLOW_STACK_SIZE should be removed too. Fixes: be97d0db5f44 ("riscv: VMAP_STACK overflow detection thread-safe") Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20231211110331.359534-1-songshuaishuai@tinylab.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	Merge remote-tracking branch 'palmer/fixes' into for-next	Palmer Dabbelt	1	-0/+1
	I don't usually merge these in, but I missed sending a PR due to the holidays. * palmer/fixes: riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC riscv: Fix module_alloc() that did not reset the linear mapping permissions riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping riscv: Check if the code to patch lies in the exit section riscv: errata: andes: Probe for IOCP only once in boot stage riscv: Fix SMP when shadow call stacks are enabled dt-bindings: perf: riscv,pmu: drop unneeded quotes riscv: fix misaligned access handling of C.SWSP and C.SDSP RISC-V: hwprobe: Always use u64 for extension bits Support rv32 ULEB128 test riscv: Correct type casting in module loading riscv: Safely remove entries from relocation list Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	Merge patch series "riscv: CPU operations cleanup"	Palmer Dabbelt	1	-12/+2
	Samuel Holland <samuel.holland@sifive.com> says: This series cleans up some duplicated and dead code around the RISC-V CPU operations, that was copied from arm64 but is not needed here. The result is a bit of memory savings and removal of a few SBI calls during boot, with no functional change. * b4-shazam-merge: riscv: Use the same CPU operations for all CPUs riscv: Remove unused members from struct cpu_operations riscv: Deduplicate code in setup_smp() Link: https://lore.kernel.org/r/20231121234736.3489608-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	Merge patch series "RISC-V: hwprobe: Introduce which-cpus"	Palmer Dabbelt	2	-0/+27
	Andrew Jones <ajones@ventanamicro.com> says: This series introduces a flag for the hwprobe syscall which effectively reverses its behavior from getting the values of keys for a set of cpus to getting the cpus for a set of key-value pairs. * b4-shazam-merge: RISC-V: selftests: Add which-cpus hwprobe test RISC-V: hwprobe: Introduce which-cpus flag RISC-V: Move the hwprobe syscall to its own file RISC-V: hwprobe: Clarify cpus size parameter Link: https://lore.kernel.org/r/20231122164700.127954-6-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10	riscv: Fixed wrong register in XIP_FIXUP_FLASH_OFFSET macro	Frederik Haxel	1	-1/+1
	During the refactoring, a bug was introduced in the rarly used XIP_FIXUP_FLASH_OFFSET macro. Fixes: bee7fbc38579 ("RISC-V CPU Idle Support") Fixes: e7681beba992 ("RISC-V: Split out the XIP fixups into their own file") Signed-off-by: Frederik Haxel <haxel@fzi.de> Link: https://lore.kernel.org/r/20231212130116.848530-3-haxel@fzi.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09	Merge tag 'mm-stable-2024-01-08-15-31' of ↵	Linus Torvalds	1	-0/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...
2024-01-09	riscv: Check if the code to patch lies in the exit section	Alexandre Ghiti	1	-0/+1
	Otherwise we fall through to vmalloc_to_page() which panics since the address does not lie in the vmalloc region. Fixes: 043cb41a85de ("riscv: introduce interfaces to patch kernel code") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20231214091926.203439-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-08	Merge branch 'sched/urgent' into sched/core, to pick up pending v6.7 fixes ↵	Ingo Molnar	1	-5/+0
	for the v6.8 merge window This fix didn't make it upstream in time, pick it up for the v6.8 merge window. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-01-05	mm/mglru: add dummy pmd_dirty()	Kinsey Ho	1	-0/+1
	Add dummy pmd_dirty() for architectures that don't provide it. This is similar to commit 6617da8fb565 ("mm: add dummy pmd_young() for architectures not having it"). Link: https://lkml.kernel.org/r/20231227141205.2200125-5-kinseyho@google.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312210606.1Etqz3M4-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202312210042.xQEiqlEh-lkp@intel.com/ Signed-off-by: Kinsey Ho <kinseyho@google.com> Suggested-by: Yu Zhao <yuzhao@google.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Donet Tom <donettom@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-05	riscv: Use the same CPU operations for all CPUs	Samuel Holland	1	-2/+2
	RISC-V provides no binding (ACPI or DT) to describe per-cpu start/stop operations, so cpu_set_ops() will always detect the same operations for every CPU. Replace the cpu_ops array with a single pointer to save space and reduce boot time. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20231121234736.3489608-4-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-05	riscv: Remove unused members from struct cpu_operations	Samuel Holland	1	-10/+0
	name is not used anywhere at all. cpu_prepare and cpu_disable do nothing and always return 0 if implemented. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20231121234736.3489608-3-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-03	RISC-V: hwprobe: Introduce which-cpus flag	Andrew Jones	2	-0/+27
	Introduce the first flag for the hwprobe syscall. The flag basically reverses its behavior, i.e. instead of populating the values of keys for a given set of cpus, the set of cpus after the call is the result of finding a set which supports the values of the keys. In order to do this, we implement a pair compare function which takes the type of value (a single value vs. a bitmask of booleans) into consideration. We also implement vdso support for the new flag. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Evan Green <evan@rivosinc.com> Link: https://lore.kernel.org/r/20231122164700.127954-9-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-03	RISC-V: Remove the removed single-letter extensions	Palmer Dabbelt	1	-6/+0
	There were a few single-letter extensions that we had references to floating around in the kernel, but that never ended up as actual ISA specs and have mostly been replaced by multi-letter extensions. This removes the references to those extensions. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20231110175903.2631-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-02	Merge tag 'kvm-riscv-6.8-1' of https://github.com/kvm-riscv/linux into HEAD	Paolo Bonzini	7	-9/+85
	KVM/riscv changes for 6.8 part #1 - KVM_GET_REG_LIST improvement for vector registers - Generate ISA extension reg_list using macros in get-reg-list selftest - Steal time account support along with selftest
2024-01-02	Merge tag 'loongarch-kvm-6.8' of ↵	Paolo Bonzini	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.8 1. Optimization for memslot hugepage checking. 2. Cleanup and fix some HW/SW timer issues. 3. Add LSX/LASX (128bit/256bit SIMD) support.
2023-12-30	RISC-V: KVM: Add support for SBI STA registers	Andrew Jones	2	-0/+14
	KVM userspace needs to be able to save and restore the steal-time shared memory address. Provide the address through the get/set-one-reg interface with two ulong-sized SBI STA extension registers (lo and hi). 64-bit KVM userspace must not set the hi register to anything other than zero and is allowed to completely neglect saving/restoring it. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30	RISC-V: KVM: Add support for SBI extension registers	Andrew Jones	2	-0/+7
	Some SBI extensions have state that needs to be saved / restored when migrating the VM. Provide a get/set-one-reg register type for SBI extension registers. Each SBI extension that uses this type will have its own subtype. There are currently no subtypes defined. The next patch introduces the first one. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30	RISC-V: KVM: Add SBI STA info to vcpu_arch	Andrew Jones	1	-0/+7
	KVM's implementation of SBI STA needs to track the address of each VCPU's steal-time shared memory region as well as the amount of stolen time. Add a structure to vcpu_arch to contain this state and make sure that the address is always set to INVALID_GPA on vcpu reset. And, of course, ensure KVM won't try to update steal- time when the shared memory address is invalid. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30	RISC-V: KVM: Add steal-update vcpu request	Andrew Jones	1	-0/+3
	Add a new vcpu request to inform a vcpu that it should record its steal-time information. The request is made each time it has been detected that the vcpu task was not assigned a cpu for some time, which is easy to do by making the request from vcpu-load. The record function is just a stub for now and will be filled in with the rest of the steal-time support functions in following patches. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30	RISC-V: KVM: Add SBI STA extension skeleton	Andrew Jones	2	-0/+2
	Add the files and functions needed to support the SBI STA (steal-time accounting) extension. In the next patches we'll complete the functions to fully enable SBI STA support. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30	RISC-V: Add SBI STA extension definitions	Andrew Jones	1	-0/+17
	The SBI STA extension enables steal-time accounting. Add the definitions it specifies. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30	RISC-V: paravirt: Add skeleton for pv-time support	Andrew Jones	2	-0/+29
	Add the files and functions needed to support paravirt time on RISC-V. Also include the common code needed for the first application of pv-time, which is steal-time. In the next patches we'll complete the functions to fully enable steal-time support. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29	RISC-V: KVM: Make SBI uapi consistent with ISA uapi	Andrew Jones	1	-4/+6
	When an SBI extension cannot be enabled, that's a distinct state vs. enabled and disabled. Modify enum kvm_riscv_sbi_ext_status to accommodate it, which allows KVM userspace to tell the difference in state too, as the SBI extension register will disappear when it cannot be enabled, i.e. accesses to it return ENOENT. get-reg-list is updated as well to only add SBI extension registers to the list which may be enabled. Returning ENOENT for SBI extension registers which cannot be enabled makes them consistent with ISA extension registers. Any SBI extensions which were enabled by default are still enabled by default, if they can be enabled at all. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-23	sched/topology: Add a new arch_scale_freq_ref() method	Vincent Guittot	1	-0/+1
	Create a new method to get a unique and fixed max frequency. Currently cpuinfo.max_freq or the highest (or last) state of performance domain are used as the max frequency when computing the frequency for a level of utilization, but: - cpuinfo_max_freq can change at runtime. boost is one example of such change. - cpuinfo.max_freq and last item of the PD can be different leading to different results between cpufreq and energy model. We need to save the reference frequency that has been used when computing the CPUs capacity and use this fixed and coherent value to convert between frequency and CPU's capacity. In fact, we already save the frequency that has been used when computing the capacity of each CPU. We extend the precision to save kHz instead of MHz currently and we modify the type to be aligned with other variables used when converting frequency to capacity and the other way. [ mingo: Minor edits. ] Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20231211104855.558096-2-vincent.guittot@linaro.org
2023-12-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Paolo Abeni	2	-6/+1
	Cross-merge networking fixes after downstream PR. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 23c93c3b6275 ("bnxt_en: do not map packet buffers twice") 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.") tools/testing/selftests/net/Makefile 2258b666482d ("selftests: add vlan hw filter tests") a0bc96c0cd6e ("selftests: net: verify fq per-band packet limit") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21	posix-timers: Get rid of [COMPAT_]SYS_NI() uses	Linus Torvalds	1	-5/+0
	Only the posix timer system calls use this (when the posix timer support is disabled, which does not actually happen in any normal case), because they had debug code to print out a warning about missing system calls. Get rid of that special case, and just use the standard COND_SYSCALL interface that creates weak system call stubs that return -ENOSYS for when the system call does not exist. This fixes a kCFI issue with the SYS_NI() hackery: CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9) WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0 Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-20	Merge patch series "riscv: Use READ_ONCE()/WRITE_ONCE() for pte accesses"	Palmer Dabbelt	3	-44/+15
	Alexandre Ghiti <alexghiti@rivosinc.com> says: This series is a follow-up for riscv of a recent series from Ryan [1] which converts all direct dereferences of pte_t into a ptet_get() access. The goal here for riscv is to use READ_ONCE()/WRITE_ONCE() for all page table entries accesses to avoid any compiler transformation when the hardware can concurrently modify the page tables entries (A/D bits for example). I went a bit further and added pud/p4d/pgd_get() helpers as such concurrent modifications can happen too at those levels. [1] https://lore.kernel.org/all/20230612151545.3317766-1-ryan.roberts@arm.com/ * b4-shazam-merge: riscv: Use accessors to page table entries instead of direct dereference riscv: mm: Only compile pgtable.c if MMU mm: Introduce pudp/p4dp/pgdp_get() functions riscv: Use WRITE_ONCE() when setting page table entries Link: https://lore.kernel.org/r/20231213203001.179237-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-20	riscv: Use accessors to page table entries instead of direct dereference	Alexandre Ghiti	3	-39/+10
	As very well explained in commit 20a004e7b017 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose page table walker can modify the PTE in parallel must use READ_ONCE()/WRITE_ONCE() macro to avoid any compiler transformation. So apply that to riscv which is such architecture. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20231213203001.179237-5-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-20	riscv: Use WRITE_ONCE() when setting page table entries	Alexandre Ghiti	2	-5/+5
	To avoid any compiler "weirdness" when accessing page table entries which are concurrently modified by the HW, let's use WRITE_ONCE() macro (commit 20a004e7b017 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables") gives a great explanation with more details). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231213203001.179237-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-16	cfi: Flip headers	Peter Zijlstra	1	-1/+2
	Normal include order is that linux/foo.h should include asm/foo.h, CFI has it the wrong way around. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20231215092707.231038174@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>