summaryrefslogtreecommitdiff
path: root/arch/riscv
AgeCommit message (Collapse)AuthorFilesLines
2025-04-10riscv/purgatory: 4B align purgatory_startBjörn Töpel1-0/+1
[ Upstream commit 3f7023171df43641a8a8a1c9a12124501e589010 ] When a crashkernel is launched on RISC-V, the entry to purgatory is done by trapping via the stvec CSR. From riscv_kexec_norelocate(): | ... | /* | * Switch to physical addressing | * This will also trigger a jump to CSR_STVEC | * which in this case is the address of the new | * kernel. | */ | csrw CSR_STVEC, a2 | csrw CSR_SATP, zero stvec requires that the address is 4B aligned, which was not the case, e.g.: | Loaded purgatory at 0xffffc000 | kexec_file: kexec_file_load: type:1, start:0xffffd232 head:0x4 flags:0x6 The address 0xffffd232 not 4B aligned. Correct by adding proper function alignment. With this change, crashkernels loaded with kexec-file will be able to properly enter the purgatory. Fixes: 736e30af583fb ("RISC-V: Add purgatory") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20250328085313.1193815-1-bjorn@kernel.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10riscv/kexec_file: Handle R_RISCV_64 in purgatory relocatorYao Zi1-0/+3
[ Upstream commit 28093cfef5dd62f4cbd537f2bdf6f0bf85309c45 ] Commit 58ff537109ac ("riscv: Omit optimized string routines when using KASAN") introduced calls to EXPORT_SYMBOL() in assembly string routines, which result in R_RISCV_64 relocations against .export_symbol section. As these rountines are reused by RISC-V purgatory and our relocator doesn't recognize these relocations, this fails kexec-file-load with dmesg like [ 11.344251] kexec_image: Unknown rela relocation: 2 [ 11.345972] kexec_image: Error loading purgatory ret=-8 Let's support R_RISCV_64 relocation to fix kexec on 64-bit RISC-V. 32-bit variant isn't covered since KEXEC_FILE and KEXEC_PURGATORY isn't available. Fixes: 58ff537109ac ("riscv: Omit optimized string routines when using KASAN") Signed-off-by: Yao Zi <ziyao@disroot.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20250326051445.55131-2-ziyao@disroot.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10riscv: Fix hugetlb retrieval of number of ptes in case of !present pteAlexandre Ghiti1-31/+45
[ Upstream commit 83d78ac677b9fdd8ea763507c6fe02d6bf415f3a ] Ryan sent a fix [1] for arm64 that applies to riscv too: in some hugetlb functions, we must not use the pte value to get the size of a mapping because the pte may not be present. So use the already present size parameter for huge_pte_clear() and the newly introduced size parameter for huge_ptep_get_and_clear(). And make sure to gather A/D bits only on present ptes. Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page") Link: https://lore.kernel.org/all/20250217140419.1702389-1-ryan.roberts@arm.com/ [1] Link: https://lore.kernel.org/r/20250317072551.572169-1-alexghiti@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10RISC-V: errata: Use medany for relocatable buildsPalmer Dabbelt1-1/+5
[ Upstream commit bb58e1579f431d42469b6aed0f03eff383ba6db5 ] We're trying to mix non-PIC/PIE objects into the otherwise-PIE relocatable kernels, to avoid GOT/PLT references during early boot alternative resolution (which happens before the GOT/PLT are set up). riscv64-unknown-linux-gnu-ld: arch/riscv/errata/sifive/errata.o: relocation R_RISCV_HI20 against `tlb_flush_all_threshold' can not be used when making a shared object; recompile with -fPIC riscv64-unknown-linux-gnu-ld: arch/riscv/errata/thead/errata.o: relocation R_RISCV_HI20 against `riscv_cbom_block_size' can not be used when making a shared object; recompile with -fPIC Fixes: 8dc2a7e8027f ("riscv: Fix relocatable kernels with early alternatives using -fno-pie") Link: https://lore.kernel.org/r/20250326224506.27165-2-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10riscv: Fix check_unaligned_access_all_cpusAndrew Jones2-10/+7
[ Upstream commit e6d0adf2eb5bb3244cb21a7a15899aa058bd384f ] check_vector_unaligned_access_emulated_all_cpus(), like its name suggests, will return true when all cpus emulate unaligned vector accesses. If the function returned false it may have been because vector isn't supported at all (!has_vector()) or because at least one cpu doesn't emulate unaligned vector accesses. Since false may be returned for two cases, checking for it isn't sufficient when attempting to determine if we should proceed with the vector speed check. Move the !has_vector() functionality to check_unaligned_access_all_cpus() in order for check_vector_unaligned_access_emulated_all_cpus() to return false for a single case. Fixes: e7c9d66e313b ("RISC-V: Report vector unaligned access speed hwprobe") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250304120014.143628-13-ajones@ventanamicro.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10riscv: Fix riscv_online_cpu_vecAndrew Jones1-2/+4
[ Upstream commit 5af72a818612332a11171b16f27a62ec0a0f91d7 ] We shouldn't probe when we already know vector is unsupported and we should probe when we see we don't yet know whether it's supported. Furthermore, we should ensure we've set the access type to unsupported when we don't have vector at all. Fixes: e7c9d66e313b ("RISC-V: Report vector unaligned access speed hwprobe") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250304120014.143628-12-ajones@ventanamicro.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10riscv: Annotate unaligned access init functionsAndrew Jones3-13/+13
[ Upstream commit a00e022be5315c5a1f47521a1cc6d3b71c8e9c44 ] Several functions used in unaligned access probing are only run at init time. Annotate them appropriately. Fixes: f413aae96cda ("riscv: Set unaligned access speed at compile time") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20250304120014.143628-11-ajones@ventanamicro.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10riscv: Fix missing __free_pages() in check_vector_unaligned_access()Alexandre Ghiti1-1/+4
[ Upstream commit 33981b1c4e499021421686dcfa7b3d23a430d00e ] The locally allocated pages are never freed up, so add the corresponding __free_pages(). Fixes: e7c9d66e313b ("RISC-V: Report vector unaligned access speed hwprobe") Link: https://lore.kernel.org/r/20250228090613.345309-1-alexghiti@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10riscv: Fix the __riscv_copy_vec_words_unaligned implementationTingbo Liao1-1/+1
[ Upstream commit 475afa39b123699e910c61ad9a51cedce4a0d310 ] Correct the VEC_S macro definition to fix the implementation of vector words copy in the case of unalignment in RISC-V. Fixes: e7c9d66e313b ("RISC-V: Report vector unaligned access speed hwprobe") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Tingbo Liao <tingbo.liao@starfivetech.com> Link: https://lore.kernel.org/r/20250228090801.8334-1-tingbo.liao@starfivetech.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10riscv: ftrace: Add parentheses in macro definitions of make_call_t0 and ↵Juhan Jin1-2/+2
make_call_ra [ Upstream commit 5f1a58ed91a040d4625d854f9bb3dd4995919202 ] This patch adds parentheses to parameters caller and callee of macros make_call_t0 and make_call_ra. Every existing invocation of these two macros uses a single variable for each argument, so the absence of the parentheses seems okay. However, future invocations might use more complex expressions as arguments. For example, a future invocation might look like this: make_call_t0(a - b, c, call). Without parentheses in the macro definition, the macro invocation expands to: ... unsigned int offset = (unsigned long) c - (unsigned long) a - b; ... which is clearly wrong. The use of parentheses ensures arguments are correctly evaluated and potentially saves future users of make_call_t0 and make_call_ra debugging trouble. Fixes: 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to half") Signed-off-by: Juhan Jin <juhan.jin@foxmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/tencent_AE90AA59903A628E87E9F80E563DA5BA5508@qq.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10RISC-V: KVM: Teardown riscv specific bits after kvm_exitAtish Patra1-2/+2
[ Upstream commit 2d117e67f318303f6ab699a5511d1fac3f170545 ] During a module removal, kvm_exit invokes arch specific disable call which disables AIA. However, we invoke aia_exit before kvm_exit resulting in the following warning. KVM kernel module can't be inserted afterwards due to inconsistent state of IRQ. [25469.031389] percpu IRQ 31 still enabled on CPU0! [25469.031732] WARNING: CPU: 3 PID: 943 at kernel/irq/manage.c:2476 __free_percpu_irq+0xa2/0x150 [25469.031804] Modules linked in: kvm(-) [25469.031848] CPU: 3 UID: 0 PID: 943 Comm: rmmod Not tainted 6.14.0-rc5-06947-g91c763118f47-dirty #2 [25469.031905] Hardware name: riscv-virtio,qemu (DT) [25469.031928] epc : __free_percpu_irq+0xa2/0x150 [25469.031976] ra : __free_percpu_irq+0xa2/0x150 [25469.032197] epc : ffffffff8007db1e ra : ffffffff8007db1e sp : ff2000000088bd50 [25469.032241] gp : ffffffff8131cef8 tp : ff60000080b96400 t0 : ff2000000088baf8 [25469.032285] t1 : fffffffffffffffc t2 : 5249207570637265 s0 : ff2000000088bd90 [25469.032329] s1 : ff60000098b21080 a0 : 037d527a15eb4f00 a1 : 037d527a15eb4f00 [25469.032372] a2 : 0000000000000023 a3 : 0000000000000001 a4 : ffffffff8122dbf8 [25469.032410] a5 : 0000000000000fff a6 : 0000000000000000 a7 : ffffffff8122dc10 [25469.032448] s2 : ff60000080c22eb0 s3 : 0000000200000022 s4 : 000000000000001f [25469.032488] s5 : ff60000080c22e00 s6 : ffffffff80c351c0 s7 : 0000000000000000 [25469.032582] s8 : 0000000000000003 s9 : 000055556b7fb490 s10: 00007ffff0e12fa0 [25469.032621] s11: 00007ffff0e13e9a t3 : ffffffff81354ac7 t4 : ffffffff81354ac7 [25469.032664] t5 : ffffffff81354ac8 t6 : ffffffff81354ac7 [25469.032698] status: 0000000200000100 badaddr: ffffffff8007db1e cause: 0000000000000003 [25469.032738] [<ffffffff8007db1e>] __free_percpu_irq+0xa2/0x150 [25469.032797] [<ffffffff8007dbfc>] free_percpu_irq+0x30/0x5e [25469.032856] [<ffffffff013a57dc>] kvm_riscv_aia_exit+0x40/0x42 [kvm] [25469.033947] [<ffffffff013b4e82>] cleanup_module+0x10/0x32 [kvm] [25469.035300] [<ffffffff8009b150>] __riscv_sys_delete_module+0x18e/0x1fc [25469.035374] [<ffffffff8000c1ca>] syscall_handler+0x3a/0x46 [25469.035456] [<ffffffff809ec9a4>] do_trap_ecall_u+0x72/0x134 [25469.035536] [<ffffffff809f5e18>] handle_exception+0x148/0x156 Invoke aia_exit and other arch specific cleanup functions after kvm_exit so that disable gets a chance to be called first before exit. Fixes: 54e43320c2ba ("RISC-V: KVM: Initial skeletal support for AIA") Fixes: eded6754f398 ("riscv: KVM: add basic support for host vs guest profiling") Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250317-kvm_exit_fix-v1-1-aa5240c5dbd2@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10RISC-V: KVM: Disable the kernel perf counter during configureAtish Patra1-0/+1
[ Upstream commit bbb622488749478955485765ddff9d56be4a7e4b ] The perf event should be marked disabled during the creation as it is not ready to be scheduled until there is SBI PMU start call or config matching is called with auto start. Otherwise, event add/start gets called during perf_event_create_kernel_counter function. It will be enabled and scheduled to run via perf_event_enable during either the above mentioned scenario. Fixes: 0cb74b65d2e5 ("RISC-V: KVM: Implement perf support without sampling") Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20250303-kvm_pmu_improve-v2-1-41d177e45929@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-29riscv: dts: starfive: Fix a typo in StarFive JH7110 pin function definitionsE Shattow1-1/+1
commit 1b133129ad6b28186214259af3bd5fc651a85509 upstream. Fix a typo in StarFive JH7110 pin function definitions for GPOUT_SYS_SDIO1_DATA4 Fixes: e22f09e598d12 ("riscv: dts: starfive: Add StarFive JH7110 pin function definitions") Signed-off-by: E Shattow <e@freeshell.de> Acked-by: Hal Feng <hal.feng@starfivetech.com> CC: stable@vger.kernel.org Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-13mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear()Ryan Roberts2-2/+3
commit 02410ac72ac3707936c07ede66e94360d0d65319 upstream. In order to fix a bug, arm64 needs to be told the size of the huge page for which the huge_pte is being cleared in huge_ptep_get_and_clear(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear() and set_huge_pte_at(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, loongarch, mips, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. Cc: stable@vger.kernel.org Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390 Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07riscv: cpufeature: use bitmap_equal() instead of memcmp()Clément Léger1-1/+1
commit c6ec1e1b078d8e2ecd075e46db6197a14930a3fc upstream. Comparison of bitmaps should be done using bitmap_equal(), not memcmp(), use the former one to compare isa bitmaps. Signed-off-by: Clément Léger <cleger@rivosinc.com> Fixes: 625034abd52a8c ("riscv: add ISA extensions validation callback") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250210155615.1545738-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07riscv: signal: fix signal_minsigstkszYong-Xuan Wang1-1/+1
commit 564fc8eb6f78e01292ff10801f318feae6153fdd upstream. The init_rt_signal_env() funciton is called before the alternative patch is applied, so using the alternative-related API to check the availability of an extension within this function doesn't have the intended effect. This patch reorders the init_rt_signal_env() and apply_boot_alternatives() to get the correct signal_minsigstksz. Fixes: e92f469b0771 ("riscv: signal: Report signal frame size to userspace via auxv") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Zong Li <zong.li@sifive.com> Reviewed-by: Andy Chiu <andybnac@gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241220083926.19453-3-yongxuan.wang@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07riscv: cacheinfo: Use of_property_present() for non-boolean propertiesRob Herring1-6/+6
commit fb8179ce2996bffaa36a04e2b6262843b01b7565 upstream. The use of of_property_read_bool() for non-boolean properties is deprecated in favor of of_property_present() when testing for property presence. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Clément Léger <cleger@rivosinc.com> Cc: stable@vger.kernel.org Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code") Link: https://lore.kernel.org/r/20241104190314.270095-1-robh@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07riscv: signal: fix signal frame sizeYong-Xuan Wang1-6/+0
commit aa49bc2ca8524186ceb0811c23a7f00c3dea6987 upstream. The signal context of certain RISC-V extensions will be appended after struct __riscv_extra_ext_header, which already includes an empty context header. Therefore, there is no need to preserve a separate hdr for the END of signal context. Fixes: 8ee0b41898fa ("riscv: signal: Add sigcontext save/restore for vector") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Zong Li <zong.li@sifive.com> Reviewed-by: Andy Chiu <AndybnAC@gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241220083926.19453-2-yongxuan.wang@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07riscv/futex: sign extend compare value in atomic cmpxchgAndreas Schwab1-1/+1
commit 599c44cd21f4967774e0acf58f734009be4aea9a upstream. Make sure the compare value in the lr/sc loop is sign extended to match what lr.w does. Fortunately, due to the compiler keeping the register contents sign extended anyway the lack of the explicit extension didn't result in wrong code so far, but this cannot be relied upon. Fixes: b90edb33010b ("RISC-V: Add futex support.") Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/mvmfrkv2vhz.fsf@suse.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07riscv/atomic: Do proper sign extension also for unsigned in arch_cmpxchgAndreas Schwab1-1/+1
commit 1898300abf3508bca152e65b36cce5bf93d7e63e upstream. Sign extend also an unsigned compare value to match what lr.w is doing. Otherwise try_cmpxchg may spuriously return true when used on a u32 value that has the sign bit set, as it happens often in inode_set_ctime_current. Do this in three conversion steps. The first conversion to long is needed to avoid a -Wpointer-to-int-cast warning when arch_cmpxchg is used with a pointer type. Then convert to int and back to long to always sign extend the 32-bit value to 64-bit. Fixes: 6c58f25e6938 ("riscv/atomic: Fix sign extension for RV64I") Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Xi Ruoyao <xry111@xry111.site> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/mvmed0k4prh.fsf@suse.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07riscv: KVM: Fix SBI TIME error generationAndrew Jones1-1/+1
[ Upstream commit b901484852992cf3d162a5eab72251cc813ca624 ] When an invalid function ID of an SBI extension is used we should return not-supported, not invalid-param. Fixes: 5f862df5585c ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-11-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07riscv: KVM: Fix SBI IPI error generationAndrew Jones1-2/+11
[ Upstream commit 0611f78f83c93c000029ab01daa28166d03590ed ] When an invalid function ID of an SBI extension is used we should return not-supported, not invalid-param. Also, when we see that at least one hartid constructed from the base and mask parameters is invalid, then we should return invalid-param. Finally, rather than relying on overflowing a left shift to result in zero and then using that zero in a condition which [correctly] skips sending an IPI (but loops unnecessarily), explicitly check for overflow and exit the loop immediately. Fixes: 5f862df5585c ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-10-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07riscv: KVM: Fix hart suspend_type useAndrew Jones1-1/+2
[ Upstream commit e3219b0c491f2aa0e0b200a39d3352ab05cdda96 ] The spec says suspend_type is 32 bits wide and "In case the data is defined as 32bit wide, higher privilege software must ensure that it only uses 32 bit data." Mask off upper bits of suspend_type before using it. Fixes: 763c8bed8c05 ("RISC-V: KVM: Implement SBI HSM suspend call") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-9-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07riscv: KVM: Fix hart suspend status checkAndrew Jones1-4/+4
[ Upstream commit c7db342e3b4744688be1e27e31254c1d31a35274 ] "Not stopped" means started or suspended so we need to check for a single state in order to have a chance to check for each state. Also, we need to use target_vcpu when checking for the suspend state. Fixes: 763c8bed8c05 ("RISC-V: KVM: Implement SBI HSM suspend call") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-8-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08RISC-V: Mark riscv_v_init() as __initPalmer Dabbelt1-1/+1
[ Upstream commit 9d87cf525fd2e1a5fcbbb40ee3df216d1d266c88 ] This trips up with Xtheadvector enabled, but as far as I can tell it's just been an issue since the original patchset. Fixes: 7ca7a7b9b635 ("riscv: Add sysctl to set the default vector rule for new processes") Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20250115180251.31444-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09Merge patch series "SBI PMU event related fixes"Palmer Dabbelt1-0/+1
Atish Patra <atishp@rivosinc.com> says: Here are two minor improvement/fixes in the PMU event path. The first patch was part of the series[1]. The 2nd patch was suggested during the series review. While the series can only be merged once SBI v3.0 is frozen, these two patches can be independent of SBI v3.0 and can be merged sooner. Hence, these two patches are sent as a separate series. * b4-shazam-merge: drivers/perf: riscv: Do not allow invalid raw event config drivers/perf: riscv: Return error for default case drivers/perf: riscv: Fix Platform firmware event data Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-0-813e8a4f5962@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-09drivers/perf: riscv: Fix Platform firmware event dataAtish Patra1-0/+1
Platform firmware event data field is allowed to be 62 bits for Linux as uppper most two bits are reserved to indicate SBI fw or platform specific firmware events. However, the event data field is masked as per the hardware raw event mask which is not correct. Fix the platform firmware event data field with proper mask. Fixes: f0c9363db2dd ("perf/riscv-sbi: Add platform specific firmware event handling") Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20241212-pmu_event_fixes_v2-v2-1-813e8a4f5962@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-08riscv: use local label names instead of global ones in assemblyClément Léger1-10/+10
Local labels should be prefix by '.L' or they'll be exported in the symbol table. Additionally, this messes up the backtrace by displaying an incorrect symbol: ... [ 12.751810] [<ffffffff80441628>] _copy_from_user+0x28/0xc2 [ 12.752035] [<ffffffff800152ca>] handle_misaligned_load+0x1ca/0x2fc [ 12.752310] [<ffffffff80a033e8>] do_trap_load_misaligned+0x24/0xee [ 12.752596] [<ffffffff80a0dcae>] _new_vmalloc_restore_context_a0+0xc2/0xce After: ... [ 10.243916] [<ffffffff804415e4>] _copy_from_user+0x28/0xc2 [ 10.244026] [<ffffffff800152ca>] handle_misaligned_load+0x1ca/0x2fc [ 10.244150] [<ffffffff80a033a0>] do_trap_load_misaligned+0x24/0xee [ 10.244268] [<ffffffff80a0dc66>] handle_exception+0x146/0x152 Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Fixes: 503638e0babf3 ("riscv: Stop emitting preventive sfence.vma for new vmalloc mappings") Link: https://lore.kernel.org/r/20250103141814.508865-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-08riscv: qspinlock: Fixup _Q_PENDING_LOOPS definitionGuo Ren1-1/+4
When CONFIG_RISCV_QUEUED_SPINLOCKS=y, the _Q_PENDING_LOOPS definition is missing. Add the _Q_PENDING_LOOPS definition for pure qspinlock usage. Fixes: ab83647fadae ("riscv: Add qspinlock support") Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241215135252.201983-1-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-08riscv: stacktrace: fix backtracing through exceptionsClément Léger2-1/+4
Prior to commit 5d5fc33ce58e ("riscv: Improve exception and system call latency"), backtrace through exception worked since ra was filled with ret_from_exception symbol address and the stacktrace code checked 'pc' to be equal to that symbol. Now that handle_exception uses regular 'call' instructions, this isn't working anymore and backtrace stops at handle_exception(). Since there are multiple call site to C code in the exception handling path, rather than checking multiple potential return addresses, add a new symbol at the end of exception handling and check pc to be in that range. Fixes: 5d5fc33ce58e ("riscv: Improve exception and system call latency") Signed-off-by: Clément Léger <cleger@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241209155714.1239665-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-08riscv: mm: Fix the out of bound issue of vmemmap addressXu Lu3-2/+18
In sparse vmemmap model, the virtual address of vmemmap is calculated as: ((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT)). And the struct page's va can be calculated with an offset: (vmemmap + (pfn)). However, when initializing struct pages, kernel actually starts from the first page from the same section that phys_ram_base belongs to. If the first page's physical address is not (phys_ram_base >> PAGE_SHIFT), then we get an va below VMEMMAP_START when calculating va for it's struct page. For example, if phys_ram_base starts from 0x82000000 with pfn 0x82000, the first page in the same section is actually pfn 0x80000. During init_unavailable_range(), we will initialize struct page for pfn 0x80000 with virtual address ((struct page *)VMEMMAP_START - 0x2000), which is below VMEMMAP_START as well as PCI_IO_END. This commit fixes this bug by introducing a new variable 'vmemmap_start_pfn' which is aligned with memory section size and using it to calculate vmemmap address instead of phys_ram_base. Fixes: a11dd49dcb93 ("riscv: Sparse-Memory/vmemmap out-of-bounds fix") Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20241209122617.53341-1-luxu.kernel@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-08riscv: kprobes: Fix incorrect address calculationNam Cao1-1/+1
p->ainsn.api.insn is a pointer to u32, therefore arithmetic operations are multiplied by four. This is clearly undesirable for this case. Cast it to (void *) first before any calculation. Below is a sample before/after. The dumped memory is two kprobe slots, the first slot has - c.addiw a0, 0x1c (0x7125) - ebreak (0x00100073) and the second slot has: - c.addiw a0, -4 (0x7135) - ebreak (0x00100073) Before this patch: (gdb) x/16xh 0xff20000000135000 0xff20000000135000: 0x7125 0x0000 0x0000 0x0000 0x7135 0x0010 0x0000 0x0000 0xff20000000135010: 0x0073 0x0010 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 After this patch: (gdb) x/16xh 0xff20000000125000 0xff20000000125000: 0x7125 0x0073 0x0010 0x0000 0x7135 0x0073 0x0010 0x0000 0xff20000000125010: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 Fixes: b1756750a397 ("riscv: kprobes: Use patch_text_nosync() for insn slots") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241119111056.2554419-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-08riscv: Fix sleeping in invalid context in die()Nam Cao1-3/+3
die() can be called in exception handler, and therefore cannot sleep. However, die() takes spinlock_t which can sleep with PREEMPT_RT enabled. That causes the following warning: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 285, name: mutex preempt_count: 110001, expected: 0 RCU nest depth: 0, expected: 0 CPU: 0 UID: 0 PID: 285 Comm: mutex Not tainted 6.12.0-rc7-00022-ge19049cf7d56-dirty #234 Hardware name: riscv-virtio,qemu (DT) Call Trace: dump_backtrace+0x1c/0x24 show_stack+0x2c/0x38 dump_stack_lvl+0x5a/0x72 dump_stack+0x14/0x1c __might_resched+0x130/0x13a rt_spin_lock+0x2a/0x5c die+0x24/0x112 do_trap_insn_illegal+0xa0/0xea _new_vmalloc_restore_context_a0+0xcc/0xd8 Oops - illegal instruction [#1] Switch to use raw_spinlock_t, which does not sleep even with PREEMPT_RT enabled. Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20241118091333.1185288-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-08riscv: module: remove relocation_head rel_entry member allocationClément Léger1-14/+4
relocation_head's list_head member, rel_entry, doesn't need to be allocated, its storage can just be part of the allocated relocation_head. Remove the pointer which allows to get rid of the allocation as well as an existing memory leak found by Kai Zhang using kmemleak. Fixes: 8fd6c5142395 ("riscv: Add remaining module relocations") Reported-by: Kai Zhang <zhangkai@iscas.ac.cn> Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20241128081636.3620468-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+1
Pull kvm fixes from Paolo Bonzini: "ARM64: - Fix confusion with implicitly-shifted MDCR_EL2 masks breaking SPE/TRBE initialization - Align nested page table walker with the intended memory attribute combining rules of the architecture - Prevent userspace from constraining the advertised ASID width, avoiding horrors of guest TLBIs not matching the intended context in hardware - Don't leak references on LPIs when insertion into the translation cache fails RISC-V: - Replace csr_write() with csr_set() for HVIEN PMU overflow bit x86: - Cache CPUID.0xD XSTATE offsets+sizes during module init On Intel's Emerald Rapids CPUID costs hundreds of cycles and there are a lot of leaves under 0xD. Getting rid of the CPUIDs during nested VM-Enter and VM-Exit is planned for the next release, for now just cache them: even on Skylake that is 40% faster" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bit KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translation KVM: arm64: Do not allow ID_AA64MMFR0_EL1.ASIDbits to be overridden KVM: arm64: Fix S1/S2 combination when FWB==1 and S2 has Device memory type arm64: Fix usage of new shifted MDCR_EL2 values
2024-12-11riscv: mm: Do not call pmd dtor on vmemmap page table teardownBjörn Töpel1-3/+4
The vmemmap's, which is used for RV64 with SPARSEMEM_VMEMMAP, page tables are populated using pmd (page middle directory) hugetables. However, the pmd allocation is not using the generic mechanism used by the VMA code (e.g. pmd_alloc()), or the RISC-V specific create_pgd_mapping()/alloc_pmd_late(). Instead, the vmemmap page table code allocates a page, and calls vmemmap_set_pmd(). This results in that the pmd ctor is *not* called, nor would it make sense to do so. Now, when tearing down a vmemmap page table pmd, the cleanup code would unconditionally, and incorrectly call the pmd dtor, which results in a crash (best case). This issue was found when running the HMM selftests: | tools/testing/selftests/mm# ./test_hmm.sh smoke | ... # when unloading the test_hmm.ko module | page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10915b | flags: 0x1000000000000000(node=0|zone=1) | raw: 1000000000000000 0000000000000000 dead000000000122 0000000000000000 | raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 | page dumped because: VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte) | ------------[ cut here ]------------ | kernel BUG at include/linux/mm.h:3080! | Kernel BUG [#1] | Modules linked in: test_hmm(-) sch_fq_codel fuse drm drm_panel_orientation_quirks backlight dm_mod | CPU: 1 UID: 0 PID: 514 Comm: modprobe Tainted: G W 6.12.0-00982-gf2a4f1682d07 #2 | Tainted: [W]=WARN | Hardware name: riscv-virtio qemu/qemu, BIOS 2024.10 10/01/2024 | epc : remove_pgd_mapping+0xbec/0x1070 | ra : remove_pgd_mapping+0xbec/0x1070 | epc : ffffffff80010a68 ra : ffffffff80010a68 sp : ff20000000a73940 | gp : ffffffff827b2d88 tp : ff6000008785da40 t0 : ffffffff80fbce04 | t1 : 0720072007200720 t2 : 706d756420656761 s0 : ff20000000a73a50 | s1 : ff6000008915cff8 a0 : 0000000000000039 a1 : 0000000000000008 | a2 : ff600003fff0de20 a3 : 0000000000000000 a4 : 0000000000000000 | a5 : 0000000000000000 a6 : c0000000ffffefff a7 : ffffffff824469b8 | s2 : ff1c0000022456c0 s3 : ff1ffffffdbfffff s4 : ff6000008915c000 | s5 : ff6000008915c000 s6 : ff6000008915c000 s7 : ff1ffffffdc00000 | s8 : 0000000000000001 s9 : ff1ffffffdc00000 s10: ffffffff819a31f0 | s11: ffffffffffffffff t3 : ffffffff8000c950 t4 : ff60000080244f00 | t5 : ff60000080244000 t6 : ff20000000a73708 | status: 0000000200000120 badaddr: ffffffff80010a68 cause: 0000000000000003 | [<ffffffff80010a68>] remove_pgd_mapping+0xbec/0x1070 | [<ffffffff80fd238e>] vmemmap_free+0x14/0x1e | [<ffffffff8032e698>] section_deactivate+0x220/0x452 | [<ffffffff8032ef7e>] sparse_remove_section+0x4a/0x58 | [<ffffffff802f8700>] __remove_pages+0x7e/0xba | [<ffffffff803760d8>] memunmap_pages+0x2bc/0x3fe | [<ffffffff02a3ca28>] dmirror_device_remove_chunks+0x2ea/0x518 [test_hmm] | [<ffffffff02a3e026>] hmm_dmirror_exit+0x3e/0x1018 [test_hmm] | [<ffffffff80102c14>] __riscv_sys_delete_module+0x15a/0x2a6 | [<ffffffff80fd020c>] do_trap_ecall_u+0x1f2/0x266 | [<ffffffff80fde0a2>] _new_vmalloc_restore_context_a0+0xc6/0xd2 | Code: bf51 7597 0184 8593 76a5 854a 4097 0029 80e7 2c00 (9002) 7597 | ---[ end trace 0000000000000000 ]--- | Kernel panic - not syncing: Fatal exception in interrupt Add a check to avoid calling the pmd dtor, if the calling context is vmemmap_free(). Fixes: c75a74f4ba19 ("riscv: mm: Add memory hotplugging support") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241120131203.1859787-1-bjorn@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-12-11riscv: Fix IPIs usage in kfence_protect_page()Alexandre Ghiti1-1/+3
flush_tlb_kernel_range() may use IPIs to flush the TLBs of all the cores, which triggers the following warning when the irqs are disabled: [ 3.455330] WARNING: CPU: 1 PID: 0 at kernel/smp.c:815 smp_call_function_many_cond+0x452/0x520 [ 3.456647] Modules linked in: [ 3.457218] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.0-rc7-00010-g91d3de7240b8 #1 [ 3.457416] Hardware name: QEMU QEMU Virtual Machine, BIOS [ 3.457633] epc : smp_call_function_many_cond+0x452/0x520 [ 3.457736] ra : on_each_cpu_cond_mask+0x1e/0x30 [ 3.457786] epc : ffffffff800b669a ra : ffffffff800b67c2 sp : ff2000000000bb50 [ 3.457824] gp : ffffffff815212b8 tp : ff6000008014f080 t0 : 000000000000003f [ 3.457859] t1 : ffffffff815221e0 t2 : 000000000000000f s0 : ff2000000000bc10 [ 3.457920] s1 : 0000000000000040 a0 : ffffffff815221e0 a1 : 0000000000000001 [ 3.457953] a2 : 0000000000010000 a3 : 0000000000000003 a4 : 0000000000000000 [ 3.458006] a5 : 0000000000000000 a6 : ffffffffffffffff a7 : 0000000000000000 [ 3.458042] s2 : ffffffff815223be s3 : 00fffffffffff000 s4 : ff600001ffe38fc0 [ 3.458076] s5 : ff600001ff950d00 s6 : 0000000200000120 s7 : 0000000000000001 [ 3.458109] s8 : 0000000000000001 s9 : ff60000080841ef0 s10: 0000000000000001 [ 3.458141] s11: ffffffff81524812 t3 : 0000000000000001 t4 : ff60000080092bc0 [ 3.458172] t5 : 0000000000000000 t6 : ff200000000236d0 [ 3.458203] status: 0000000200000100 badaddr: ffffffff800b669a cause: 0000000000000003 [ 3.458373] [<ffffffff800b669a>] smp_call_function_many_cond+0x452/0x520 [ 3.458593] [<ffffffff800b67c2>] on_each_cpu_cond_mask+0x1e/0x30 [ 3.458625] [<ffffffff8000e4ca>] __flush_tlb_range+0x118/0x1ca [ 3.458656] [<ffffffff8000e6b2>] flush_tlb_kernel_range+0x1e/0x26 [ 3.458683] [<ffffffff801ea56a>] kfence_protect+0xc0/0xce [ 3.458717] [<ffffffff801e9456>] kfence_guarded_free+0xc6/0x1c0 [ 3.458742] [<ffffffff801e9d6c>] __kfence_free+0x62/0xc6 [ 3.458764] [<ffffffff801c57d8>] kfree+0x106/0x32c [ 3.458786] [<ffffffff80588cf2>] detach_buf_split+0x188/0x1a8 [ 3.458816] [<ffffffff8058708c>] virtqueue_get_buf_ctx+0xb6/0x1f6 [ 3.458839] [<ffffffff805871da>] virtqueue_get_buf+0xe/0x16 [ 3.458880] [<ffffffff80613d6a>] virtblk_done+0x5c/0xe2 [ 3.458908] [<ffffffff8058766e>] vring_interrupt+0x6a/0x74 [ 3.458930] [<ffffffff800747d8>] __handle_irq_event_percpu+0x7c/0xe2 [ 3.458956] [<ffffffff800748f0>] handle_irq_event+0x3c/0x86 [ 3.458978] [<ffffffff800786cc>] handle_simple_irq+0x9e/0xbe [ 3.459004] [<ffffffff80073934>] generic_handle_domain_irq+0x1c/0x2a [ 3.459027] [<ffffffff804bf87c>] imsic_handle_irq+0xba/0x120 [ 3.459056] [<ffffffff80073934>] generic_handle_domain_irq+0x1c/0x2a [ 3.459080] [<ffffffff804bdb76>] riscv_intc_aia_irq+0x24/0x34 [ 3.459103] [<ffffffff809d0452>] handle_riscv_irq+0x2e/0x4c [ 3.459133] [<ffffffff809d923e>] call_on_irq_stack+0x32/0x40 So only flush the local TLB and let the lazy kfence page fault handling deal with the faults which could happen when a core has an old protected pte version cached in its TLB. That leads to potential inaccuracies which can be tolerated when using kfence. Fixes: 47513f243b45 ("riscv: Enable KFENCE for riscv64") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241209074125.52322-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-12-11riscv: Fix wrong usage of __pa() on a fixmap addressAlexandre Ghiti1-1/+1
riscv uses fixmap addresses to map the dtb so we can't use __pa() which is reserved for linear mapping addresses. Fixes: b2473a359763 ("of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241209074508.53037-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-12-11riscv: Fixup boot failure when CONFIG_DEBUG_RT_MUTEXES=yGuo Ren1-3/+9
When CONFIG_DEBUG_RT_MUTEXES=y, mutex_lock->rt_mutex_try_acquire would change from rt_mutex_cmpxchg_acquire to rt_mutex_slowtrylock(): raw_spin_lock_irqsave(&lock->wait_lock, flags); ret = __rt_mutex_slowtrylock(lock); raw_spin_unlock_irqrestore(&lock->wait_lock, flags); Because queued_spin_#ops to ticket_#ops is changed one by one by jump_label, raw_spin_lock/unlock would cause a deadlock during the changing. That means in arch/riscv/kernel/jump_label.c: 1. arch_jump_label_transform_queue() -> mutex_lock(&text_mutex); +-> raw_spin_lock -> queued_spin_lock |-> raw_spin_unlock -> queued_spin_unlock patch_insn_write -> change the raw_spin_lock to ticket_lock mutex_unlock(&text_mutex); ... 2. /* Dirty the lock value */ arch_jump_label_transform_queue() -> mutex_lock(&text_mutex); +-> raw_spin_lock -> *ticket_lock* |-> raw_spin_unlock -> *queued_spin_unlock* /* BUG: ticket_lock with queued_spin_unlock */ patch_insn_write -> change the raw_spin_unlock to ticket_unlock mutex_unlock(&text_mutex); ... 3. /* Dead lock */ arch_jump_label_transform_queue() -> mutex_lock(&text_mutex); +-> raw_spin_lock -> ticket_lock /* deadlock! */ |-> raw_spin_unlock -> ticket_unlock patch_insn_write -> change other raw_spin_#op -> ticket_#op mutex_unlock(&text_mutex); So, the solution is to disable mutex usage of arch_jump_label_transform_queue() during early_boot_irqs_disabled, just like we have done for stop_machine. Reported-by: Conor Dooley <conor@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Fixes: ab83647fadae ("riscv: Add qspinlock support") Link: https://lore.kernel.org/linux-riscv/CAJF2gTQwYTGinBmCSgVUoPv0_q4EPt_+WiyfUA1HViAKgUzxAg@mail.gmail.com/T/#mf488e6347817fca03bb93a7d34df33d8615b3775 Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241130153310.3349484-1-guoren@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-12-06RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bitMichael Neuling1-1/+1
This doesn't cause a problem currently as HVIEN isn't used elsewhere yet. Found by inspection. Signed-off-by: Michael Neuling <michaelneuling@tenstorrent.com> Fixes: 16b0bde9a37c ("RISC-V: KVM: Add perf sampling support for guests") Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20241127041840.419940-1-michaelneuling@tenstorrent.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-12-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-1/+49
Pull more kvm updates from Paolo Bonzini: - ARM fixes - RISC-V Svade and Svadu (accessed and dirty bit) extension support for host and guest * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: riscv: selftests: Add Svade and Svadu Extension to get-reg-list test RISC-V: KVM: Add Svade and Svadu Extensions Support for Guest/VM dt-bindings: riscv: Add Svade and Svadu Entries RISC-V: Add Svade and Svadu Extensions Support KVM: arm64: Use MDCR_EL2.HPME to evaluate overflow of hyp counters KVM: arm64: Ignore PMCNTENSET_EL0 while checking for overflow status KVM: arm64: Mark set_sysreg_masks() as inline to avoid build failure KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition KVM: arm64: vgic: Make vgic_get_irq() more robust KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR
2024-12-01Merge tag 'kbuild-v6.13' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add generic support for built-in boot DTB files - Enable TAB cycling for dialog buttons in nconfig - Fix issues in streamline_config.pl - Refactor Kconfig - Add support for Clang's AutoFDO (Automatic Feedback-Directed Optimization) - Add support for Clang's Propeller, a profile-guided optimization. - Change the working directory to the external module directory for M= builds - Support building external modules in a separate output directory - Enable objtool for *.mod.o and additional kernel objects - Use lz4 instead of deprecated lz4c - Work around a performance issue with "git describe" - Refactor modpost * tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (85 commits) kbuild: rename .tmp_vmlinux.kallsyms0.syms to .tmp_vmlinux0.syms gitignore: Don't ignore 'tags' directory kbuild: add dependency from vmlinux to resolve_btfids modpost: replace tdb_hash() with hash_str() kbuild: deb-pkg: add python3:native to build dependency genksyms: reduce indentation in export_symbol() modpost: improve error messages in device_id_check() modpost: rename alias symbol for MODULE_DEVICE_TABLE() modpost: rename variables in handle_moddevtable() modpost: move strstarts() to modpost.h modpost: convert do_usb_table() to a generic handler modpost: convert do_of_table() to a generic handler modpost: convert do_pnp_device_entry() to a generic handler modpost: convert do_pnp_card_entries() to a generic handler modpost: call module_alias_printf() from all do_*_entry() functions modpost: pass (struct module *) to do_*_entry() functions modpost: remove DEF_FIELD_ADDR_VAR() macro modpost: deduplicate MODULE_ALIAS() for all drivers modpost: introduce module_alias_printf() helper modpost: remove unnecessary check in do_acpi_entry() ...
2024-11-27Merge tag 'riscv-for-linus-6.13-mw1' of ↵Linus Torvalds37-181/+1247
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-v updates from Palmer Dabbelt: - Support for pointer masking in userspace - Support for probing vector misaligned access performance - Support for qspinlock on systems with Zacas and Zabha * tag 'riscv-for-linus-6.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (38 commits) RISC-V: Remove unnecessary include from compat.h riscv: Fix default misaligned access trap riscv: Add qspinlock support dt-bindings: riscv: Add Ziccrse ISA extension description riscv: Add ISA extension parsing for Ziccrse asm-generic: ticket-lock: Add separate ticket-lock.h asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock riscv: Implement xchg8/16() using Zabha riscv: Implement arch_cmpxchg128() using Zacas riscv: Improve zacas fully-ordered cmpxchg() riscv: Implement cmpxchg8/16() using Zabha dt-bindings: riscv: Add Zabha ISA extension description riscv: Implement cmpxchg32/64() using Zacas riscv: Do not fail to build on byte/halfword operations with Zawrs riscv: Move cpufeature.h macros into their own header KVM: riscv: selftests: Add Smnpm and Ssnpm to get-reg-list test RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests riscv: hwprobe: Export the Supm ISA extension riscv: selftests: Add a pointer masking test riscv: Allow ptrace control of the tagged address ABI ...
2024-11-27Merge tag 'kvm-riscv-6.13-2' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini8-1/+49
KVM/riscv changes for 6.13 part #2 - Svade and Svadu extension support for Host and Guest/VM
2024-11-27Merge tag 'riscv-for-linus-6.13-mw1' of ↵Paolo Bonzini37-181/+1247
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux into HEAD RISC-V Paches for the 6.13 Merge Window, Part 1 * Support for pointer masking in userspace, * Support for probing vector misaligned access performance. * Support for qspinlock on systems with Zacas and Zabha.
2024-11-27kbuild: add $(objtree)/ prefix to some in-kernel build artifactsMasahiro Yamada1-1/+1
$(objtree) refers to the top of the output directory of kernel builds. This commit adds the explicit $(objtree)/ prefix to build artifacts needed for building external modules. This change has no immediate impact, as the top-level Makefile currently defines: objtree := . This commit prepares for supporting the building of external modules in a different directory. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-11-26RISC-V: Remove unnecessary include from compat.hPalmer Dabbelt1-1/+0
Without this I get a bunch of build errors like In file included from ./include/linux/sched/task_stack.h:12, from ./arch/riscv/include/asm/compat.h:12, from ./arch/riscv/include/asm/pgtable.h:115, from ./include/linux/pgtable.h:6, from ./include/linux/mm.h:30, from arch/riscv/kernel/asm-offsets.c:8: ./include/linux/kasan.h:50:37: error: ‘MAX_PTRS_PER_PTE’ undeclared here (not in a function); did you mean ‘PTRS_PER_PTE’? 50 | extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PTE ./include/linux/kasan.h:51:8: error: unknown type name ‘pmd_t’; did you mean ‘pgd_t’? 51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD]; | ^~~~~ | pgd_t ./include/linux/kasan.h:51:37: error: ‘MAX_PTRS_PER_PMD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’? 51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PGD ./include/linux/kasan.h:52:8: error: unknown type name ‘pud_t’; did you mean ‘pgd_t’? 52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD]; | ^~~~~ | pgd_t ./include/linux/kasan.h:52:37: error: ‘MAX_PTRS_PER_PUD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’? 52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PGD ./include/linux/kasan.h:53:8: error: unknown type name ‘p4d_t’; did you mean ‘pgd_t’? 53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D]; | ^~~~~ | pgd_t ./include/linux/kasan.h:53:37: error: ‘MAX_PTRS_PER_P4D’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’? 53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PGD Link: https://lore.kernel.org/r/20241126143250.29708-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-11-26Merge tag 'trace-rust-v6.13' of ↵Linus Torvalds1-22/+28
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull rust trace event support from Steven Rostedt: "Allow Rust code to have trace events Trace events is a popular way to debug what is happening inside the kernel or just to find out what is happening. Rust code is being added to the Linux kernel but it currently does not support the tracing infrastructure. Add support of trace events inside Rust code" * tag 'trace-rust-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rust: jump_label: skip formatting generated file jump_label: rust: pass a mut ptr to `static_key_count` samples: rust: fix `rust_print` build making it a combined module rust: add arch_static_branch jump_label: adjust inline asm to be consistent rust: samples: add tracepoint to Rust sample rust: add tracepoint support rust: add static_branch_unlikely for static_key_false
2024-11-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds17-190/+995
Pull kvm updates from Paolo Bonzini: "The biggest change here is eliminating the awful idea that KVM had of essentially guessing which pfns are refcounted pages. The reason to do so was that KVM needs to map both non-refcounted pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain refcounted pages. However, the result was security issues in the past, and more recently the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by struct page but is not refcounted. In particular this broke virtio-gpu blob resources (which directly map host graphics buffers into the guest as "vram" for the virtio-gpu device) with the amdgpu driver, because amdgpu allocates non-compound higher order pages and the tail pages could not be mapped into KVM. This requires adjusting all uses of struct page in the per-architecture code, to always work on the pfn whenever possible. The large series that did this, from David Stevens and Sean Christopherson, also cleaned up substantially the set of functions that provided arch code with the pfn for a host virtual addresses. The previous maze of twisty little passages, all different, is replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages) saving almost 200 lines of code. ARM: - Support for stage-1 permission indirection (FEAT_S1PIE) and permission overlays (FEAT_S1POE), including nested virt + the emulated page table walker - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call was introduced in PSCIv1.3 as a mechanism to request hibernation, similar to the S4 state in ACPI - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As part of it, introduce trivial initialization of the host's MPAM context so KVM can use the corresponding traps - PMU support under nested virtualization, honoring the guest hypervisor's trap configuration and event filtering when running a nested guest - Fixes to vgic ITS serialization where stale device/interrupt table entries are not zeroed when the mapping is invalidated by the VM - Avoid emulated MMIO completion if userspace has requested synchronous external abort injection - Various fixes and cleanups affecting pKVM, vCPU initialization, and selftests LoongArch: - Add iocsr and mmio bus simulation in kernel. - Add in-kernel interrupt controller emulation. - Add support for virtualization extensions to the eiointc irqchip. PPC: - Drop lingering and utterly obsolete references to PPC970 KVM, which was removed 10 years ago. - Fix incorrect documentation references to non-existing ioctls RISC-V: - Accelerate KVM RISC-V when running as a guest - Perf support to collect KVM guest statistics from host side s390: - New selftests: more ucontrol selftests and CPU model sanity checks - Support for the gen17 CPU model - List registers supported by KVM_GET/SET_ONE_REG in the documentation x86: - Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve documentation, harden against unexpected changes. Even if the hardware A/D tracking is disabled, it is possible to use the hardware-defined A/D bits to track if a PFN is Accessed and/or Dirty, and that removes a lot of special cases. - Elide TLB flushes when aging secondary PTEs, as has been done in x86's primary MMU for over 10 years. - Recover huge pages in-place in the TDP MMU when dirty page logging is toggled off, instead of zapping them and waiting until the page is re-accessed to create a huge mapping. This reduces vCPU jitter. - Batch TLB flushes when dirty page logging is toggled off. This reduces the time it takes to disable dirty logging by ~3x. - Remove the shrinker that was (poorly) attempting to reclaim shadow page tables in low-memory situations. - Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE. - Advertise CPUIDs for new instructions in Clearwater Forest - Quirk KVM's misguided behavior of initialized certain feature MSRs to their maximum supported feature set, which can result in KVM creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero value results in the vCPU having invalid state if userspace hides PDCM from the guest, which in turn can lead to save/restore failures. - Fix KVM's handling of non-canonical checks for vCPUs that support LA57 to better follow the "architecture", in quotes because the actual behavior is poorly documented. E.g. most MSR writes and descriptor table loads ignore CR4.LA57 and operate purely on whether the CPU supports LA57. - Bypass the register cache when querying CPL from kvm_sched_out(), as filling the cache from IRQ context is generally unsafe; harden the cache accessors to try to prevent similar issues from occuring in the future. The issue that triggered this change was already fixed in 6.12, but was still kinda latent. - Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM over-advertises SPEC_CTRL when trying to support cross-vendor VMs. - Minor cleanups - Switch hugepage recovery thread to use vhost_task. These kthreads can consume significant amounts of CPU time on behalf of a VM or in response to how the VM behaves (for example how it accesses its memory); therefore KVM tried to place the thread in the VM's cgroups and charge the CPU time consumed by that work to the VM's container. However the kthreads did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM instances inside could not complete freezing. Fix this by replacing the kthread with a PF_USER_WORKER thread, via the vhost_task abstraction. Another 100+ lines removed, with generally better behavior too like having these threads properly parented in the process tree. - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't really work; there was really nothing to work around anyway: the broken patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the erratum. - Fix 6.12 regression where CONFIG_KVM will be built as a module even if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'. x86 selftests: - x86 selftests can now use AVX. Documentation: - Use rST internal links - Reorganize the introduction to the API document Generic: - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead of RCU, so that running a vCPU on a different task doesn't encounter long due to having to wait for all CPUs become quiescent. In general both reads and writes are rare, but userspace that supports confidential computing is introducing the use of "helper" vCPUs that may jump from one host processor to another. Those will be very happy to trigger a synchronize_rcu(), and the effect on performance is quite the disaster" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits) KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD KVM: x86: add back X86_LOCAL_APIC dependency Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()" KVM: x86: switch hugepage recovery thread to vhost_task KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest Documentation: KVM: fix malformed table irqchip/loongson-eiointc: Add virt extension support LoongArch: KVM: Add irqfd support LoongArch: KVM: Add PCHPIC user mode read and write functions LoongArch: KVM: Add PCHPIC read and write functions LoongArch: KVM: Add PCHPIC device support LoongArch: KVM: Add EIOINTC user mode read and write functions LoongArch: KVM: Add EIOINTC read and write functions LoongArch: KVM: Add EIOINTC device support LoongArch: KVM: Add IPI user mode read and write function LoongArch: KVM: Add IPI read and write function LoongArch: KVM: Add IPI device support LoongArch: KVM: Add iocsr and mmio bus simulation in kernel KVM: arm64: Pass on SVE mapping failures ...
2024-11-23Merge tag 'mm-stable-2024-11-18-19-27' of ↵Linus Torvalds16-12/+48
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ...