summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2023-11-28powerpc/powernv: Fix fortify source warnings in opal-prd.cMichael Ellerman1-5/+12
commit feea65a338e52297b68ceb688eaf0ffc50310a83 upstream. As reported by Mahesh & Aneesh, opal_prd_msg_notifier() triggers a FORTIFY_SOURCE warning: memcpy: detected field-spanning write (size 32) of single field "&item->msg" at arch/powerpc/platforms/powernv/opal-prd.c:355 (size 4) WARNING: CPU: 9 PID: 660 at arch/powerpc/platforms/powernv/opal-prd.c:355 opal_prd_msg_notifier+0x174/0x188 [opal_prd] NIP opal_prd_msg_notifier+0x174/0x188 [opal_prd] LR opal_prd_msg_notifier+0x170/0x188 [opal_prd] Call Trace: opal_prd_msg_notifier+0x170/0x188 [opal_prd] (unreliable) notifier_call_chain+0xc0/0x1b0 atomic_notifier_call_chain+0x2c/0x40 opal_message_notify+0xf4/0x2c0 This happens because the copy is targeting item->msg, which is only 4 bytes in size, even though the enclosing item was allocated with extra space following the msg. To fix the warning define struct opal_prd_msg with a union of the header and a flex array, and have the memcpy target the flex array. Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Reported-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Tested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230821142820.497107-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28riscv: kprobes: allow writing to x0Nam Cao1-1/+1
commit 8cb22bec142624d21bc85ff96b7bad10b6220e6a upstream. Instructions can write to x0, so we should simulate these instructions normally. Currently, the kernel hangs if an instruction who writes to x0 is simulated. Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Cc: stable@vger.kernel.org Signed-off-by: Nam Cao <namcaov@gmail.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Acked-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230829182500.61875-1-namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28riscv: correct pt_level name via pgtable_l5/4_enabledSong Shuai1-0/+3
commit e59e5e2754bf983fc58ad18f99b5eec01f1a0745 upstream. The pt_level uses CONFIG_PGTABLE_LEVELS to display page table names. But if page mode is downgraded from kernel cmdline or restricted by the hardware in 64BIT, it will give a wrong name. Like, using no4lvl for sv39, ptdump named the 1G-mapping as "PUD" that should be "PGD": 0xffffffd840000000-0xffffffd900000000 0x00000000c0000000 3G PUD D A G . . W R V So select "P4D/PUD" or "PGD" via pgtable_l5/4_enabled to correct it. Fixes: e8a62cc26ddf ("riscv: Implement sv48 support") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Song Shuai <suagrfillet@gmail.com> Link: https://lore.kernel.org/r/20230712115740.943324-1-suagrfillet@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230830044129.11481-3-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28riscv: mm: Update the comment of CONFIG_PAGE_OFFSETSong Shuai1-2/+2
commit 559fe94a449cba5b50a7cffea60474b385598c00 upstream. Since the commit 011f09d12052 set sv57 as default for CONFIG_64BIT, the comment of CONFIG_PAGE_OFFSET should be updated too. Fixes: 011f09d12052 ("riscv: mm: Set sv57 on defaultly") Signed-off-by: Song Shuai <suagrfillet@gmail.com> Link: https://lore.kernel.org/r/20230809031023.3575407-1-songshuaishuai@tinylab.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28riscv: put interrupt entries into .irqentry.textNam Cao1-0/+2
commit 87615e95f6f9ccd36d4a3905a2d87f91967ea9d2 upstream. The interrupt entries are expected to be in the .irqentry.text section. For example, for kprobes to work properly, exception code cannot be probed; this is ensured by blacklisting addresses in the .irqentry.text section. Fixes: 7db91e57a0ac ("RISC-V: Task implementation") Signed-off-by: Nam Cao <namcaov@gmail.com> Link: https://lore.kernel.org/r/20230821145708.21270-1-namcaov@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28riscv: Using TOOLCHAIN_HAS_ZIHINTPAUSE marco replace zihintpauseMinda Chen1-1/+1
commit dd16ac404a685cce07e67261a94c6225d90ea7ba upstream. Actually it is a part of Conor's commit aae538cd03bc ("riscv: fix detection of toolchain Zihintpause support"). It is looks like a merge issue. Samuel's commit 0b1d60d6dd9e ("riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y") do not base on Conor's commit and revert to __riscv_zihintpause. So this patch can fix it. Signed-off-by: Minda Chen <minda.chen@starfivetech.com> Fixes: 3c349eacc559 ("Merge patch "riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y"") Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230802064215.31111-1-minda.chen@starfivetech.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28LoongArch: Mark __percpu functions as always inlineNathan Chancellor1-5/+5
commit 71945968d8b128c955204baa33ec03bdd91bdc26 upstream. A recent change to the optimization pipeline in LLVM reveals some fragility around the inlining of LoongArch's __percpu functions, which manifests as a BUILD_BUG() failure: In file included from kernel/sched/build_policy.c:17: In file included from include/linux/sched/cputime.h:5: In file included from include/linux/sched/signal.h:5: In file included from include/linux/rculist.h:11: In file included from include/linux/rcupdate.h:26: In file included from include/linux/irqflags.h:18: arch/loongarch/include/asm/percpu.h:97:3: error: call to '__compiletime_assert_51' declared with 'error' attribute: BUILD_BUG failed 97 | BUILD_BUG(); | ^ include/linux/build_bug.h:59:21: note: expanded from macro 'BUILD_BUG' 59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed") | ^ include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG' 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) | ^ include/linux/compiler_types.h:425:2: note: expanded from macro 'compiletime_assert' 425 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^ include/linux/compiler_types.h:413:2: note: expanded from macro '_compiletime_assert' 413 | __compiletime_assert(condition, msg, prefix, suffix) | ^ include/linux/compiler_types.h:406:4: note: expanded from macro '__compiletime_assert' 406 | prefix ## suffix(); \ | ^ <scratch space>:86:1: note: expanded from here 86 | __compiletime_assert_51 | ^ 1 error generated. If these functions are not inlined (which the compiler is free to do even with functions marked with the standard 'inline' keyword), the BUILD_BUG() in the default case cannot be eliminated since the compiler cannot prove it is never used, resulting in a build failure due to the error attribute. Mark these functions as __always_inline to guarantee inlining so that the BUILD_BUG() only triggers when the default case genuinely cannot be eliminated due to an unexpected size. Cc: <stable@vger.kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/1955 Fixes: 46859ac8af52 ("LoongArch: Add multi-processor (SMP) support") Link: https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a566eebb16e Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28parisc/pgtable: Do not drop upper 5 address bits of physical addressHelge Deller1-4/+3
commit 166b0110d1ee53290bd11618df6e3991c117495a upstream. When calculating the pfn for the iitlbt/idtlbt instruction, do not drop the upper 5 address bits. This doesn't seem to have an effect on physical hardware which uses less physical address bits, but in qemu the missing bits are visible. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28parisc: Prevent booting 64-bit kernels on PA1.x machinesHelge Deller1-3/+2
commit a406b8b424fa01f244c1aab02ba186258448c36b upstream. Bail out early with error message when trying to boot a 64-bit kernel on 32-bit machines. This fixes the previous commit to include the check for true 64-bit kernels as well. Signed-off-by: Helge Deller <deller@gmx.de> Fixes: 591d2108f3abc ("parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines") Cc: <stable@vger.kernel.org> # v6.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28s390/cmma: fix detection of DAT pagesHeiko Carstens1-3/+3
commit 44d93045247661acbd50b1629e62f415f2747577 upstream. If the cmma no-dat feature is available the kernel page tables are walked to identify and mark all pages which are used for address translation (all region, segment, and page tables). In a subsequent loop all other pages are marked as "no-dat" pages with the ESSA instruction. This information is visible to the hypervisor, so that the hypervisor can optimize purging of guest TLB entries. The initial loop however is incorrect: only the first three of the four pages which belong to segment and region tables will be marked as being used for DAT. The last page is incorrectly marked as no-dat. This can result in incorrect guest TLB flushes. Fix this by simply marking all four pages. Cc: <stable@vger.kernel.org> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28s390/mm: add missing arch_set_page_dat() call to vmem_crst_alloc()Heiko Carstens1-2/+6
commit 09cda0a400519b1541591c506e54c9c48e3101bf upstream. If the cmma no-dat feature is available all pages that are not used for dynamic address translation are marked as "no-dat" with the ESSA instruction. This information is visible to the hypervisor, so that the hypervisor can optimize purging of guest TLB entries. This also means that pages which are used for dynamic address translation must not be marked as "no-dat", since the hypervisor may then incorrectly not purge guest TLB entries. Region and segment tables allocated via vmem_crst_alloc() are incorrectly marked as "no-dat", as soon as slab_is_available() returns true. Such tables are allocated e.g. when kernel page tables are split, memory is hotplugged, or a DCSS segment is loaded. Fix this by adding the missing arch_set_page_dat() call. Cc: <stable@vger.kernel.org> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28arm64: dts: qcom: ipq6018: Fix tcsr_mutex register sizeVignesh Viswanathan1-1/+1
commit 72fc3d58b87b0d622039c6299b89024fbb7b420f upstream. IPQ6018's TCSR Mutex HW lock register has 32 locks of size 4KB each. Total size of the TCSR Mutex registers is 128KB. Fix size of the tcsr_mutex hwlock register to 0x20000. Changes in v2: - Drop change to remove qcom,ipq6018-tcsr-mutex compatible string - Added Fixes and stable tags Cc: stable@vger.kernel.org Fixes: 5bf635621245 ("arm64: dts: ipq6018: Add a few device nodes") Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230905095535.1263113-2-quic_viswanat@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28arm64: dts: qcom: ipq9574: Fix hwlock index for SMEMVignesh Viswanathan1-1/+1
commit 5fe8508e2bc8eb4208b0434b6c1ca306c1519ade upstream. SMEM uses lock index 3 of the TCSR Mutex hwlock for allocations in SMEM region shared by the Host and FW. Fix the SMEM hwlock index to 3 for IPQ9574. Cc: stable@vger.kernel.org Fixes: 46384ac7a618 ("arm64: dts: qcom: ipq9574: Add SMEM support") Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com> Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230904172516.479866-5-quic_viswanat@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28arm64: dts: qcom: ipq8074: Fix hwlock index for SMEMVignesh Viswanathan1-1/+1
commit 8a781d04e580705d36f7db07f5c80e748100b69d upstream. SMEM uses lock index 3 of the TCSR Mutex hwlock for allocations in SMEM region shared by the Host and FW. Fix the SMEM hwlock index to 3 for IPQ8074. Cc: stable@vger.kernel.org Fixes: 42124b947e8e ("arm64: dts: qcom: ipq8074: add SMEM support") Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com> Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230904172516.479866-4-quic_viswanat@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28arm64: dts: qcom: ipq5332: Fix hwlock index for SMEMVignesh Viswanathan1-1/+1
commit d08afd80158399a081b478a19902364e3dd0f84c upstream. SMEM uses lock index 3 of the TCSR Mutex hwlock for allocations in SMEM region shared by the Host and FW. Fix the SMEM hwlock index to 3 for IPQ5332. Cc: stable@vger.kernel.org Fixes: d56dd7f935e1 ("arm64: dts: qcom: ipq5332: add SMEM support") Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com> Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230904172516.479866-2-quic_viswanat@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28arm64: dts: qcom: ipq6018: Fix hwlock index for SMEMVignesh Viswanathan1-1/+1
commit 95d97b111e1e184b0c8656137033ed64f2cf21e4 upstream. SMEM uses lock index 3 of the TCSR Mutex hwlock for allocations in SMEM region shared by the Host and FW. Fix the SMEM hwlock index to 3 for IPQ6018. Cc: stable@vger.kernel.org Fixes: 5bf635621245 ("arm64: dts: ipq6018: Add a few device nodes") Signed-off-by: Vignesh Viswanathan <quic_viswanat@quicinc.com> Acked-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230904172516.479866-3-quic_viswanat@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28parisc/pdc: Add width field to struct pdc_modelHelge Deller1-0/+1
commit 6240553b52c475d9fc9674de0521b77e692f3764 upstream. PDC2.0 specifies the additional PSW-bit field. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=nMaria Yu1-6/+0
commit d35686444fc80950c731e33a2f6ad4a55822be9b upstream. The counting of module PLTs has been broken when CONFIG_RANDOMIZE_BASE=n since commit: 3e35d303ab7d22c4 ("arm64: module: rework module VA range selection") Prior to that commit, when CONFIG_RANDOMIZE_BASE=n, the kernel image and all modules were placed within a 128M region, and no PLTs were necessary for B or BL. Hence count_plts() and partition_branch_plt_relas() skipped handling B and BL when CONFIG_RANDOMIZE_BASE=n. After that commit, modules can be placed anywhere within a 2G window regardless of CONFIG_RANDOMIZE_BASE, and hence PLTs may be necessary for B and BL even when CONFIG_RANDOMIZE_BASE=n. Unfortunately that commit failed to update count_plts() and partition_branch_plt_relas() accordingly. Due to this, module_emit_plt_entry() may fail if an insufficient number of PLT entries have been reserved, resulting in modules failing to load with -ENOEXEC. Fix this by counting PLTs regardless of CONFIG_RANDOMIZE_BASE in count_plts() and partition_branch_plt_relas(). Fixes: 3e35d303ab7d ("arm64: module: rework module VA range selection") Signed-off-by: Maria Yu <quic_aiquny@quicinc.com> Cc: <stable@vger.kernel.org> # 6.5.x Acked-by: Ard Biesheuvel <ardb@kernel.org> Fixes: 3e35d303ab7d ("arm64: module: rework module VA range selection") Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231024010954.6768-1-quic_aiquny@quicinc.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newerNathan Chancellor1-0/+2
commit 146a15b873353f8ac28dc281c139ff611a3c4848 upstream. Prior to LLVM 15.0.0, LLVM's integrated assembler would incorrectly byte-swap NOP when compiling for big-endian, and the resulting series of bytes happened to match the encoding of FNMADD S21, S30, S0, S0. This went unnoticed until commit: 34f66c4c4d5518c1 ("arm64: Use a positive cpucap for FP/SIMD") Prior to that commit, the kernel would always enable the use of FPSIMD early in boot when __cpu_setup() initialized CPACR_EL1, and so usage of FNMADD within the kernel was not detected, but could result in the corruption of user or kernel FPSIMD state. After that commit, the instructions happen to trap during boot prior to FPSIMD being detected and enabled, e.g. | Unhandled 64-bit el1h sync exception on CPU0, ESR 0x000000001fe00000 -- ASIMD | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c4d55 #1 | Hardware name: linux,dummy-virt (DT) | pstate: 400000c9 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __pi_strcmp+0x1c/0x150 | lr : populate_properties+0xe4/0x254 | sp : ffffd014173d3ad0 | x29: ffffd014173d3af0 x28: fffffbfffddffcb8 x27: 0000000000000000 | x26: 0000000000000058 x25: fffffbfffddfe054 x24: 0000000000000008 | x23: fffffbfffddfe000 x22: fffffbfffddfe000 x21: fffffbfffddfe044 | x20: ffffd014173d3b70 x19: 0000000000000001 x18: 0000000000000005 | x17: 0000000000000010 x16: 0000000000000000 x15: 00000000413e7000 | x14: 0000000000000000 x13: 0000000000001bcc x12: 0000000000000000 | x11: 00000000d00dfeed x10: ffffd414193f2cd0 x9 : 0000000000000000 | x8 : 0101010101010101 x7 : ffffffffffffffc0 x6 : 0000000000000000 | x5 : 0000000000000000 x4 : 0101010101010101 x3 : 000000000000002a | x2 : 0000000000000001 x1 : ffffd014171f2988 x0 : fffffbfffddffcb8 | Kernel panic - not syncing: Unhandled exception | CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc3-00013-g34f66c4c4d55 #1 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x13c/0x340 | el1t_64_irq_handler+0x0/0x1c | el1_abort+0x0/0x5c | el1h_64_sync+0x64/0x68 | __pi_strcmp+0x1c/0x150 | unflatten_dt_nodes+0x1e8/0x2d8 | __unflatten_device_tree+0x5c/0x15c | unflatten_device_tree+0x38/0x50 | setup_arch+0x164/0x1e0 | start_kernel+0x64/0x38c | __primary_switched+0xbc/0xc4 Restrict CONFIG_CPU_BIG_ENDIAN to a known good assembler, which is either GNU as or LLVM's IAS 15.0.0 and newer, which contains the linked commit. Closes: https://github.com/ClangBuiltLinux/linux/issues/1948 Link: https://github.com/llvm/llvm-project/commit/1379b150991f70a5782e9a143c2ba5308da1161c Signed-off-by: Nathan Chancellor <nathan@kernel.org> Cc: stable@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231025-disable-arm64-be-ias-b4-llvm-15-v1-1-b25263ed8b23@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28parisc: Add nop instructions after TLB insertsJohn David Anglin1-29/+52
commit ad4aa06e1d92b06ed56c7240252927bd60632efe upstream. An excerpt from the PA8800 ERS states: * The PA8800 violates the seven instruction pipeline rule when performing TLB inserts or PxTLBE instructions with the PSW C bit on. The instruction will take effect by the 12th instruction after the insert or purge. I believe we have a problem with handling TLB misses. We don't fill the pipeline following TLB inserts. As a result, we likely fault again after returning from the interruption. The above statement indicates that we need at least seven instructions after the insert on pre PA8800 processors and we need 12 instructions on PA8800/PA8900 processors. Here we add macros and code to provide the required number instructions after a TLB insert. Signed-off-by: John David Anglin <dave.anglin@bell.net> Suggested-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28KVM: x86: Fix lapic timer interrupt lost after loading a snapshot.Haitao Shan4-2/+8
commit 9cfec6d097c607e36199cf0cfbb8cf5acbd8e9b2 upstream. When running android emulator (which is based on QEMU 2.12) on certain Intel hosts with kernel version 6.3-rc1 or above, guest will freeze after loading a snapshot. This is almost 100% reproducible. By default, the android emulator will use snapshot to speed up the next launching of the same android guest. So this breaks the android emulator badly. I tested QEMU 8.0.4 from Debian 12 with an Ubuntu 22.04 guest by running command "loadvm" after "savevm". The same issue is observed. At the same time, none of our AMD platforms is impacted. More experiments show that loading the KVM module with "enable_apicv=false" can workaround it. The issue started to show up after commit 8e6ed96cdd50 ("KVM: x86: fire timer when it is migrated and expired, and in oneshot mode"). However, as is pointed out by Sean Christopherson, it is introduced by commit 967235d32032 ("KVM: vmx: clear pending interrupts on KVM_SET_LAPIC"). commit 8e6ed96cdd50 ("KVM: x86: fire timer when it is migrated and expired, and in oneshot mode") just makes it easier to hit the issue. Having both commits, the oneshot lapic timer gets fired immediately inside the KVM_SET_LAPIC call when loading the snapshot. On Intel platforms with APIC virtualization and posted interrupt processing, this eventually leads to setting the corresponding PIR bit. However, the whole PIR bits get cleared later in the same KVM_SET_LAPIC call by apicv_post_state_restore. This leads to timer interrupt lost. The fix is to move vmx_apicv_post_state_restore to the beginning of the KVM_SET_LAPIC call and rename to vmx_apicv_pre_state_restore. What vmx_apicv_post_state_restore does is actually clearing any former apicv state and this behavior is more suitable to carry out in the beginning. Fixes: 967235d32032 ("KVM: vmx: clear pending interrupts on KVM_SET_LAPIC") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Haitao Shan <hshan@google.com> Link: https://lore.kernel.org/r/20230913000215.478387-1-hshan@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28KVM: x86: Clear bit12 of ICR after APIC-write VM-exitTao Su1-13/+13
commit 629d3698f6958ee6f8131ea324af794f973b12ac upstream. When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi() thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay, but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit. Under the x2APIC section, regarding ICR, the SDM says: It remains readable only to aid in debugging; however, software should not assume the value returned by reading the ICR is the last written value. I.e. the guest is allowed to set bit 12. However, the SDM also gives KVM free reign to do whatever it wants with the bit, so long as KVM's behavior doesn't confuse userspace or break KVM's ABI. Clear bit 12 so that it reads back as '0'. This approach is safer than "do nothing" and is consistent with the case where IPI virtualization is disabled or not supported, i.e., handle_fastpath_set_x2apic_icr_irqoff() -> kvm_x2apic_icr_write() Opportunistically replace the TODO with a comment calling out that eating the write is likely faster than a conditional branch around the busy bit. Link: https://lore.kernel.org/all/ZPj6iF0Q7iynn62p@google.com/ Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode") Cc: stable@vger.kernel.org Signed-off-by: Tao Su <tao1.su@linux.intel.com> Tested-by: Yi Lai <yi1.lai@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20230914055504.151365-1-tao1.su@linux.intel.com [sean: tweak changelog, replace TODO with comment, drop local "val"] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28KVM: x86: Ignore MSR_AMD64_TW_CFG accessMaciej S. Szmigiero2-0/+3
commit 2770d4722036d6bd24bcb78e9cd7f6e572077d03 upstream. Hyper-V enabled Windows Server 2022 KVM VM cannot be started on Zen1 Ryzen since it crashes at boot with SYSTEM_THREAD_EXCEPTION_NOT_HANDLED + STATUS_PRIVILEGED_INSTRUCTION (in other words, because of an unexpected #GP in the guest kernel). This is because Windows tries to set bit 8 in MSR_AMD64_TW_CFG and can't handle receiving a #GP when doing so. Give this MSR the same treatment that commit 2e32b7190641 ("x86, kvm: Add MSR_AMD64_BU_CFG2 to the list of ignored MSRs") gave MSR_AMD64_BU_CFG2 under justification that this MSR is baremetal-relevant only. Although apparently it was then needed for Linux guests, not Windows as in this case. With this change, the aforementioned guest setup is able to finish booting successfully. This issue can be reproduced either on a Summit Ridge Ryzen (with just "-cpu host") or on a Naples EPYC (with "-cpu host,stepping=1" since EPYC is ordinarily stepping 2). Alternatively, userspace could solve the problem by using MSR filters, but forcing every userspace to define a filter isn't very friendly and doesn't add much, if any, value. The only potential hiccup is if one of these "baremetal-only" MSRs ever requires actual emulation and/or has F/M/S specific behavior. But if that happens, then KVM can still punt *that* handling to userspace since userspace MSR filters "win" over KVM's default handling. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1ce85d9c7c9e9632393816cf19c902e0a3f411f1.1697731406.git.maciej.szmigiero@oracle.com [sean: call out MSR filtering alternative] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28KVM: x86: hyper-v: Don't auto-enable stimer on write from user-spaceNicolas Saenz Julienne1-4/+6
commit d6800af51c76b6dae20e6023bbdc9b3da3ab5121 upstream. Don't apply the stimer's counter side effects when modifying its value from user-space, as this may trigger spurious interrupts. For example: - The stimer is configured in auto-enable mode. - The stimer's count is set and the timer enabled. - The stimer expires, an interrupt is injected. - The VM is live migrated. - The stimer config and count are deserialized, auto-enable is ON, the stimer is re-enabled. - The stimer expires right away, and injects an unwarranted interrupt. Cc: stable@vger.kernel.org Fixes: 1f4b34f825e8 ("kvm/x86: Hyper-V SynIC timers") Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20231017155101.40677-1-nsaenz@amazon.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28x86/cpu/hygon: Fix the CPU topology evaluation for realPu Wen1-2/+6
commit ee545b94d39a00c93dc98b1dbcbcf731d2eadeb4 upstream. Hygon processors with a model ID > 3 have CPUID leaf 0xB correctly populated and don't need the fixed package ID shift workaround. The fixup is also incorrect when running in a guest. Fixes: e0ceeae708ce ("x86/CPU/hygon: Fix phys_proc_id calculation logic for multi-die processors") Signed-off-by: Pu Wen <puwen@hygon.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/tencent_594804A808BD93A4EBF50A994F228E3A7F07@qq.com Link: https://lore.kernel.org/r/20230814085112.089607918@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28x86/apic/msi: Fix misconfigured non-maskable MSI quirkKoichiro Den1-5/+3
commit b56ebe7c896dc78b5865ec2c4b1dae3c93537517 upstream. commit ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted"), reworked the code so that the x86 specific quirk for affinity setting of non-maskable PCI/MSI interrupts is not longer activated if necessary. This could be solved by restoring the original logic in the core MSI code, but after a deeper analysis it turned out that the quirk flag is not required at all. The quirk is only required when the PCI/MSI device cannot mask the MSI interrupts, which in turn also prevents reservation mode from being enabled for the affected interrupt. This allows ot remove the NOMASK quirk bit completely as msi_set_affinity() can instead check whether reservation mode is enabled for the interrupt, which gives exactly the same answer. Even in the momentary non-existing case that the reservation mode would be not set for a maskable MSI interrupt this would not cause any harm as it just would cause msi_set_affinity() to go needlessly through the functionaly equivalent slow path, which works perfectly fine with maskable interrupts as well. Rework msi_set_affinity() to query the reservation mode and remove all NOMASK quirk logic from the core code. [ tglx: Massaged changelog ] Fixes: ef8dd01538ea ("genirq/msi: Make interrupt allocation less convoluted") Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231026032036.2462428-1-den@valinux.co.jp Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28x86/PCI: Avoid PME from D3hot/D3cold for AMD Rembrandt and Phoenix USB4Mario Limonciello1-0/+59
commit 7d08f21f8c6307cb05cabb8d86e90ff6ccba57e9 upstream. Iain reports that USB devices can't be used to wake a Lenovo Z13 from suspend. This occurs because on some AMD platforms, even though the Root Ports advertise PME_Support for D3hot and D3cold, wakeup events from devices on a USB4 controller don't result in wakeup interrupts from the Root Port when amd-pmc has put the platform in a hardware sleep state. If amd-pmc will be involved in the suspend, remove D3hot and D3cold from the PME_Support mask of Root Ports above USB4 controllers so we avoid those states if we need wakeups. Restore D3 support at resume so that it can be used by runtime suspend. This affects both AMD Rembrandt and Phoenix SoCs. "pm_suspend_target_state == PM_SUSPEND_ON" means we're doing runtime suspend, and amd-pmc will not be involved. In that case PMEs work as advertised in D3hot/D3cold, so we don't need to do anything. Note that amd-pmc is technically optional, and there's no need for this quirk if it's not present, but we assume it's always present because power consumption is so high without it. Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend") Link: https://lore.kernel.org/r/20231004144959.158840-1-mario.limonciello@amd.com Reported-by: Iain Lane <iain@orangesquash.org.uk> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-from-suspend-with-external-USB-keyboard/m-p/5217121 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> [bhelgaas: commit log, move to arch/x86/pci/fixup.c, add #includes] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28crypto: x86/sha - load modules based on CPU featuresRoxana Nicolescu2-0/+24
commit 1c43c0f1f84aa59dfc98ce66f0a67b2922aa7f9d upstream. x86 optimized crypto modules are built as modules rather than build-in and they are not loaded when the crypto API is initialized, resulting in the generic builtin module (sha1-generic) being used instead. It was discovered when creating a sha1/sha256 checksum of a 2Gb file by using kcapi-tools because it would take significantly longer than creating a sha512 checksum of the same file. trace-cmd showed that for sha1/256 the generic module was used, whereas for sha512 the optimized module was used instead. Add module aliases() for these x86 optimized crypto modules based on CPU feature bits so udev gets a chance to load them later in the boot process. This resulted in ~3x decrease in the real-time execution of kcapi-dsg. Fix is inspired from commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU features") where a similar fix was done for sha512. Cc: stable@vger.kernel.org # 5.15+ Suggested-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Suggested-by: Julian Andres Klode <julian.klode@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28powerpc/perf: Fix disabling BHRB and instruction samplingNicholas Piggin1-3/+2
commit ea142e590aec55ba40c5affb4d49e68c713c63dc upstream. When the PMU is disabled, MMCRA is not updated to disable BHRB and instruction sampling. This can lead to those features remaining enabled, which can slow down a real or emulated CPU. Fixes: 1cade527f6e9 ("powerpc/perf: BHRB control to disable BHRB logic when not used") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231018153423.298373-1-npiggin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-28riscv: provide riscv-specific is_trap_insn()Nam Cao1-0/+6
[ Upstream commit b701f9e726f0a30a94ea6af596b74c1f07b95b6b ] uprobes expects is_trap_insn() to return true for any trap instructions, not just the one used for installing uprobe. The current default implementation only returns true for 16-bit c.ebreak if C extension is enabled. This can confuse uprobes if a 32-bit ebreak generates a trap exception from userspace: uprobes asks is_trap_insn() who says there is no trap, so uprobes assume a probe was there before but has been removed, and return to the trap instruction. This causes an infinite loop of entering and exiting trap handler. Instead of using the default implementation, implement this function speficially for riscv with checks for both ebreak and c.ebreak. Fixes: 74784081aac8 ("riscv: Add uprobes supported") Signed-off-by: Nam Cao <namcaov@gmail.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20230829083614.117748-1-namcaov@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28RISC-V: hwprobe: Fix vDSO SIGSEGVAndrew Jones2-1/+6
[ Upstream commit e1c05b3bf80f829ced464bdca90f1dfa96e8d251 ] A hwprobe pair key is signed, but the hwprobe vDSO function was only checking that the upper bound was valid. In order to help avoid this type of problem in the future, and in anticipation of this check becoming more complicated with sparse keys, introduce and use a "key is valid" predicate function for the check. Fixes: aa5af0aa90ba ("RISC-V: Add hwprobe vDSO function and data") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Evan Green <evan@rivosinc.com> Link: https://lore.kernel.org/r/20231010165101.14942-2-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28riscv: VMAP_STACK overflow detection thread-safeDeepak Gupta6-99/+34
[ Upstream commit be97d0db5f44c0674480cb79ac6f5b0529b84c76 ] commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to `shadow_stack` temporarily before switching finally to per-cpu `overflow_stack`. If two CPUs/harts are racing and end up in over flowing kernel stack, one or both will end up corrupting each other state because `shadow_stack` is not per-cpu. This patch optimizes per-cpu overflow stack switch by directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`. Following are the changes in this patch - Defines an asm macro to obtain per-cpu symbols in destination register. - In entry.S, when overflow is detected, per-cpu overflow stack is located using per-cpu asm macro. Computing per-cpu symbol requires a temporary register. x31 is saved away into CSR_SCRATCH (CSR_SCRATCH is anyways zero since we're in kernel). Please see Links for additional relevant disccussion and alternative solution. Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT` Kernel crash log below Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT Task stack: [0xff20000010a98000..0xff20000010a9c000] Overflow stack: [0xff600001f7d98370..0xff600001f7d99370] CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) epc : __memset+0x60/0xfc ra : recursive_loop+0x48/0xc6 [lkdtm] epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80 gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88 t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0 s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000 a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000 a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90 s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684 s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10 s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4 t5 : ffffffff815dbab8 t6 : ff20000010a9bb48 status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f Kernel panic - not syncing: Kernel stack overflow CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff80006754>] dump_backtrace+0x30/0x38 [<ffffffff808de798>] show_stack+0x40/0x4c [<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c [<ffffffff808ea2d8>] dump_stack+0x18/0x20 [<ffffffff808dec06>] panic+0x126/0x2fe [<ffffffff800065ea>] walk_stackframe+0x0/0xf0 [<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm] SMP: stopping secondary CPUs ---[ end Kernel panic - not syncing: Kernel stack overflow ]--- Cc: Guo Ren <guoren@kernel.org> Cc: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t Link: https://lore.kernel.org/lkml/20221124094845.1907443-1-debug@rivosinc.com/ Signed-off-by: Deepak Gupta <debug@rivosinc.com> Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Guo Ren <guoren@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230927224757.1154247-9-samitolvanen@google.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28ARM: 9320/1: fix stack depot IRQ stack filterVincent Whitchurch1-4/+0
[ Upstream commit b0150014878c32197cfa66e3e2f79e57f66babc0 ] Place IRQ handlers such as gic_handle_irq() in the irqentry section even if FUNCTION_GRAPH_TRACER is not enabled. Without this, the stack depot's filter_irq_stacks() does not correctly filter out IRQ stacks in those configurations, which hampers deduplication and eventually leads to "Stack depot reached limit capacity" splats with KASAN. A similar fix was done for arm64 in commit f6794950f0e5ba37e3bbed ("arm64: set __exception_irq_entry with __irq_entry as a default"). Link: https://lore.kernel.org/r/20230803-arm-irqentry-v1-1-8aad8e260b1c@axis.com Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28arm64: dts: ls208xa: use a pseudo-bus to constrain usb dma sizeLaurentiu Tudor1-19/+27
[ Upstream commit b39d5016456871a88f5cd141914a5043591b46f3 ] Wrap the usb controllers in an intermediate simple-bus and use it to constrain the dma address size of these usb controllers to the 40b that they generate toward the interconnect. This is required because the SoC uses 48b address sizes and this mismatch would lead to smmu context faults [1] because the usb generates 40b addresses while the smmu page tables are populated with 48b wide addresses. [1] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1 xhci-hcd xhci-hcd.0.auto: hcc params 0x0220f66d hci version 0x100 quirks 0x0000000002000010 xhci-hcd xhci-hcd.0.auto: irq 108, io mem 0x03100000 xhci-hcd xhci-hcd.0.auto: xHCI Host Controller xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2 xhci-hcd xhci-hcd.0.auto: Host supports USB 3.0 SuperSpeed arm-smmu 5000000.iommu: Unhandled context fault: fsr=0x402, iova=0xffffffb000, fsynr=0x0, cbfrsynra=0xc01, cb=3 Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-28x86/mm: Drop the 4 MB restriction on minimal NUMA node memory sizeMike Rapoport (IBM)2-14/+0
[ Upstream commit a1e2b8b36820d8c91275f207e77e91645b7c6836 ] Qi Zheng reported crashes in a production environment and provided a simplified example as a reproducer: | For example, if we use Qemu to start a two NUMA node kernel, | one of the nodes has 2M memory (less than NODE_MIN_SIZE), | and the other node has 2G, then we will encounter the | following panic: | | BUG: kernel NULL pointer dereference, address: 0000000000000000 | <...> | RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40 | <...> | Call Trace: | <TASK> | deactivate_slab() | bootstrap() | kmem_cache_init() | start_kernel() | secondary_startup_64_no_verify() The crashes happen because of inconsistency between the nodemask that has nodes with less than 4MB as memoryless, and the actual memory fed into the core mm. The commit: 9391a3f9c7f1 ("[PATCH] x86_64: Clear more state when ignoring empty node in SRAT parsing") ... that introduced minimal size of a NUMA node does not explain why a node size cannot be less than 4MB and what boot failures this restriction might fix. Fixes have been submitted to the core MM code to tighten up the memory topologies it accepts and to not crash on weird input: mm: page_alloc: skip memoryless nodes entirely mm: memory_hotplug: drop memoryless node from fallback lists Andrew has accepted them into the -mm tree, but there are no stable SHA1's yet. This patch drops the limitation for minimal node size on x86: - which works around the crash without the fixes to the core MM. - makes x86 topologies less weird, - removes an arbitrary and undocumented limitation on NUMA topologies. [ mingo: Improved changelog clarity. ] Reported-by: Qi Zheng <zhengqi.arch@bytedance.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/ZS+2qqjEO5/867br@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20bpf, x86: initialize the variable "first_off" in save_args()Menglong Dong1-1/+1
commit 492e797fdab25f2d8eb1b6bb3236f4aac474f878 upstream. As Dan Carpenter reported, the variable "first_off" which is passed to clean_stack_garbage() in save_args() can be uninitialized, which can cause runtime warnings with KMEMsan. Therefore, init it with 0. Fixes: 473e3150e30a ("bpf, x86: allow function arguments up to 12 for TRACING") Cc: Hao Peng <flyingpeng@tencent.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/bpf/09784025-a812-493f-9829-5e26c8691e07@moroto.mountain/ Signed-off-by: Menglong Dong <imagedong@tencent.com> Link: https://lore.kernel.org/r/20230719110330.2007949-1-imagedong@tencent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-20x86/amd_nb: Use Family 19h Models 60h-7Fh Function 4 IDsYazen Ghannam1-0/+3
commit 2a565258b3f4bbdc7a3c09cd02082cb286a7bffc upstream. Three PCI IDs for DF Function 4 were defined but not used. Add them to the "link" list. Fixes: f8faf3496633 ("x86/amd_nb: Add AMD PCI IDs for SMN communication") Fixes: 23a5b8bb022c ("x86/amd_nb: Add PCI ID for family 19h model 78h") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230803150430.3542854-1-yazen.ghannam@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-20arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registersIlkka Koskinen2-45/+28
[ Upstream commit 403edfa436286b21f5ffe6856ae5b36396e8966c ] The driver used to truncate several 64-bit registers such as PMCEID[n] registers used to describe whether architectural and microarchitectural events in range 0x4000-0x401f exist. Due to discarding the bits, the driver made the events invisible, even if they existed. Moreover, PMCCFILTR and PMCR registers have additional bits in the upper 32 bits. This patch makes them available although they aren't currently used. Finally, functions handling PMXEVCNTR and PMXEVTYPER registers are removed as they not being used at all. Fixes: df29ddf4f04b ("arm64: perf: Abstract system register accesses away") Reported-by: Carl Worth <carl@os.amperecomputing.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Acked-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/.. Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20231102183012.1251410-1-ilkka@os.amperecomputing.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTsAnup Patel1-5/+6
[ Upstream commit c4676f8dc1e12e68d6511f9ed89707fdad4c962c ] The riscv_of_processor_hartid() used by riscv_of_parent_hartid() fails for HARTs disabled in the DT. This results in the following warning thrown by the RISC-V INTC driver for the E-core on SiFive boards: [ 0.000000] riscv-intc: unable to find hart id for /cpus/cpu@0/interrupt-controller The riscv_of_parent_hartid() is only expected to read the hartid from the DT so we directly call of_get_cpu_hwid() instead of calling riscv_of_processor_hartid(). Fixes: ad635e723e17 ("riscv: cpu: Add 64bit hartid support on RV64") Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20231027154254.355853-2-apatel@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20riscv: boot: Fix creation of loader.binGeert Uytterhoeven1-0/+1
[ Upstream commit 57a4542cb7c9baa1509c3366b57a08d75b212ead ] When flashing loader.bin for K210 using kflash:     [ERROR] This is an ELF file and cannot be programmed to flash directly: arch/riscv/boot/loader.bin Before, loader.bin relied on "OBJCOPYFLAGS := -O binary" in the main RISC-V Makefile to create a boot image with the right format. With this removed, the image is now created in the wrong (ELF) format. Fix this by adding an explicit rule. Fixes: 505b02957e74f0c5 ("riscv: Remove duplicate objcopy flag") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/1086025809583809538dfecaa899892218f44e7e.1698159066.git.geert+renesas@glider.be Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20powerpc/pseries: fix potential memory leak in init_cpu_associativity()Wang Yufen1-1/+3
[ Upstream commit 95f1a128cd728a7257d78e868f1f5a145fc43736 ] If the vcpu_associativity alloc memory successfully but the pcpu_associativity fails to alloc memory, the vcpu_associativity memory leaks. Fixes: d62c8deeb6e6 ("powerpc/pseries: Provide vcpu dispatch statistics") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Reviewed-by: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/1671003983-10794-1-git-send-email-wangyufen@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20powerpc/imc-pmu: Use the correct spinlock initializer.Sebastian Andrzej Siewior1-1/+1
[ Upstream commit 007240d59c11f87ac4f6cfc6a1d116630b6b634c ] The macro __SPIN_LOCK_INITIALIZER() is implementation specific. Users that desire to initialize a spinlock in a struct must use __SPIN_LOCK_UNLOCKED(). Use __SPIN_LOCK_UNLOCKED() for the spinlock_t in imc_global_refc. Fixes: 76d588dddc459 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230309134831.Nz12nqsU@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20powerpc/vas: Limit open window failure messages in log buffferHaren Myneni2-20/+18
[ Upstream commit 73b25505ce043b561028e5571d84dc82aa53c2b4 ] The VAS open window call prints error message and returns -EBUSY after the migration suspend event initiated and until the resume event completed on the destination system. It can cause the log buffer filled with these error messages if the user space issues continuous open window calls. Similar case even for DLPAR CPU remove event when no credits are available until the credits are freed or with the other DLPAR CPU add event. So changes in the patch to use pr_err_ratelimited() instead of pr_err() to display open window failure and not-available credits error messages. Use pr_fmt() and make the corresponding changes to have the consistencein prefix all pr_*() messages (vas-api.c). Fixes: 37e6764895ef ("powerpc/pseries/vas: Add VAS migration handler") Signed-off-by: Haren Myneni <haren@linux.ibm.com> [mpe: Use "vas-api" as the prefix to match the file name.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231019215033.1335251-1-haren@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20powerpc: Hide empty pt_regs at base of the stackMichael Ellerman1-3/+23
[ Upstream commit d45c4b48dafb5820e5cc267ff9a6d7784d13a43c ] A thread started via eg. user_mode_thread() runs in the kernel to begin with and then may later return to userspace. While it's running in the kernel it has a pt_regs at the base of its kernel stack, but that pt_regs is all zeroes. If the thread oopses in that state, it leads to an ugly stack trace with a big block of zero GPRs, as reported by Joel: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc7-00004-gf7757129e3de-dirty #3 Hardware name: IBM PowerNV (emulated by qemu) POWER9 0x4e1200 opal:v7.0 PowerNV Call Trace: [c0000000036afb00] [c0000000010dd058] dump_stack_lvl+0x6c/0x9c (unreliable) [c0000000036afb30] [c00000000013c524] panic+0x178/0x424 [c0000000036afbd0] [c000000002005100] mount_root_generic+0x250/0x324 [c0000000036afca0] [c0000000020057d0] prepare_namespace+0x2d4/0x344 [c0000000036afd20] [c0000000020049c0] kernel_init_freeable+0x358/0x3ac [c0000000036afdf0] [c0000000000111b0] kernel_init+0x30/0x1a0 [c0000000036afe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c --- interrupt: 0 at 0x0 NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000 REGS: c0000000036afe80 TRAP: 0000 Not tainted (6.5.0-rc7-00004-gf7757129e3de-dirty) MSR: 0000000000000000 <> CR: 00000000 XER: 00000000 CFAR: 0000000000000000 IRQMASK: 0 GPR00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 NIP [0000000000000000] 0x0 LR [0000000000000000] 0x0 --- interrupt: 0 The all-zero pt_regs looks ugly and conveys no useful information, other than its presence. So detect that case and just show the presence of the frame by printing the interrupt marker, eg: Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc3-00126-g18e9506562a0-dirty #301 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries Call Trace: [c000000003aabb00] [c000000001143db8] dump_stack_lvl+0x6c/0x9c (unreliable) [c000000003aabb30] [c00000000014c624] panic+0x178/0x424 [c000000003aabbd0] [c0000000020050fc] mount_root_generic+0x250/0x324 [c000000003aabca0] [c0000000020057cc] prepare_namespace+0x2d4/0x344 [c000000003aabd20] [c0000000020049bc] kernel_init_freeable+0x358/0x3ac [c000000003aabdf0] [c0000000000111b0] kernel_init+0x30/0x1a0 [c000000003aabe50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c --- interrupt: 0 at 0x0 To avoid ever suppressing a valid pt_regs make sure the pt_regs has a zero MSR and TRAP value, and is located at the very base of the stack. Fixes: 6895dfc04741 ("powerpc: copy_thread fill in interrupt frame marker and back chain") Reported-by: Joel Stanley <joel@jms.id.au> Reported-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230824064210.907266-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20powerpc/xive: Fix endian conversion sizeBenjamin Gray1-1/+1
[ Upstream commit ff7a60ab1e065257a0e467c13b519f4debcd7fcf ] Sparse reports a size mismatch in the endian swap. The Opal implementation[1] passes the value as a __be64, and the receiving variable out_qsize is a u64, so the use of be32_to_cpu() appears to be an error. [1]: https://github.com/open-power/skiboot/blob/80e2b1dc73/hw/xive.c#L3854 Fixes: 88ec6b93c8e7 ("powerpc/xive: add OPAL extensions for the XIVE native exploitation support") Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231011053711.93427-2-bgray@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20powerpc/40x: Remove stale PTE_ATOMIC_UPDATES macroChristophe Leroy1-3/+0
[ Upstream commit cc8ee288f484a2a59c01ccd4d8a417d6ed3466e3 ] 40x TLB handlers were reworked by commit 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss") to not require PTE_ATOMIC_UPDATES anymore. Then commit 4e1df545e2fa ("powerpc/pgtable: Drop PTE_ATOMIC_UPDATES") removed all code related to PTE_ATOMIC_UPDATES. Remove left over PTE_ATOMIC_UPDATES macro. Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/f061db5857fcd748f84a6707aad01754686ce97e.1695659959.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20powerpc: Only define __parse_fpscr() when requiredChristophe Leroy1-0/+2
[ Upstream commit c7e0d9bb9154c6e6b2ac8746faba27b53393f25e ] Clang 17 reports: arch/powerpc/kernel/traps.c:1167:19: error: unused function '__parse_fpscr' [-Werror,-Wunused-function] __parse_fpscr() is called from two sites. First call is guarded by #ifdef CONFIG_PPC_FPU_REGS Second call is guarded by CONFIG_MATH_EMULATION which selects CONFIG_PPC_FPU_REGS. So only define __parse_fpscr() when CONFIG_PPC_FPU_REGS is defined. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309210327.WkqSd5Bq-lkp@intel.com/ Fixes: b6254ced4da6 ("powerpc/signal: Don't manage floating point regs when no FPU") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/5de2998c57f3983563b27b39228ea9a7229d4110.1695385984.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20sh: bios: Revive earlyprintk supportGeert Uytterhoeven1-0/+11
[ Upstream commit 553f7ac78fbb41b2c93ab9b9d78e42274d27daa9 ] The SuperH BIOS earlyprintk code is protected by CONFIG_EARLY_PRINTK. However, when this protection was added, it was missed that SuperH no longer defines an EARLY_PRINTK config symbol since commit e76fe57447e88916 ("sh: Remove old early serial console code V2"), so BIOS earlyprintk can no longer be used. Fix this by reviving the EARLY_PRINTK config symbol. Fixes: d0380e6c3c0f6edb ("early_printk: consolidate random copies of identical code") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/c40972dfec3dcc6719808d5df388857360262878.1697708489.git.geert+renesas@glider.be Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20ARM: 9323/1: mm: Fix ARCH_LOW_ADDRESS_LIMIT when CONFIG_ZONE_DMAwahrenst1-0/+3
[ Upstream commit 399da29ff5eb3f675c71423bec4cf2208f218576 ] Configuring VMSPLIT_2G + LPAE on Raspberry Pi 4 leads to SWIOTLB buffer allocation beyond platform dma_zone_size of SZ_1G, which results in broken SD card boot. So fix this be setting ARCH_LOW_ADDRESS_LIMIT in CONFIG_ZONE_DMA case. Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Fixes: e9faf9b0b07a ("ARM: add multi_v7_lpae_defconfig") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20ARM: 9321/1: memset: cast the constant byte to unsigned charKursad Oney1-0/+1
[ Upstream commit c0e824661f443b8cab3897006c1bbc69fd0e7bc4 ] memset() description in ISO/IEC 9899:1999 (and elsewhere) says: The memset function copies the value of c (converted to an unsigned char) into each of the first n characters of the object pointed to by s. The kernel's arm32 memset does not cast c to unsigned char. This results in the following code to produce erroneous output: char a[128]; memset(a, -128, sizeof(a)); This is because gcc will generally emit the following code before it calls memset() : mov r0, r7 mvn r1, #127 ; 0x7f bl 00000000 <memset> r1 ends up with 0xffffff80 before being used by memset() and the 'a' array will have -128 once in every four bytes while the other bytes will be set incorrectly to -1 like this (printing the first 8 bytes) : test_module: -128 -1 -1 -1 test_module: -1 -1 -1 -128 The change here is to 'and' r1 with 255 before it is used. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Kursad Oney <kursad.oney@broadcom.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>