summaryrefslogtreecommitdiff
path: root/arch/powerpc/lib
AgeCommit message (Collapse)AuthorFilesLines
2025-03-26Merge tag 'crc-for-linus' of ↵Linus Torvalds2-11/+5
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: "Another set of improvements to the kernel's CRC (cyclic redundancy check) code: - Rework the CRC64 library functions to be directly optimized, like what I did last cycle for the CRC32 and CRC-T10DIF library functions - Rewrite the x86 PCLMULQDQ-optimized CRC code, and add VPCLMULQDQ support and acceleration for crc64_be and crc64_nvme - Rewrite the riscv Zbc-optimized CRC code, and add acceleration for crc_t10dif, crc64_be, and crc64_nvme - Remove crc_t10dif and crc64_rocksoft from the crypto API, since they are no longer needed there - Rename crc64_rocksoft to crc64_nvme, as the old name was incorrect - Add kunit test cases for crc64_nvme and crc7 - Eliminate redundant functions for calculating the Castagnoli CRC32, settling on just crc32c() - Remove unnecessary prompts from some of the CRC kconfig options - Further optimize the x86 crc32c code" * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (36 commits) x86/crc: drop the avx10_256 functions and rename avx10_512 to avx512 lib/crc: remove unnecessary prompt for CONFIG_CRC64 lib/crc: remove unnecessary prompt for CONFIG_LIBCRC32C lib/crc: remove unnecessary prompt for CONFIG_CRC8 lib/crc: remove unnecessary prompt for CONFIG_CRC7 lib/crc: remove unnecessary prompt for CONFIG_CRC4 lib/crc7: unexport crc7_be_syndrome_table lib/crc_kunit.c: update comment in crc_benchmark() lib/crc_kunit.c: add test and benchmark for crc7_be() x86/crc32: optimize tail handling for crc32c short inputs riscv/crc64: add Zbc optimized CRC64 functions riscv/crc-t10dif: add Zbc optimized CRC-T10DIF function riscv/crc32: reimplement the CRC32 functions using new template riscv/crc: add "template" for Zbc optimized CRC functions x86/crc: add ANNOTATE_NOENDBR to suppress objtool warnings x86/crc32: improve crc32c_arch() code generation with clang x86/crc64: implement crc64_be and crc64_nvme using new template x86/crc-t10dif: implement crc_t10dif using new template x86/crc32: implement crc32_le using new template x86/crc: add "template" for [V]PCLMULQDQ based CRC functions ...
2025-02-12powerpc/code-patching: Fix KASAN hit by not flagging text patching area as ↵Christophe Leroy1-1/+1
VM_ALLOC Erhard reported the following KASAN hit while booting his PowerMac G4 with a KASAN-enabled kernel 6.13-rc6: BUG: KASAN: vmalloc-out-of-bounds in copy_to_kernel_nofault+0xd8/0x1c8 Write of size 8 at addr f1000000 by task chronyd/1293 CPU: 0 UID: 123 PID: 1293 Comm: chronyd Tainted: G W 6.13.0-rc6-PMacG4 #2 Tainted: [W]=WARN Hardware name: PowerMac3,6 7455 0x80010303 PowerMac Call Trace: [c2437590] [c1631a84] dump_stack_lvl+0x70/0x8c (unreliable) [c24375b0] [c0504998] print_report+0xdc/0x504 [c2437610] [c050475c] kasan_report+0xf8/0x108 [c2437690] [c0505a3c] kasan_check_range+0x24/0x18c [c24376a0] [c03fb5e4] copy_to_kernel_nofault+0xd8/0x1c8 [c24376c0] [c004c014] patch_instructions+0x15c/0x16c [c2437710] [c00731a8] bpf_arch_text_copy+0x60/0x7c [c2437730] [c0281168] bpf_jit_binary_pack_finalize+0x50/0xac [c2437750] [c0073cf4] bpf_int_jit_compile+0xb30/0xdec [c2437880] [c0280394] bpf_prog_select_runtime+0x15c/0x478 [c24378d0] [c1263428] bpf_prepare_filter+0xbf8/0xc14 [c2437990] [c12677ec] bpf_prog_create_from_user+0x258/0x2b4 [c24379d0] [c027111c] do_seccomp+0x3dc/0x1890 [c2437ac0] [c001d8e0] system_call_exception+0x2dc/0x420 [c2437f30] [c00281ac] ret_from_syscall+0x0/0x2c --- interrupt: c00 at 0x5a1274 NIP: 005a1274 LR: 006a3b3c CTR: 005296c8 REGS: c2437f40 TRAP: 0c00 Tainted: G W (6.13.0-rc6-PMacG4) MSR: 0200f932 <VEC,EE,PR,FP,ME,IR,DR,RI> CR: 24004422 XER: 00000000 GPR00: 00000166 af8f3fa0 a7ee3540 00000001 00000000 013b6500 005a5858 0200f932 GPR08: 00000000 00001fe9 013d5fc8 005296c8 2822244c 00b2fcd8 00000000 af8f4b57 GPR16: 00000000 00000001 00000000 00000000 00000000 00000001 00000000 00000002 GPR24: 00afdbb0 00000000 00000000 00000000 006e0004 013ce060 006e7c1c 00000001 NIP [005a1274] 0x5a1274 LR [006a3b3c] 0x6a3b3c --- interrupt: c00 The buggy address belongs to the virtual mapping at [f1000000, f1002000) created by: text_area_cpu_up+0x20/0x190 The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x76e30 flags: 0x80000000(zone=2) raw: 80000000 00000000 00000122 00000000 00000000 00000000 ffffffff 00000001 raw: 00000000 page dumped because: kasan: bad access detected Memory state around the buggy address: f0ffff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0ffff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >f1000000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ^ f1000080: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f1000100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 ================================================================== f8 corresponds to KASAN_VMALLOC_INVALID which means the area is not initialised hence not supposed to be used yet. Powerpc text patching infrastructure allocates a virtual memory area using get_vm_area() and flags it as VM_ALLOC. But that flag is meant to be used for vmalloc() and vmalloc() allocated memory is not supposed to be used before a call to __vmalloc_node_range() which is never called for that area. That went undetected until commit e4137f08816b ("mm, kasan, kmsan: instrument copy_from/to_kernel_nofault") The area allocated by text_area_cpu_up() is not vmalloc memory, it is mapped directly on demand when needed by map_kernel_page(). There is no VM flag corresponding to such usage, so just pass no flag. That way the area will be unpoisonned and usable immediately. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/all/20250112135832.57c92322@yea/ Fixes: 37bc3e5fd764 ("powerpc/lib/code-patching: Use alternate map for patch_instruction()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/06621423da339b374f48c0886e3a5db18e896be8.1739342693.git.christophe.leroy@csgroup.eu
2025-02-10powerpc/code-patching: Disable KASAN report during patching via temporary mmChristophe Leroy1-0/+2
Erhard reports the following KASAN hit on Talos II (power9) with kernel 6.13: [ 12.028126] ================================================================== [ 12.028198] BUG: KASAN: user-memory-access in copy_to_kernel_nofault+0x8c/0x1a0 [ 12.028260] Write of size 8 at addr 0000187e458f2000 by task systemd/1 [ 12.028346] CPU: 87 UID: 0 PID: 1 Comm: systemd Tainted: G T 6.13.0-P9-dirty #3 [ 12.028408] Tainted: [T]=RANDSTRUCT [ 12.028446] Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV [ 12.028500] Call Trace: [ 12.028536] [c000000008dbf3b0] [c000000001656a48] dump_stack_lvl+0xbc/0x110 (unreliable) [ 12.028609] [c000000008dbf3f0] [c0000000006e2fc8] print_report+0x6b0/0x708 [ 12.028666] [c000000008dbf4e0] [c0000000006e2454] kasan_report+0x164/0x300 [ 12.028725] [c000000008dbf600] [c0000000006e54d4] kasan_check_range+0x314/0x370 [ 12.028784] [c000000008dbf640] [c0000000006e6310] __kasan_check_write+0x20/0x40 [ 12.028842] [c000000008dbf660] [c000000000578e8c] copy_to_kernel_nofault+0x8c/0x1a0 [ 12.028902] [c000000008dbf6a0] [c0000000000acfe4] __patch_instructions+0x194/0x210 [ 12.028965] [c000000008dbf6e0] [c0000000000ade80] patch_instructions+0x150/0x590 [ 12.029026] [c000000008dbf7c0] [c0000000001159bc] bpf_arch_text_copy+0x6c/0xe0 [ 12.029085] [c000000008dbf800] [c000000000424250] bpf_jit_binary_pack_finalize+0x40/0xc0 [ 12.029147] [c000000008dbf830] [c000000000115dec] bpf_int_jit_compile+0x3bc/0x930 [ 12.029206] [c000000008dbf990] [c000000000423720] bpf_prog_select_runtime+0x1f0/0x280 [ 12.029266] [c000000008dbfa00] [c000000000434b18] bpf_prog_load+0xbb8/0x1370 [ 12.029324] [c000000008dbfb70] [c000000000436ebc] __sys_bpf+0x5ac/0x2e00 [ 12.029379] [c000000008dbfd00] [c00000000043a228] sys_bpf+0x28/0x40 [ 12.029435] [c000000008dbfd20] [c000000000038eb4] system_call_exception+0x334/0x610 [ 12.029497] [c000000008dbfe50] [c00000000000c270] system_call_vectored_common+0xf0/0x280 [ 12.029561] --- interrupt: 3000 at 0x3fff82f5cfa8 [ 12.029608] NIP: 00003fff82f5cfa8 LR: 00003fff82f5cfa8 CTR: 0000000000000000 [ 12.029660] REGS: c000000008dbfe80 TRAP: 3000 Tainted: G T (6.13.0-P9-dirty) [ 12.029735] MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 42004848 XER: 00000000 [ 12.029855] IRQMASK: 0 GPR00: 0000000000000169 00003fffdcf789a0 00003fff83067100 0000000000000005 GPR04: 00003fffdcf78a98 0000000000000090 0000000000000000 0000000000000008 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00003fff836ff7e0 c000000000010678 0000000000000000 GPR16: 0000000000000000 0000000000000000 00003fffdcf78f28 00003fffdcf78f90 GPR20: 0000000000000000 0000000000000000 0000000000000000 00003fffdcf78f80 GPR24: 00003fffdcf78f70 00003fffdcf78d10 00003fff835c7239 00003fffdcf78bd8 GPR28: 00003fffdcf78a98 0000000000000000 0000000000000000 000000011f547580 [ 12.030316] NIP [00003fff82f5cfa8] 0x3fff82f5cfa8 [ 12.030361] LR [00003fff82f5cfa8] 0x3fff82f5cfa8 [ 12.030405] --- interrupt: 3000 [ 12.030444] ================================================================== Commit c28c15b6d28a ("powerpc/code-patching: Use temporary mm for Radix MMU") is inspired from x86 but unlike x86 is doesn't disable KASAN reports during patching. This wasn't a problem at the begining because __patch_mem() is not instrumented. Commit 465cabc97b42 ("powerpc/code-patching: introduce patch_instructions()") use copy_to_kernel_nofault() to copy several instructions at once. But when using temporary mm the destination is not regular kernel memory but a kind of kernel-like memory located in user address space. Because it is not in kernel address space it is not covered by KASAN shadow memory. Since commit e4137f08816b ("mm, kasan, kmsan: instrument copy_from/to_kernel_nofault") KASAN reports bad accesses from copy_to_kernel_nofault(). Here a bad access to user memory is reported because KASAN detects the lack of shadow memory and the address is below TASK_SIZE. Do like x86 in commit b3fd8e83ada0 ("x86/alternatives: Use temporary mm for text poking") and disable KASAN reports during patching when using temporary mm. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Close: https://lore.kernel.org/all/20250201151435.48400261@yea/ Fixes: 465cabc97b42 ("powerpc/code-patching: introduce patch_instructions()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/1c05b2a1b02ad75b981cfc45927e0b4a90441046.1738577687.git.christophe.leroy@csgroup.eu
2025-02-09lib/crc-t10dif: remove crc_t10dif_is_optimized()Eric Biggers1-6/+0
With the "crct10dif" algorithm having been removed from the crypto API, crc_t10dif_is_optimized() is no longer used. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208175647.12333-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-02-09lib/crc32: remove "_le" from crc32c base and arch functionsEric Biggers1-5/+5
Following the standardization on crc32c() as the lib entry point for the Castagnoli CRC32 instead of the previous mix of crc32c(), crc32c_le(), and __crc32c_le(), make the same change to the underlying base and arch functions that implement it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250208024911.14936-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-01-23Merge tag 'crc-for-linus' of ↵Linus Torvalds6-0/+2620
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: - Reorganize the architecture-optimized CRC32 and CRC-T10DIF code to be directly accessible via the library API, instead of requiring the crypto API. This is much simpler and more efficient. - Convert some users such as ext4 to use the CRC32 library API instead of the crypto API. More conversions like this will come later. - Add a KUnit test that tests and benchmarks multiple CRC variants. Remove older, less-comprehensive tests that are made redundant by this. - Add an entry to MAINTAINERS for the kernel's CRC library code. I'm volunteering to maintain it. I have additional cleanups and optimizations planned for future cycles. * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (31 commits) MAINTAINERS: add entry for CRC library powerpc/crc: delete obsolete crc-vpmsum_test.c lib/crc32test: delete obsolete crc32test.c lib/crc16_kunit: delete obsolete crc16_kunit.c lib/crc_kunit.c: add KUnit test suite for CRC library functions powerpc/crc-t10dif: expose CRC-T10DIF function through lib arm64/crc-t10dif: expose CRC-T10DIF function through lib arm/crc-t10dif: expose CRC-T10DIF function through lib x86/crc-t10dif: expose CRC-T10DIF function through lib crypto: crct10dif - expose arch-optimized lib function lib/crc-t10dif: add support for arch overrides lib/crc-t10dif: stop wrapping the crypto API scsi: target: iscsi: switch to using the crc32c library f2fs: switch to using the crc32 library jbd2: switch to using the crc32c library ext4: switch to using the crc32c library lib/crc32: make crc32c() go directly to lib bcachefs: Explicitly select CRYPTO from BCACHEFS_FS x86/crc32: expose CRC32 functions through lib x86/crc32: update prototype for crc32_pclmul_le_16() ...
2024-12-19powerpc: Large user copy aware of full:rt:lazy preemptionShrikanth Hegde1-1/+1
Large user copy_to/from (more than 16 bytes) uses vmx instructions to speed things up. Once the copy is done, it makes sense to try schedule as soon as possible for preemptible kernels. So do this for preempt=full/lazy and rt kernel. Not checking for lazy bit here, since it could lead to unnecessary context switches. Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20241116192306.88217-3-sshegde@linux.ibm.com
2024-12-02powerpc/crc-t10dif: expose CRC-T10DIF function through libEric Biggers3-0/+937
Move the powerpc CRC-T10DIF assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going through the crypto API. It remains usable via the crypto API too via the shash algorithms that use the library interface. Thus all the arch-specific "shash" code becomes unnecessary and is removed. Note: to see the diff from arch/powerpc/crypto/crct10dif-vpmsum_glue.c to arch/powerpc/lib/crc-t10dif-glue.c, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20241202012056.209768-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-12-02powerpc/crc32: expose CRC32 functions through libEric Biggers4-0/+1683
Move the powerpc CRC32C assembly code into the lib directory and wire it up to the library interface. This allows it to be used without going through the crypto API. It remains usable via the crypto API too via the shash algorithms that use the library interface. Thus all the arch-specific "shash" code becomes unnecessary and is removed. Note: to see the diff from arch/powerpc/crypto/crc32c-vpmsum_glue.c to arch/powerpc/lib/crc32-glue.c, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20241202010844.144356-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-11-23Merge tag 'powerpc-6.13-1' of ↵Linus Torvalds1-8/+4
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Rework kfence support for the HPT MMU to work on systems with >= 16TB of RAM. - Remove the powerpc "maple" platform, used by the "Yellow Dog Powerstation". - Add support for DYNAMIC_FTRACE_WITH_CALL_OPS, DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines. - Add support for running KVM nested guests on Power11. - Other small features, cleanups and fixes. Thanks to Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa Shulyupin, David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert Uytterhoeven, Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard, Lukas Bulwahn, Madhavan Srinivasan, Markus Elfring, Michal Suchanek, Ming Lei, Mukesh Kumar Chaurasiya, Nathan Chancellor, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A, Paulo Miguel Almeida, Pavithra Prakash, Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P Bappalige, Shen Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten Blum, Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun, and zhang jiao. * tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (89 commits) EDAC/powerpc: Remove PPC_MAPLE drivers powerpc/perf: Add per-task/process monitoring to vpa_pmu driver powerpc/kvm: Add vpa latency counters to kvm_vcpu_arch docs: ABI: sysfs-bus-event_source-devices-vpa-pmu: Document sysfs event format entries for vpa_pmu powerpc/perf: Add perf interface to expose vpa counters MAINTAINERS: powerpc: Mark Maddy as "M" powerpc/Makefile: Allow overriding CPP powerpc-km82xx.c: replace of_node_put() with __free ps3: Correct some typos in comments powerpc/kexec: Fix return of uninitialized variable macintosh: Use common error handling code in via_pmu_led_init() powerpc/powermac: Use of_property_match_string() in pmac_has_backlight_type() powerpc: remove dead config options for MPC85xx platform support powerpc/xive: Use cpumask_intersects() selftests/powerpc: Remove the path after initialization. powerpc/xmon: symbol lookup length fixed powerpc/ep8248e: Use %pa to format resource_size_t powerpc/ps3: Reorganize kerneldoc parameter names KVM: PPC: Book3S HV: Fix kmv -> kvm typo powerpc/sstep: make emulate_vsx_load and emulate_vsx_store static ...
2024-11-14powerpc/sstep: make emulate_vsx_load and emulate_vsx_store staticMichal Suchanek1-8/+4
These functions are not used outside of sstep.c Fixes: 350779a29f11 ("powerpc: Handle most loads and stores in instruction emulation code") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241001130356.14664-1-msuchanek@suse.de
2024-11-08asm-generic: introduce text-patching.hMike Rapoport (Microsoft)4-4/+4
Several architectures support text patching, but they name the header files that declare patching functions differently. Make all such headers consistently named text-patching.h and add an empty header in asm-generic for architectures that do not support text patching. Link: https://lkml.kernel.org/r/20241023162711.2579610-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-20powerpc/vdso32: Fix use of crtsavres for PPC64Christophe Leroy1-1/+1
crtsavres.S content is encloded by a #ifndef CONFIG_PPC64 To be used on VDSO32 on PPC64 it's content must available on PPC64 as well. Replace #ifndef CONFIG_PPC64 by #ifndef __powerpc64__ as __powerpc64__ is not set when building VDSO32 on PPC64. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Closes: https://lore.kernel.org/linuxppc-dev/047b7503-af0c-4bb0-b12a-2f6b1e461752@csgroup.eu/T/ Fixes: b163596a5b6f ("powerpc/vdso32: Add crtsavres") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/aded2b257018fe654db759fdfa4ab1a0b5426b1b.1726772140.git.christophe.leroy@csgroup.eu
2024-09-19Merge tag 'powerpc-6.12-1' of ↵Linus Torvalds2-15/+96
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Reduce alignment constraints on STRICT_KERNEL_RWX and speed-up TLB misses on 8xx and 603 - Replace kretprobe code with rethook and enable fprobe - Remove the "fast endian switch" syscall - Handle DLPAR device tree updates in kernel, allowing the deprecation of the binary /proc/powerpc/ofdt interface Thanks to Abhishek Dubey, Alex Shi, Benjamin Gray, Christophe Leroy, Gaosheng Cui, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Huang Xiaojia, Jinjie Ruan, Madhavan Srinivasan, Miguel Ojeda, Mina Almasry, Narayana Murty N, Naveen Rao, Rob Herring (Arm), Scott Cheloha, Segher Boessenkool, Stephen Rothwell, Thomas Zimmermann, Uwe Kleine-König, Vaibhav Jain, and Zhang Zekun. * tag 'powerpc-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (59 commits) powerpc/atomic: Use YZ constraints for DS-form instructions MAINTAINERS: powerpc: Add Maddy powerpc: Switch back to struct platform_driver::remove() powerpc/pseries/eeh: Fix pseries_eeh_err_inject selftests/powerpc: Allow building without static libc macintosh/via-pmu: register_pmu_pm_ops() can be __init powerpc: Stop using no_llseek powerpc/64s: Remove the "fast endian switch" syscall powerpc/mm/64s: Restrict THP to Radix or HPT w/64K pages powerpc/mm/64s: Move THP reqs into a separate symbol powerpc/64s: Make mmu_hash_ops __ro_after_init powerpc: Replace kretprobe code with rethook on powerpc powerpc: pseries: Constify struct kobj_type powerpc: powernv: Constify struct kobj_type powerpc: Constify struct kobj_type powerpc/pseries/dlpar: Add device tree nodes for DLPAR IO add powerpc/pseries/dlpar: Remove device tree node for DLPAR IO remove powerpc/pseries: Use correct data types from pseries_hp_errorlog struct powerpc/vdso: Inconditionally use CFUNC macro powerpc/32: Implement validation of emergency stack ...
2024-08-29powerpc/qspinlock: Fix deadlock in MCS queueNysal Jan K.A.1-1/+9
If an interrupt occurs in queued_spin_lock_slowpath() after we increment qnodesp->count and before node->lock is initialized, another CPU might see stale lock values in get_tail_qnode(). If the stale lock value happens to match the lock on that CPU, then we write to the "next" pointer of the wrong qnode. This causes a deadlock as the former CPU, once it becomes the head of the MCS queue, will spin indefinitely until it's "next" pointer is set by its successor in the queue. Running stress-ng on a 16 core (16EC/16VP) shared LPAR, results in occasional lockups similar to the following: $ stress-ng --all 128 --vm-bytes 80% --aggressive \ --maximize --oomable --verify --syslog \ --metrics --times --timeout 5m watchdog: CPU 15 Hard LOCKUP ...... NIP [c0000000000b78f4] queued_spin_lock_slowpath+0x1184/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 Call Trace: 0xc000002cfffa3bf0 (unreliable) _raw_spin_lock+0x6c/0x90 raw_spin_rq_lock_nested.part.135+0x4c/0xd0 sched_ttwu_pending+0x60/0x1f0 __flush_smp_call_function_queue+0x1dc/0x670 smp_ipi_demux_relaxed+0xa4/0x100 xive_muxed_ipi_action+0x20/0x40 __handle_irq_event_percpu+0x80/0x240 handle_irq_event_percpu+0x2c/0x80 handle_percpu_irq+0x84/0xd0 generic_handle_irq+0x54/0x80 __do_irq+0xac/0x210 __do_IRQ+0x74/0xd0 0x0 do_IRQ+0x8c/0x170 hardware_interrupt_common_virt+0x29c/0x2a0 --- interrupt: 500 at queued_spin_lock_slowpath+0x4b8/0x1490 ...... NIP [c0000000000b6c28] queued_spin_lock_slowpath+0x4b8/0x1490 LR [c000000001037c5c] _raw_spin_lock+0x6c/0x90 --- interrupt: 500 0xc0000029c1a41d00 (unreliable) _raw_spin_lock+0x6c/0x90 futex_wake+0x100/0x260 do_futex+0x21c/0x2a0 sys_futex+0x98/0x270 system_call_exception+0x14c/0x2f0 system_call_vectored_common+0x15c/0x2ec The following code flow illustrates how the deadlock occurs. For the sake of brevity, assume that both locks (A and B) are contended and we call the queued_spin_lock_slowpath() function. CPU0 CPU1 ---- ---- spin_lock_irqsave(A) | spin_unlock_irqrestore(A) | spin_lock(B) | | | ▼ | id = qnodesp->count++; | (Note that nodes[0].lock == A) | | | ▼ | Interrupt | (happens before "nodes[0].lock = B") | | | ▼ | spin_lock_irqsave(A) | | | ▼ | id = qnodesp->count++ | nodes[1].lock = A | | | ▼ | Tail of MCS queue | | spin_lock_irqsave(A) ▼ | Head of MCS queue ▼ | CPU0 is previous tail ▼ | Spin indefinitely ▼ (until "nodes[1].next != NULL") prev = get_tail_qnode(A, CPU0) | ▼ prev == &qnodes[CPU0].nodes[0] (as qnodes[CPU0].nodes[0].lock == A) | ▼ WRITE_ONCE(prev->next, node) | ▼ Spin indefinitely (until nodes[0].locked == 1) Thanks to Saket Kumar Bhaskar for help with recreating the issue Fixes: 84990b169557 ("powerpc/qspinlock: add mcs queueing for contended waiters") Cc: stable@vger.kernel.org # v6.2+ Reported-by: Geetika Moolchandani <geetika@linux.ibm.com> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Reported-by: Jijo Varghese <vargjijo@in.ibm.com> Signed-off-by: Nysal Jan K.A. <nysal@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240829022830.1164355-1-nysal@linux.ibm.com
2024-08-21powerpc/code-patching: Add boot selftest for data patchingBenjamin Gray1-0/+41
Extend the code patching selftests with some basic coverage of the new data patching variants too. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-6-bgray@linux.ibm.com
2024-08-21powerpc/code-patching: Add data patch alignment checkBenjamin Gray1-0/+6
The new data patching still needs to be aligned within a cacheline too for the flushes to work correctly. To simplify this requirement, we just say data patches must be aligned. Detect when data patching is not aligned, returning an invalid argument error. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-3-bgray@linux.ibm.com
2024-08-21powerpc/code-patching: Add generic memory patchingBenjamin Gray1-15/+49
patch_instruction() is designed for patching instructions in otherwise readonly memory. Other consumers also sometimes need to patch readonly memory, so have abused patch_instruction() for arbitrary data patches. This is a problem on ppc64 as patch_instruction() decides on the patch width using the 'instruction' opcode to see if it's a prefixed instruction. Data that triggers this can lead to larger writes, possibly crossing a page boundary and failing the write altogether. Introduce patch_uint(), and patch_ulong(), with aliases patch_u32(), and patch_u64() (on ppc64) designed for aligned data patches. The patch size is now determined by the called function, and is passed as an additional parameter to generic internals. While the instruction flushing is not required for data patches, it remains unconditional in this patch. A followup series is possible if benchmarking shows fewer flushes gives an improvement in some data-patching workload. ppc32 does not support prefixed instructions, so is unaffected by the original issue. Care is taken in not exposing the size parameter in the public (non-static) interface, so the compiler can const-propagate it away. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240515024445.236364-2-bgray@linux.ibm.com
2024-05-17Merge tag 'powerpc-6.10-1' of ↵Linus Torvalds4-6/+127
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Enable BPF Kernel Functions (kfuncs) in the powerpc BPF JIT. - Allow per-process DEXCR (Dynamic Execution Control Register) settings via prctl, notably NPHIE which controls hashst/hashchk for ROP protection. - Install powerpc selftests in sub-directories. Note this changes the way run_kselftest.sh needs to be invoked for powerpc selftests. - Change fadump (Firmware Assisted Dump) to better handle memory add/remove. - Add support for passing additional parameters to the fadump kernel. - Add support for updating the kdump image on CPU/memory add/remove events. - Other small features, cleanups and fixes. Thanks to Andrew Donnellan, Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Benjamin Gray, Bjorn Helgaas, Christian Zigotzky, Christophe Jaillet, Christophe Leroy, Colin Ian King, Cédric Le Goater, Dr. David Alan Gilbert, Erhard Furtner, Frank Li, GUO Zihua, Ganesh Goudar, Geoff Levand, Ghanshyam Agrawal, Greg Kurz, Hari Bathini, Joel Stanley, Justin Stitt, Kunwu Chan, Li Yang, Lidong Zhong, Madhavan Srinivasan, Mahesh Salgaonkar, Masahiro Yamada, Matthias Schiffer, Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Miehlbradt, Ran Wang, Randy Dunlap, Ritesh Harjani, Sachin Sant, Shirisha Ganta, Shrikanth Hegde, Sourabh Jain, Stephen Rothwell, sundar, Thorsten Blum, Vaibhav Jain, Xiaowei Bao, Yang Li, and Zhao Chenhui. * tag 'powerpc-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (85 commits) powerpc/fadump: Fix section mismatch warning powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP powerpc/fadump: update documentation about bootargs_append powerpc/fadump: pass additional parameters when fadump is active powerpc/fadump: setup additional parameters for dump capture kernel powerpc/pseries/fadump: add support for multiple boot memory regions selftests/powerpc/dexcr: Fix spelling mistake "predicition" -> "prediction" KVM: PPC: Book3S HV nestedv2: Fix an error handling path in gs_msg_ops_kvmhv_nestedv2_config_fill_info() KVM: PPC: Fix documentation for ppc mmu caps KVM: PPC: code cleanup for kvmppc_book3s_irqprio_deliver KVM: PPC: Book3S HV nestedv2: Cancel pending DEC exception powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#" powerpc/code-patching: Use dedicated memory routines for patching powerpc/code-patching: Test patch_instructions() during boot powerpc64/kasan: Pass virtual addresses to kasan_init_phys_region() powerpc: rename SPRN_HID2 define to SPRN_HID2_750FX powerpc: Fix typos powerpc/eeh: Fix spelling of the word "auxillary" and update comment macintosh/ams: Fix unused variable warning powerpc/Makefile: Remove bits related to the previous use of -mcmodel=large ...
2024-05-14powerpc: use CONFIG_EXECMEM instead of CONFIG_MODULES where appropriateMike Rapoport (IBM)1-1/+1
There are places where CONFIG_MODULES guards the code that depends on memory allocation being done with module_alloc(). Replace CONFIG_MODULES with CONFIG_EXECMEM in such places. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-07powerpc/code-patching: Use dedicated memory routines for patchingBenjamin Gray1-4/+27
The patching page set up as a writable alias may be in quadrant 0 (userspace) if the temporary mm path is used. This causes sanitiser failures if so. Sanitiser failures also occur on the non-mm path because the plain memset family is instrumented, and KASAN treats the patching window as poisoned. Introduce locally defined patch_* variants of memset that perform an uninstrumented lower level set, as well as detecting write errors like the original single patch variant does. copy_to_user() is not correct here, as the PTE makes it a proper kernel page (the EAA is privileged access only, RW). It just happens to be in quadrant 0 because that's the hardware's mechanism for using the current PID vs PID 0 in translations. Importantly, it's incorrect to allow user page accesses. Now that the patching memsets are used, we also propagate a failure up to the caller as the single patch variant does. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240325052815.854044-2-bgray@linux.ibm.com
2024-05-07powerpc/code-patching: Test patch_instructions() during bootBenjamin Gray1-0/+92
patch_instructions() introduces new behaviour with a couple of variations. Test each case of * a repeated 32-bit instruction, * a repeated 64-bit instruction (ppc64), and * a copied sequence of instructions for both on a single page and when it crosses a page boundary. Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240325052815.854044-1-bgray@linux.ibm.com
2024-05-07powerpc/Makefile: Remove bits related to the previous use of -mcmodel=largeNaveen N Rao1-2/+0
All supported compilers today (gcc v5.1+ and clang v11+) have support for -mcmodel=medium. As such, NO_MINIMAL_TOC is no longer being set. Remove NO_MINIMAL_TOC as well as the fallback to -mminimal-toc. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240110141237.3179199-1-naveen@kernel.org
2024-04-15powerpc: Add static_key_feature_checks_initialized flagNicholas Miehlbradt1-0/+8
JUMP_LABEL_FEATURE_CHECK_DEBUG used static_key_intialized to determine whether {cpu,mmu}_has_feature() is used before static keys were initialized. However, {cpu,mmu}_has_feature() should not be used before setup_feature_keys() is called but static_key_initialized is set well before this by the call to jump_label_init() in early_init_devtree(). This creates a window in which JUMP_LABEL_FEATURE_CHECK_DEBUG will not detect misuse and report errors. Add a flag specifically to indicate when {cpu,mmu}_has_feature() is safe to use. Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240408052358.5030-1-nicholas@linux.ibm.com
2024-03-06powerpc: xor_vmx: Add '-mhard-float' to CFLAGSNathan Chancellor1-1/+1
arch/powerpc/lib/xor_vmx.o is built with '-msoft-float' (from the main powerpc Makefile) and '-maltivec' (from its CFLAGS), which causes an error when building with clang after a recent change in main: error: option '-msoft-float' cannot be specified with '-maltivec' make[6]: *** [scripts/Makefile.build:243: arch/powerpc/lib/xor_vmx.o] Error 1 Explicitly add '-mhard-float' before '-maltivec' in xor_vmx.o's CFLAGS to override the previous inclusion of '-msoft-float' (as the last option wins), which matches how other areas of the kernel use '-maltivec', such as AMDGPU. Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/1986 Link: https://github.com/llvm/llvm-project/commit/4792f912b232141ecba4cbae538873be3c28556c Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240127-ppc-xor_vmx-drop-msoft-float-v1-1-f24140e81376@kernel.org
2024-03-03powerpc/64s: Move dcbt/dcbtst sequence into a macroMichael Ellerman3-31/+3
There's an almost identical code sequence to specify load/store access hints in __copy_tofrom_user_power7(), copypage_power7() and memcpy_power7(). Move the sequence into a common macro, which is passed the registers to use as they differ slightly. There also needs to be a copy in the selftests, it could be shared in future if the headers are cleaned up / refactored. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240229122521.762431-1-mpe@ellerman.id.au
2024-02-22powerpc: Use user_mode() macro when possibleChristophe Leroy1-12/+11
There is a nice macro to check user mode. Use it instead of open coding anding with MSR_PR to increase readability and avoid having to comment what that anding is for. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/fbf74887dcf1f1ba9e1680fc3247cbb581b00662.1708078228.git.christophe.leroy@csgroup.eu
2023-11-27powerpc/lib: Validate size for vector operationsNaveen N Rao1-0/+10
Some of the fp/vmx code in sstep.c assume a certain maximum size for the instructions being emulated. The size of those operations however is determined separately in analyse_instr(). Add a check to validate the assumption on the maximum size of the operations, so as to prevent any unintended kernel stack corruption. Signed-off-by: Naveen N Rao <naveen@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Build-tested-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231123071705.397625-1-naveen@kernel.org
2023-11-27powerpc/lib: Avoid array bounds warnings in vec opsMichael Ellerman1-2/+2
Building with GCC with -Warray-bounds enabled there are several warnings in sstep.c along the lines of: In function ‘do_byte_reverse’, inlined from ‘do_vec_load’ at arch/powerpc/lib/sstep.c:691:3, inlined from ‘emulate_loadstore’ at arch/powerpc/lib/sstep.c:3439:9: arch/powerpc/lib/sstep.c:289:23: error: array subscript 2 is outside array bounds of ‘u8[16]’ {aka ‘unsigned char[16]’} [-Werror=array-bounds=] 289 | up[2] = byterev_8(up[1]); | ~~~~~~^~~~~~~~~~~~~~~~~~ arch/powerpc/lib/sstep.c: In function ‘emulate_loadstore’: arch/powerpc/lib/sstep.c:681:11: note: at offset 16 into object ‘u’ of size 16 681 | } u = {}; | ^ do_byte_reverse() supports a size up to 32 bytes, but in these cases the caller is only passing a 16 byte buffer. In practice there is no bug, do_vec_load() is only called from the LOAD_VMX case in emulate_loadstore(). That in turn is only reached when analyse_instr() recognises VMX ops, and in all cases the size is no greater than 16: $ git grep -w LOAD_VMX arch/powerpc/lib/sstep.c arch/powerpc/lib/sstep.c: op->type = MKOP(LOAD_VMX, 0, 1); arch/powerpc/lib/sstep.c: op->type = MKOP(LOAD_VMX, 0, 2); arch/powerpc/lib/sstep.c: op->type = MKOP(LOAD_VMX, 0, 4); arch/powerpc/lib/sstep.c: op->type = MKOP(LOAD_VMX, 0, 16); Similarly for do_vec_store(). Although the warning is incorrect, the code would be safer if it clamped the size from the caller to the known size of the buffer. Do that using min_t(). Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Closes: https://lore.kernel.org/linuxppc-dev/YpbUcPrm61RLIiZF@debian.me/ Reported-by: Jan-Benedict Glaw <jbglaw@lug-owl.de> Closes: https://lore.kernel.org/linuxppc-dev/20221212215117.aa7255t7qd6yefk4@lug-owl.de/ Reported-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Closes: https://lore.kernel.org/linuxppc-dev/6a8bf78c-aedb-4d5a-b0aa-82a51a17b884@embeddedor.com/ Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Build-tested-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231120235436.1569255-1-mpe@ellerman.id.au
2023-11-27powerpc: add crtsavres.o to always-y instead of extra-yMasahiro Yamada1-1/+1
crtsavres.o is linked to modules. However, as explained in commit d0e628cd817f ("kbuild: doc: clarify the difference between extra-y and always-y"), 'make modules' does not build extra-y. For example, the following command fails: $ make ARCH=powerpc LLVM=1 KBUILD_MODPOST_WARN=1 mrproper ps3_defconfig modules [snip] LD [M] arch/powerpc/platforms/cell/spufs/spufs.ko ld.lld: error: cannot open arch/powerpc/lib/crtsavres.o: No such file or directory make[3]: *** [scripts/Makefile.modfinal:56: arch/powerpc/platforms/cell/spufs/spufs.ko] Error 1 make[2]: *** [Makefile:1844: modules] Error 2 make[1]: *** [/home/masahiro/workspace/linux-kbuild/Makefile:350: __build_one_by_one] Error 2 make: *** [Makefile:234: __sub-make] Error 2 Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Fixes: baa25b571a16 ("powerpc/64: Do not link crtsavres.o in vmlinux") Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231120232332.4100288-1-masahiroy@kernel.org
2023-10-23powerpc/code-patching: introduce patch_instructions()Hari Bathini1-3/+138
patch_instruction() entails setting up pte, patching the instruction, clearing the pte and flushing the tlb. If multiple instructions need to be patched, every instruction would have to go through the above drill unnecessarily. Instead, introduce patch_instructions() function that sets up the pte, clears the pte and flushes the tlb only once per page range of instructions to be patched. Duplicate most of the patch_instruction() code instead of merging with it, to avoid the performance degradation observed on ppc32, for patch_instruction(), with the code path merged. Also, setup poking_init() always as BPF expects poking_init() to be setup even when STRICT_KERNEL_RWX is off. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231020141358.643575-2-hbathini@linux.ibm.com
2023-10-20powerpc/code-patching: Perform hwsync in __patch_instruction() in case of ↵Christophe Leroy1-4/+1
failure Commit c28c15b6d28a ("powerpc/code-patching: Use temporary mm for Radix MMU") added a hwsync for when __patch_instruction() fails, we results in a quite odd unbalanced logic. Instead of calling mb() when __patch_instruction() returns an error, call mb() in the __patch_instruction()'s error path directly. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/e88b154eaf2efd9ff177d472d3411dcdec8ff4f5.1696675567.git.christophe.leroy@csgroup.eu
2023-10-20powerpc/qspinlock: Rename yield_propagate_owner tunableNicholas Piggin1-9/+9
Rename yield_propagate_owner to yield_sleepy_owner, which better describes what it does (what, not how). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com> Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231016124305.139923-7-npiggin@gmail.com
2023-10-20powerpc/qspinlock: Propagate sleepy if previous waiter is preemptedNicholas Piggin1-1/+5
The sleepy (aka lock-owner-is-preempted) condition is propagated down the queue by each waiter. If a waiter becomes preempted, it can no longer propagate sleepy. To allow subsequent waiters to yield to the lock owner, also check the lock owner in this case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com> Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231016124305.139923-6-npiggin@gmail.com
2023-10-20powerpc/qspinlock: don't propagate the not-sleepy stateNicholas Piggin1-18/+8
To simplify things, don't propagate the not-sleepy condition back down the queue. Instead, have the waiters clear their own node->sleepy when finding the lock owner is not preempted. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com> Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231016124305.139923-5-npiggin@gmail.com
2023-10-20powerpc/qspinlock: propagate owner preemptedness rather than CPU numberNicholas Piggin1-44/+36
Rather than propagating the CPU number of the preempted lock owner, just propagate whether the owner was preempted. Waiters must read the lock value when yielding to it to prevent races anyway, so might as well always load the owner CPU from the lock. To further simplify the code, also don't propagate the -1 (or sleepy=false in the new scheme) down the queue. Instead, have the waiters clear it themselves when finding the lock owner is not preempted. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com> Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231016124305.139923-4-npiggin@gmail.com
2023-10-20powerpc/qspinlock: stop queued waiters trying to set lock sleepyNicholas Piggin1-14/+10
If a queued waiter notices the lock owner or the previous waiter has been preempted, it attempts to mark the lock sleepy, but it does this as a try-set operation using the original lock value it got when queueing, which will become stale as the queue progresses, and the try-set will fail. Drop this and just set the sleepy seen clock. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com> Reviewed-by: "Nysal Jan K.A" <nysal@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231016124305.139923-3-npiggin@gmail.com
2023-10-18powerpc/qspinlock: Fix stale propagated yield_cpuNicholas Piggin1-0/+3
yield_cpu is a sample of a preempted lock holder that gets propagated back through the queue. Queued waiters use this to yield to the preempted lock holder without continually sampling the lock word (which would defeat the purpose of MCS queueing by bouncing the cache line). The problem is that yield_cpu can become stale. It can take some time to be passed down the chain, and if any queued waiter gets preempted then it will cease to propagate the yield_cpu to later waiters. This can result in yielding to a CPU that no longer holds the lock, which is bad, but particularly if it is currently in H_CEDE (idle), then it appears to be preempted and some hypervisors (PowerVM) can cause very long H_CONFER latencies waiting for H_CEDE wakeup. This results in latency spikes and hard lockups on oversubscribed partitions with lock contention. This is a minimal fix. Before yielding to yield_cpu, sample the lock word to confirm yield_cpu is still the owner, and bail out of it is not. Thanks to a bunch of people who reported this and tracked down the exact problem using tracepoints and dispatch trace logs. Fixes: 28db61e207ea ("powerpc/qspinlock: allow propagation of yield CPU down the queue") Cc: stable@vger.kernel.org # v6.2+ Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reported-by: Laurent Dufour <ldufour@linux.ibm.com> Reported-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com> Debugged-by: "Nysal Jan K.A" <nysal@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231016124305.139923-2-npiggin@gmail.com
2023-08-24powerpc: Drop zalloc_maybe_bootmem()Michael Ellerman2-24/+1
The only callers of zalloc_maybe_bootmem() are PCI setup routines. These used to be called early during boot before slab setup, and also during runtime due to hotplug. But commit 5537fcb319d0 ("powerpc/pci: Add ppc_md.discover_phbs()") moved the boot-time calls later, after slab setup, meaning there's no longer any need for zalloc_maybe_bootmem(), kzalloc() can be used in all cases. Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230823055430.752550-1-mpe@ellerman.id.au
2023-08-16powerpc: replace #include <asm/export.h> with #include <linux/export.h>Masahiro Yamada15-15/+15
Commit ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost") deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>. Replace #include <asm/export.h> with #include <linux/export.h>. After all the <asm/export.h> lines are converted, <asm/export.h> and <asm-generic/export.h> will be removed. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> [mpe: Fixup selftests that stub asm/export.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230806150954.394189-2-masahiroy@kernel.org
2023-08-16powerpc/step: Mark __copy_mem_out() and __emulate_dcbz() __always_inlineChristophe Leroy1-2/+2
objtool reports two folliwng warnings: arch/powerpc/lib/sstep.o: warning: objtool: copy_mem_out+0x3c (.text+0x30c): call to __copy_mem_out() with UACCESS enabled arch/powerpc/lib/sstep.o: warning: objtool: emulate_dcbz+0x70 (.text+0x4dc): call to __emulate_dcbz() with UACCESS enabled Mark __copy_mem_out() and __emulate_dcbz() __always_inline Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/f1d4a15da70190f8c2fcddb377bbc1e09827242c.1687343857.git.christophe.leroy@csgroup.eu
2023-08-02powerpc/features: Add capability to update mmu features laterChristophe Leroy1-4/+27
On powerpc32, features fixup is performed very early and that's too early to read the cmdline and take into account 'nosmap' parameter. On the other hand, no userspace access is performed that early and KUAP feature fixup can be performed later. Add a function to update mmu features. The function is passed a mask with the features that can be updated. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/31b27ee2c9d338f4f82cd8cd69d6bff979495290.1689091022.git.christophe.leroy@csgroup.eu
2023-06-27powerpc: remove checks for binutils older than 2.25Masahiro Yamada1-1/+1
Commit e4412739472b ("Documentation: raise minimum supported version of binutils to 2.25") allows us to remove the checks for old binutils. There is no more user for ld-ifversion. Remove it as well. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230119082250.151485-1-masahiroy@kernel.org
2023-06-21powerpc: qspinlock: Enforce qnode writes prior to publishing to queueRohan McLure1-0/+7
Annotate the release barrier and memory clobber (in effect, producing a compiler barrier) in the publish_tail_cpu call. These barriers have the effect of ensuring that qnode attributes are all written to prior to publish the node to the waitqueue. Even while the initial write to the 'locked' attribute is guaranteed to terminate prior to the node being visible, KCSAN still complains that the write is reorderable by the compiler. Issue a kcsan_release() to inform KCSAN of the release barrier contained in publish_tail_cpu(). Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230510033117.1395895-3-rmclure@linux.ibm.com
2023-06-21powerpc: qspinlock: Mark accesses to qnode lock checksRohan McLure1-2/+2
The powerpc implementation of qspinlocks will both poll and spin on the bitlock guarding a qnode. Mark these accesses with READ_ONCE to convey to KCSAN that polling is intentional here. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230510033117.1395895-2-rmclure@linux.ibm.com
2023-04-20powerpc/64: vmlinux support building with PCREL addresingNicholas Piggin1-0/+10
PC-Relative or PCREL addressing is an extension to the ELF ABI which uses Power ISA v3.1 PC-relative instructions to calculate addresses, rather than the traditional TOC scheme. Add an option to build vmlinux using pcrel addressing. Modules continue to use TOC addressing. - TOC address helpers and r2 are poisoned with -1 when running vmlinux. r2 could be used for something useful once things are ironed out. - Assembly must call C functions with @notoc annotation, or the linker complains aobut a missing nop after the call. This is done with the CFUNC macro introduced earlier. - Boot: with the exception of prom_init, the execution branches to the kernel virtual address early in boot, before any addresses are generated, which ensures 34-bit pcrel addressing does not miss the high PAGE_OFFSET bits. TOC relative addressing has a similar requirement. prom_init does not go to the virtual address and its addresses should not carry over to the post-prom kernel. - Ftrace trampolines are converted from TOC addressing to pcrel addressing, including module ftrace trampolines that currently use the kernel TOC to find ftrace target functions. - BPF function prologue and function calling generation are converted from TOC to pcrel. - copypage_64.S has an interesting problem, prefixed instructions have alignment restrictions so the linker can add padding, which makes the assembler treat the difference between two local labels as non-constant even if alignment is arranged so padding is not required. This may need toolchain help to solve nicely, for now move the prefix instruction out of the alternate patch section to work around it. This reduces kernel text size by about 6%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408021752.862660-6-npiggin@gmail.com
2023-04-20powerpc: add CFUNC assembly label annotationNicholas Piggin5-15/+15
This macro is to be used in assembly where C functions are called. pcrel addressing mode requires branches to functions with a localentry value of 1 to have either a trailing nop or @notoc. This macro permits the latter without changing callers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add dummy definitions to fix selftests build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230408021752.862660-5-npiggin@gmail.com
2023-03-29powerpc: Remove memcpy_page_flushcache()Ira Weiny1-7/+0
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray callbacks") removed the calls to memcpy_page_flushcache(). Remove the unnecessary memcpy_page_flushcache() call. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20221230-kmap-x86-v1-2-15f1ecccab50@intel.com
2023-02-10powerpc/kcsan: Add exclusions from instrumentationRohan McLure1-0/+2
Exclude various incompatible compilation units from KCSAN instrumentation. Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230206021801.105268-2-rmclure@linux.ibm.com
2022-12-16powerpc/code-patching: Fix oops with DEBUG_VM enabledMichael Ellerman1-3/+7
Nathan reported that the new per-cpu mm patching oopses if DEBUG_VM is enabled: ------------[ cut here ]------------ kernel BUG at arch/powerpc/mm/pgtable.c:333! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc2+ #1 Hardware name: IBM PowerNV (emulated by qemu) POWER9 0x4e1200 opal:v7.0 PowerNV ... NIP assert_pte_locked+0x180/0x1a0 LR assert_pte_locked+0x170/0x1a0 Call Trace: 0x60000000 (unreliable) patch_instruction+0x618/0x6d0 arch_prepare_kprobe+0xfc/0x2d0 register_kprobe+0x520/0x7c0 arch_init_kprobes+0x28/0x3c init_kprobes+0x108/0x184 do_one_initcall+0x60/0x2e0 kernel_init_freeable+0x1f0/0x3e0 kernel_init+0x34/0x1d0 ret_from_kernel_thread+0x5c/0x64 It's caused by the assert_spin_locked() failing in assert_pte_locked(). The assert fails because the PTE was unlocked in text_area_cpu_up_mm(), and never relocked. The PTE page shouldn't be freed, the patching_mm is only used for patching on this CPU, only that single PTE is ever mapped, and it's only unmapped at CPU offline. In fact assert_pte_locked() has a special case to ignore init_mm entirely, and the patching_mm is more-or-less like init_mm, so possibly the check could be skipped for patching_mm too. But for now be conservative, and use the proper PTE accessors at patching time, so that the PTE lock is held while the PTE is used. That also avoids the warning in assert_pte_locked(). With that it's no longer necessary to save the PTE in cpu_patching_context for the mm_patch_enabled() case. Fixes: c28c15b6d28a ("powerpc/code-patching: Use temporary mm for Radix MMU") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221216125913.990972-1-mpe@ellerman.id.au