summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-12-09powerpc/32s: Do kuep_lock() and kuep_unlock() in assemblyChristophe Leroy9-49/+119
When interrupt and syscall entries where converted to C, KUEP locking and unlocking was also converted. It improved performance by unrolling the loop, and allowed easily implementing boot time deactivation of KUEP. However, null_syscall selftest shows that KUEP is still heavy (361 cycles with KUEP, 212 cycles without). A way to improve more is to group 'mtsr's together, instead of repeating 'addi' + 'mtsr' several times. In order to do that, more registers need to be available. In C, GCC will always be able to provide the requested number of registers, but at the cost of saving some data on the stack, which is counter performant here. So let's do it in assembly, when we have full control of which register can be used. It also has the advantage of locking earlier and unlocking later and it helps GCC generating less tricky code. The only drawback is to make boot time deactivation less straight forward and require 'hand' instruction patching. Group 'mtsr's by 4. With this change, null_syscall selftest reports 336 cycles. Without the change it was 361 cycles, that's a 7% reduction. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/115cb279e9b9948dfd93a065e047081c59e3a2a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09powerpc/32s: Remove capability to disable KUEP at boottimeChristophe Leroy2-10/+3
Disabling KUEP at boottime makes things unnecessarily complex. Still allow disabling KUEP at build time, but when it's built-in it is always there. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/96f583f82423a29a4205c60b9721079111b35567.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09powerpc/book3e: Activate KUEP at all timeChristophe Leroy2-0/+3
On book3e, - When using 64 bits PTE: User pages don't have the SX bit defined so KUEP is always active. - When using 32 bits PTE: Implement KUEP by clearing SX bit during TLB miss for user pages. The impact is minimal and worth neither boot time nor build time selection. Activate it at all time. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e376b114283fb94504e2aa2de846780063252cde.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09powerpc/44x: Activate KUEP at all timeChristophe Leroy4-16/+4
On 44x, KUEP is implemented by clearing SX bit during TLB miss for user pages. The impact is minimal and not worth neither boot time nor build time selection. Activate it at all time. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2414d662558e7fb27d1ed41c8e47c591d576acac.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09powerpc/8xx: Activate KUEP at all timeChristophe Leroy3-9/+3
On the 8xx, there is absolutely no runtime impact with KUEP. Protection against execution of user code in kernel mode is set up at boot time by configuring the groups with contain all user pages as having swapped protection rights, in extenso EX for user and NA for supervisor. Configure KUEP at startup and force selection of CONFIG_PPC_KUEP. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2129e86944323ffe9ed07fffbeafdfd2e363690a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09Revert "powerpc: Inline setup_kup()"Christophe Leroy3-8/+10
This reverts commit 1791ebd131c46539b024c0f2ebf12b6c88a265b9. setup_kup() was inlined to manage conflict between PPC32 marking setup_{kuap/kuep}() __init and PPC64 not marking them __init. But in fact PPC32 has removed the __init mark for all but 8xx in order to properly handle SMP. In order to make setup_kup() grow a bit, revert the commit mentioned above but remove __init for 8xx as well so that we don't have to mark setup_kup() as __ref. Also switch the order so that KUAP is initialised before KUEP because on the 40x, KUEP will depend on the activation of KUAP. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7691088fd0994ee3c8db6298dc8c00259e3f6a7f.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09powerpc/40x: Map 32Mbytes of memory at startupChristophe Leroy1-1/+8
As reported by Carlo, 16Mbytes is not enough with modern kernels that tend to be a bit big, so map another 16M page at boot. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/89b5f974a7fa5011206682cd092e2c905530ff46.1632755552.git.christophe.leroy@csgroup.eu
2021-12-09powerpc/microwatt: add POWER9_CPU, clear PPC_64S_HASH_MMUNicholas Piggin2-2/+2
Microwatt implements a subset of ISA v3.0 (which is equivalent to the POWER9_CPU option). It is radix-only, so does not require hash MMU support. This saves 20kB compressed dtbImage and 56kB vmlinux size. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-19-npiggin@gmail.com
2021-12-09powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMUNicholas Piggin35-57/+172
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves 128kB kernel image size (90kB text) on powernv_defconfig minus KVM, 350kB on pseries_defconfig minus KVM, 40kB on a tiny config. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG. Fix radix_enabled() use in setup_initial_memory_limit(). Add some stubs to reduce number of ifdefs.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
2021-12-09powerpc/64s: Make hash MMU support configurableNicholas Piggin14-15/+71
This adds Kconfig selection which allows 64s hash MMU support to be disabled. It can be disabled if radix support is enabled, the minimum supported CPU type is POWER9 (or higher), and KVM is not selected. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-17-npiggin@gmail.com
2021-12-09powerpc/64s: Always define arch unmapped area callsNicholas Piggin5-35/+51
To avoid any functional changes to radix paths when building with hash MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the arch get_unmapped_area calls on 64s platforms. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-16-npiggin@gmail.com
2021-12-09powerpc/64s: Fix radix MMU when MMU_FTR_HPTE_TABLE is clearNicholas Piggin1-3/+6
There are a few places that require MMU_FTR_HPTE_TABLE to be set even when running in radix mode. Fix those up. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-15-npiggin@gmail.com
2021-12-09powerpc/64e: remove mmu_linear_psizeNicholas Piggin1-9/+0
mmu_linear_psize is only set at boot once on 64e, is not necessarily the correct size of the linear map pages, and is never used anywhere. Remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Retain the extern, so we can use IS_ENABLED() for related code] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-14-npiggin@gmail.com
2021-12-02powerpc: make memremap_compat_align 64s-onlyNicholas Piggin3-21/+21
memremap_compat_align is only relevant when ZONE_DEVICE is selected. ZONE_DEVICE depends on ARCH_HAS_PTE_DEVMAP, which is only selected by PPC_BOOK3S_64. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-13-npiggin@gmail.com
2021-12-02powerpc/64: pcpu setup avoid reading mmu_linear_psize on 64e or radixNicholas Piggin1-6/+15
Radix never sets mmu_linear_psize so it's always 4K, which causes pcpu atom_size to always be PAGE_SIZE. 64e sets it to 1GB always. Make paths for these platforms to be explicit about what value they set atom_size to. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-12-npiggin@gmail.com
2021-12-02powerpc/64s: Rename hash_hugetlbpage.c to hugetlbpage.cNicholas Piggin2-1/+1
This file contains functions and data common to radix, so rename it to remove the hash_ prefix. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-11-npiggin@gmail.com
2021-12-02powerpc/64s: move page size definitions from hash specific fileNicholas Piggin2-5/+7
The radix code uses some of the psize variables. Move the common ones from hash_utils.c to pgtable.c. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-10-npiggin@gmail.com
2021-12-02powerpc/64s: Make flush_and_reload_slb a no-op when radix is enabledNicholas Piggin1-3/+3
The radix test can exclude slb_flush_all_realmode() from being called because flush_and_reload_slb() is only expected to flush ERAT when called by flush_erat(), which is only on pre-ISA v3.0 CPUs that do not support radix. This helps the later change to make hash support configurable to not introduce runtime changes to radix mode behaviour. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-9-npiggin@gmail.com
2021-12-02powerpc/64s: move THP trace point creation out of hash specific fileNicholas Piggin3-2/+9
In preparation for making hash MMU support configurable, move THP trace point function definitions out of an otherwise hash-specific file. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-8-npiggin@gmail.com
2021-12-02powerpc/pseries: lparcfg don't include slb_size line in radix modeNicholas Piggin1-1/+2
This avoids a change in behaviour in the later patch making hash support configurable. This is possibly a user interface change, so the alternative would be a hard-coded slb_size=0 here. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-7-npiggin@gmail.com
2021-12-02powerpc/pseries: move process table registration away from hash-specific codeNicholas Piggin1-28/+28
This reduces ifdefs in a later change which makes hash support configurable. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-6-npiggin@gmail.com
2021-12-02powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specificNicholas Piggin4-19/+27
slb.c is hash-specific SLB management, but do_bad_slb_fault deals with segment interrupts that occur with radix MMU as well. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-5-npiggin@gmail.com
2021-12-02powerpc/pseries: Stop selecting PPC_HASH_MMU_NATIVENicholas Piggin4-109/+104
The pseries platform does not use the native hash code but the PAPR virtualised hash interfaces, so remove PPC_HASH_MMU_NATIVE. This requires moving tlbiel code from hash_native.c to hash_utils.c. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-4-npiggin@gmail.com
2021-12-02powerpc: Rename PPC_NATIVE to PPC_HASH_MMU_NATIVENicholas Piggin13-14/+14
PPC_NATIVE now only controls the native HPT code, so rename it to be more descriptive. Restrict it to Book3S only. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-3-npiggin@gmail.com
2021-12-02powerpc: Remove unused FW_FEATURE_NATIVE referencesNicholas Piggin1-8/+0
FW_FEATURE_NATIVE_ALWAYS and FW_FEATURE_NATIVE_POSSIBLE are always zero and never do anything. Remove them. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-2-npiggin@gmail.com
2021-12-02powerpc/xive: Fix compile when !CONFIG_PPC_POWERNV.Cédric Le Goater3-1/+15
The automatic "save & restore" of interrupt context is a POWER10/XIVE2 feature exploited by KVM under the PowerNV platform. It is not available under pSeries and the associated toggle should not be exposed under the XIVE debugfs directory. Introduce a platform handler for debugfs initialization and move the 'save-restore' entry under the native (PowerNV) backend to fix compile when !CONFIG_PPC_POWERNV. Fixes: 1e7684dc4fc7 ("powerpc/xive: Add a debugfs toggle for save-restore") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201165418.1041842-1-clg@kaod.org
2021-12-02powerpc/signal32: Use struct_group() to zero spe regsKees Cook2-7/+13
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add a struct_group() for the spe registers so that memset() can correctly reason about the size: In function 'fortify_memset_chk', inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3: >> include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 195 | __write_overflow_field(); | ^~~~~~~~~~~~~~~~~~~~~~~~ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211118203604.1288379-1-keescook@chromium.org
2021-11-30powerpc/32s: Fix shift-out-of-bounds in KASAN initChristophe Leroy1-1/+2
================================================================================ UBSAN: shift-out-of-bounds in arch/powerpc/mm/kasan/book3s_32.c:22:23 shift exponent -1 is negative CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.5-gentoo-PowerMacG4 #9 Call Trace: [c214be60] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable) [c214be80] [c0b99288] ubsan_epilogue+0x10/0x5c [c214be90] [c0b98fe0] __ubsan_handle_shift_out_of_bounds+0x94/0x138 [c214bf00] [c1c0f010] kasan_init_region+0xd8/0x26c [c214bf30] [c1c0ed84] kasan_init+0xc0/0x198 [c214bf70] [c1c08024] setup_arch+0x18/0x54c [c214bfc0] [c1c037f0] start_kernel+0x90/0x33c [c214bff0] [00003610] 0x3610 ================================================================================ This happens when the directly mapped memory is a power of 2. Fix it by checking the shift and set the result to 0 when shift is -1 Fixes: 7974c4732642 ("powerpc/32s: Implement dedicated kasan_init_region()") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215169 Link: https://lore.kernel.org/r/15cbc3439d4ad988b225e2119ec99502a5cc6ad3.1638261744.git.christophe.leroy@csgroup.eu
2021-11-30powerpc/powermac: Add missing lockdep_register_key()Christophe Leroy1-0/+1
KeyWest i2c @0xf8001003 irq 42 /uni-n@f8000000/i2c@f8001000 BUG: key c2d00cbc has not been registered! ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(1) WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:4801 lockdep_init_map_type+0x4c0/0xb4c Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.5-gentoo-PowerMacG4 #9 NIP: c01a9428 LR: c01a9428 CTR: 00000000 REGS: e1033cf0 TRAP: 0700 Not tainted (5.15.5-gentoo-PowerMacG4) MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24002002 XER: 00000000 GPR00: c01a9428 e1033db0 c2d1cf20 00000016 00000004 00000001 c01c0630 e1033a73 GPR08: 00000000 00000000 00000000 e1033db0 24002004 00000000 f8729377 00000003 GPR16: c1829a9c 00000000 18305357 c1416fc0 c1416f80 c006ac60 c2d00ca8 c1416f00 GPR24: 00000000 c21586f0 c2160000 00000000 c2d00cbc c2170000 c216e1a0 c2160000 NIP [c01a9428] lockdep_init_map_type+0x4c0/0xb4c LR [c01a9428] lockdep_init_map_type+0x4c0/0xb4c Call Trace: [e1033db0] [c01a9428] lockdep_init_map_type+0x4c0/0xb4c (unreliable) [e1033df0] [c1c177b8] kw_i2c_add+0x334/0x424 [e1033e20] [c1c18294] pmac_i2c_init+0x9ec/0xa9c [e1033e80] [c1c1a790] smp_core99_probe+0xbc/0x35c [e1033eb0] [c1c03cb0] kernel_init_freeable+0x190/0x5a4 [e1033f10] [c000946c] kernel_init+0x28/0x154 [e1033f30] [c0035148] ret_from_kernel_thread+0x14/0x1c Add missing lockdep_register_key() Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/69e4f55565bb45ebb0843977801b245af0c666fe.1638264741.git.christophe.leroy@csgroup.eu
2021-11-30powerpc/modules: Don't WARN on first module allocation attemptChristophe Leroy1-5/+6
module_alloc() first tries to allocate module text within 24 bits direct jump from kernel text, and tries a wider allocation if first one fails. When first allocation fails the following is observed in kernel logs: vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size systemd-udevd: vmalloc error: size 2395133, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null) CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G W 5.15.5-gentoo-PowerMacG4 #9 Call Trace: [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable) [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4 [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c [e2a53c10] [c00338c0] module_alloc+0xa0/0xac [e2a53c40] [c027a368] load_module+0x2ae0/0x8148 [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130 [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28 ... Add __GFP_NOWARN flag to first allocation so that no warning appears when it fails. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Fixes: 2ec13df16704 ("powerpc/modules: Load modules closer to kernel text") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu
2021-11-30powerpc/64s: Get LPID bit width from device treeNicholas Piggin4-22/+51
Allow the LPID bit width and partition table size to be set at runtime from the device tree. Move the PID bit width detection into the same place. KVM does not support using the extra bits yet, this is mainly required to get the PTCR register values correct (so KVM will run but it will not allocate > 4096 LPIDs). OPAL firmware provides this property for POWER10 CPUs since skiboot commit 9b85f7d961f2 ("hdata: add mmu-pid-bits and mmu-lpid-bits for POWER10 CPUs"). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211129030915.1888332-1-npiggin@gmail.com
2021-11-30powerpc/perf: Fix PMU callbacks to clear pending PMI before resetting an ↵Athira Rajeev2-1/+97
overflown PMC Running perf fuzzer showed below in dmesg logs: "Can't find PMC that caused IRQ" This means a PMU exception happened, but none of the PMC's (Performance Monitor Counter) were found to be overflown. There are some corner cases that clears the PMCs after PMI gets masked. In such cases, the perf interrupt handler will not find the active PMC values that had caused the overflow and thus leads to this message while replaying. Case 1: PMU Interrupt happens during replay of other interrupts and counter values gets cleared by PMU callbacks before replay: During replay of interrupts like timer, __do_irq() and doorbell exception, we conditionally enable interrupts via may_hard_irq_enable(). This could potentially create a window to generate a PMI. Since irq soft mask is set to ALL_DISABLED, the PMI will get masked here. We could get IPIs run before perf interrupt is replayed and the PMU events could be deleted or stopped. This will change the PMU SPR values and resets the counters. Snippet of ftrace log showing PMU callbacks invoked in __do_irq(): <idle>-0 [051] dns. 132025441306354: __do_irq <-call_do_irq <idle>-0 [051] dns. 132025441306430: irq_enter <-__do_irq <idle>-0 [051] dns. 132025441306503: irq_enter_rcu <-__do_irq <idle>-0 [051] dnH. 132025441306599: xive_get_irq <-__do_irq <<>> <idle>-0 [051] dnH. 132025441307770: generic_smp_call_function_single_interrupt <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441307839: flush_smp_call_function_queue <-smp_ipi_demux_relaxed <idle>-0 [051] dnH. 132025441308057: _raw_spin_lock <-event_function <idle>-0 [051] dnH. 132025441308206: power_pmu_disable <-perf_pmu_disable <idle>-0 [051] dnH. 132025441308337: power_pmu_del <-event_sched_out <idle>-0 [051] dnH. 132025441308407: power_pmu_read <-power_pmu_del <idle>-0 [051] dnH. 132025441308477: read_pmc <-power_pmu_read <idle>-0 [051] dnH. 132025441308590: isa207_disable_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308663: write_pmc <-power_pmu_del <idle>-0 [051] dnH. 132025441308787: power_pmu_event_idx <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308859: rcu_read_unlock_strict <-perf_event_update_userpage <idle>-0 [051] dnH. 132025441308975: power_pmu_enable <-perf_pmu_enable <<>> <idle>-0 [051] dnH. 132025441311108: irq_exit <-__do_irq <idle>-0 [051] dns. 132025441311319: performance_monitor_exception <-replay_soft_interrupts Case 2: PMI's masked during local_* operations, example local_add(). If the local_add() operation happens within a local_irq_save(), replay of PMI will be during local_irq_restore(). Similar to case 1, this could also create a window before replay where PMU events gets deleted or stopped. Fix it by updating the PMU callback function power_pmu_disable() to check for pending perf interrupt. If there is an overflown PMC and pending perf interrupt indicated in paca, clear the PMI bit in paca to drop that sample. Clearing of PMI bit is done in power_pmu_disable() since disable is invoked before any event gets deleted/stopped. With this fix, if there are more than one event running in the PMU, there is a chance that we clear the PMI bit for the event which is not getting deleted/stopped. The other events may still remain active. Hence to make sure we don't drop valid sample in such cases, another check is added in power_pmu_enable. This checks if there is an overflown PMC found among the active events and if so enable back the PMI bit. Two new helper functions are introduced to clear/set the PMI, ie clear_pmi_irq_pending() and set_pmi_irq_pending(). Helper function pmi_irq_pending() is introduced to give a warning if there is pending PMI bit in paca, but no PMC is overflown. Also there are corner cases which result in performance monitor interrupts being triggered during power_pmu_disable(). This happens since PMXE bit is not cleared along with disabling of other MMCR0 bits in the pmu_disable. Such PMI's could leave the PMU running and could trigger PMI again which will set MMCR0 PMAO bit. This could lead to spurious interrupts in some corner cases. Example, a timer after power_pmu_del() which will re-enable interrupts and triggers a PMI again since PMAO bit is still set. But fails to find valid overflow since PMC was cleared in power_pmu_del(). Fix that by disabling PMXE along with disabling of other MMCR0 bits in power_pmu_disable(). We can't just replay PMI any time. Hence this approach is preferred rather than replaying PMI before resetting overflown PMC. Patch also documents core-book3s on a race condition which can trigger these PMC messages during idle path in PowerNV. Fixes: f442d004806e ("powerpc/64s: Add support to mask perf interrupts and replay them") Reported-by: Nageswara R Sastry <nasastry@in.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Suggested-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make pmi_irq_pending() return bool, reflow/reword some comments] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1626846509-1350-2-git-send-email-atrajeev@linux.vnet.ibm.com
2021-11-30powerpc/atomics: Remove atomic_inc()/atomic_dec() and friendsChristophe Leroy1-95/+0
Now that atomic_add() and atomic_sub() handle immediate operands, atomic_inc() and atomic_dec() have no added value compared to the generic fallback which calls atomic_add(1) and atomic_sub(1). Also remove atomic_inc_not_zero() which fallsback to atomic_add_unless() which itself fallsback to atomic_fetch_add_unless() which now handles immediate operands. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0bc64a2f18726055093dbb2e479cefc60a409cfd.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30powerpc/atomics: Use immediate operand when possibleChristophe Leroy1-28/+28
Today we get the following code generation for atomic operations: c001bb2c: 39 20 00 01 li r9,1 c001bb30: 7d 40 18 28 lwarx r10,0,r3 c001bb34: 7d 09 50 50 subf r8,r9,r10 c001bb38: 7d 00 19 2d stwcx. r8,0,r3 c001c7a8: 39 40 00 01 li r10,1 c001c7ac: 7d 00 18 28 lwarx r8,0,r3 c001c7b0: 7c ea 42 14 add r7,r10,r8 c001c7b4: 7c e0 19 2d stwcx. r7,0,r3 By allowing GCC to choose between immediate or regular operation, we get: c001bb2c: 7d 20 18 28 lwarx r9,0,r3 c001bb30: 39 49 ff ff addi r10,r9,-1 c001bb34: 7d 40 19 2d stwcx. r10,0,r3 -- c001c7a4: 7d 40 18 28 lwarx r10,0,r3 c001c7a8: 39 0a 00 01 addi r8,r10,1 c001c7ac: 7d 00 19 2d stwcx. r8,0,r3 For "and", the dot form has to be used because "andi" doesn't exist. For logical operations we use unsigned 16 bits immediate. For arithmetic operations we use signed 16 bits immediate. On pmac32_defconfig, it reduces the text by approx another 8 kbytes. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2ec558d44db8045752fe9dbd29c9ba84bab6030b.1632236981.git.christophe.leroy@csgroup.eu
2021-11-30powerpc/bitops: Use immediate operand when possibleChristophe Leroy1-8/+81
Today we get the following code generation for bitops like set or clear bit: c0009fe0: 39 40 08 00 li r10,2048 c0009fe4: 7c e0 40 28 lwarx r7,0,r8 c0009fe8: 7c e7 53 78 or r7,r7,r10 c0009fec: 7c e0 41 2d stwcx. r7,0,r8 c000d568: 39 00 18 00 li r8,6144 c000d56c: 7c c0 38 28 lwarx r6,0,r7 c000d570: 7c c6 40 78 andc r6,r6,r8 c000d574: 7c c0 39 2d stwcx. r6,0,r7 Most set bits are constant on lower 16 bits, so it can easily be replaced by the "immediate" version of the operation. Allow GCC to choose between the normal or immediate form. For clear bits, on 32 bits 'rlwinm' can be used instead of 'andc' for when all bits to be cleared are consecutive. On 64 bits we don't have any equivalent single operation for clearing, single bits or a few bits, we'd need two 'rldicl' so it is not worth it, the li/andc sequence is doing the same. With this patch we get: c0009fe0: 7d 00 50 28 lwarx r8,0,r10 c0009fe4: 61 08 08 00 ori r8,r8,2048 c0009fe8: 7d 00 51 2d stwcx. r8,0,r10 c000d558: 7c e0 40 28 lwarx r7,0,r8 c000d55c: 54 e7 05 64 rlwinm r7,r7,0,21,18 c000d560: 7c e0 41 2d stwcx. r7,0,r8 On pmac32_defconfig, it reduces the text by approx 10 kbytes. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e6f815d9181bab09df3b350af51149437863e9f9.1632236981.git.christophe.leroy@csgroup.eu
2021-11-29powerpc: flexible GPR range save/restore macrosNicholas Piggin15-126/+94
Introduce macros that operate on a (start, end) range of GPRs, which reduces lines of code and need to do mental arithmetic while reading the code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com
2021-11-29powerpc/watchdog: help remote CPUs to flush NMI printk outputNicholas Piggin1-6/+31
The printk layer at the moment does not seem to have a good way to force flush printk messages that are created in NMI context, except in the panic path. NMI-context printk messages normally get to the console with irq_work, but that won't help if the CPU is stuck with irqs disabled, as can be the case for hard lockup watchdog messages. The watchdog currently flushes the printk buffers after detecting a lockup on remote CPUs, but they may not have processed their NMI IPI yet by that stage, or they may have self-detected a lockup in which case they won't go via this NMI IPI path. Improve the situation by having NMI-context mark a flag if it called printk, and have watchdog timer interrupts check if that flag was set and try to flush if it was. Latency is not a big problem because we were already stuck for a while, just need to try to make sure the messages eventually make it out. Depends-on: 5d5e4522a7f4 ("printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211119113146.752759-6-npiggin@gmail.com
2021-11-29powerpc: Don't bother about .data..Lubsan sectionsChristophe Leroy1-8/+0
Since commit 9a427556fb8e ("vmlinux.lds.h: catch compound literals into data and BSS") .data..Lubsan sections are taken into account in DATA_MAIN which is included in DATA_DATA macro. No need to take care of them anymore in powerpc vmlinux.lds.S Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3eb14570612eef17e01bb67f14a4450136001794.1637840601.git.christophe.leroy@csgroup.eu
2021-11-29powerpc/ptdump: Fix display a BAT's size unitChristophe Leroy1-2/+2
We have wrong units on BAT's sizes (G instead of M, M instead of ...) ---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 4G Kernel x m 1: 0xc0400000-0xc05fffff 0x00400000 2G Kernel x m 2: 0xc0600000-0xc06fffff 0x00600000 1G Kernel x m 3: 0xc0700000-0xc077ffff 0x00700000 512M Kernel x m 4: 0xc0780000-0xc079ffff 0x00780000 128M Kernel x m 5: 0xc07a0000-0xc07bffff 0x007a0000 128M Kernel x m 6: - 7: - This is because pt_dump_size() expects a size in Kbytes but bat_show_603() gives the size in bytes. To avoid risk of confusion, change pt_dump_size() to take bytes. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f16c30f5c9185a63335322cf1a8b22f189d335ef.1637922595.git.christophe.leroy@csgroup.eu
2021-11-29powerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32Christophe Leroy4-12/+125
Unlike PPC64, PPC32 doesn't require any special compiler option to get _mcount() call not clobbering registers. Provide ftrace_regs_caller() and ftrace_regs_call() and activate HAVE_DYNAMIC_FTRACE_WITH_REGS. That's heavily copied from ftrace_64_mprofile.S For the time being leave livepatching aside, it will come with following patch. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1862dc7719855cc2a4eec80920d94c955877557e.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29powerpc/ftrace: Add module_trampoline_target() for PPC32Christophe Leroy2-33/+29
module_trampoline_target() is used by __ftrace_modify_call(). Implement it for PPC32 so that CONFIG_DYNAMIC_FTRACE_WITH_REGS can be activated on PPC32 as well. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/42345f464fb465f0fc76f3090e250be8fc1729f0.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29powerpc/ftrace: No need to read LR from stack in _mcount()Christophe Leroy1-5/+4
All functions calling _mcount do it exactly the same way, with the following sequence of instructions: c07de788: 7c 08 02 a6 mflr r0 c07de78c: 90 01 00 04 stw r0,4(r1) c07de790: 4b 84 13 65 bl c001faf4 <_mcount> Allthough LR is pushed on stack, it is still in r0 while entering _mcount(). Function arguments are in r3-r10, so r11 and r12 are still available at that point. Do like PPC64 and use r12 to move LR into CTR, so that r0 is preserved and doesn't need to be restored from the stack. While at it, bring back the EXPORT_SYMBOL at the end of _mcount. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/24a3ba7db388537c44a038026f926d885372e6d3.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29powerpc: Mark probe_machine() __init and staticMichael Ellerman2-3/+1
Prior to commit b1923caa6e64 ("powerpc: Merge 32-bit and 64-bit setup_arch()") probe_machine() was called from setup_32/64.c and lived in setup-common.c. But now it's only called from setup-common.c so it can be static and __init, and we don't need the declaration in machdep.h either. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-6-mpe@ellerman.id.au
2021-11-29powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILINGMichael Ellerman1-0/+2
setup_profiling_timer() is only needed when CONFIG_PROFILING is enabled. Fixes the following W=1 warning when CONFIG_PROFILING=n: linux/arch/powerpc/kernel/smp.c:1638:5: error: no previous prototype for ‘setup_profiling_timer’ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-5-mpe@ellerman.id.au
2021-11-29powerpc/mm: Move tlbcam_sz() and make it staticMichael Ellerman1-5/+5
Building with W=1 we see a warning: linux/arch/powerpc/mm/nohash/fsl_book3e.c:63:15: error: no previous prototype for ‘tlbcam_sz’ tlbcam_sz() is not used outside this file, so we can make it static. However it's only used inside #ifdef CONFIG_PPC32, so move it within that ifdef, otherwise we would get a defined but not used error. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-4-mpe@ellerman.id.au
2021-11-29powerpc/85xx: Make c293_pcie_pic_init() staticMichael Ellerman1-1/+1
To fix the W=1 warning: linux/arch/powerpc/platforms/85xx/c293pcie.c:22:13: error: no previous prototype for ‘c293_pcie_pic_init’ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-3-mpe@ellerman.id.au
2021-11-29powerpc/85xx: Make mpc85xx_smp_kexec_cpu_down() staticMichael Ellerman1-2/+2
To fix the W=1 warning: arch/powerpc/platforms/85xx/smp.c:369:6: error: no previous prototype for ‘mpc85xx_smp_kexec_cpu_down’ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-2-mpe@ellerman.id.au
2021-11-29powerpc/85xx: Fix no previous prototype warning for mpc85xx_setup_pmc()Michael Ellerman1-0/+2
Fixes the following W=1 warning: arch/powerpc/platforms/85xx/mpc85xx_pm_ops.c:89:12: warning: no previous prototype for 'mpc85xx_setup_pmc' Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211124093254.1054750-1-mpe@ellerman.id.au
2021-11-29powerpc: select CPUMASK_OFFSTACK if NR_CPUS >= 8192Nicholas Piggin1-0/+1
Some core kernel code starts to go beyond the 2048 byte stack size warning at NR_CPUS=8192, so select CPUMASK_OFFSTACK in that case. x86 does similarly for very large NR_CPUS. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211105035042.1398309-2-npiggin@gmail.com
2021-11-29powerpc: remove cpu_online_cores_map functionNicholas Piggin3-41/+8
This function builds the cores online map with on-stack cpumasks which can cause high stack usage with large NR_CPUS. It is not used in any performance sensitive paths, so instead just check for first thread sibling. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211105035042.1398309-1-npiggin@gmail.com