path: root/arch/powerpc
Age | Commit message | Author | Files | Lines
2021-02-11 | KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries | Paul Mackerras | 1 | -1/+5
Commit 68ad28a4cdd4 ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") changed the older guest entry path, with the side effect that vcpu->arch.slb_max no longer gets cleared for a radix guest. This means that an HPT guest which loads some SLB entries, switches to radix mode, is run using the old guest entry path (e.g., because the indep_threads_mode module parameter has been set to false), and then switches back to HPT mode, would now see the old SLB entries still present, whereas previously it would have seen no SLB entries. To avoid changing guest-visible behaviour, this adds a store instruction to clear vcpu->arch.slb_max for a radix guest using the old guest entry path. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-11 | softirq: Move do_softirq_own_stack() to generic asm header | Thomas Gleixner | 1 | -0/+1
To avoid include recursion hell move the do_softirq_own_stack() related content into a generic asm header and include it from all places in arch/ which need the prototype. This allows architectures to provide an inline implementation of do_softirq_own_stack() without introducing a lot of #ifdeffery all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210210002513.289960691@linutronix.de
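For illustration only, a minimal sketch of what such a generic header can look like, assuming the Kconfig switch is named CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK (see the following entry); this is not the verbatim upstream header:

    /* sketch of include/asm-generic/softirq_stack.h */
    #ifndef __ASM_GENERIC_SOFTIRQ_STACK_H
    #define __ASM_GENERIC_SOFTIRQ_STACK_H

    #ifdef CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK
    void do_softirq_own_stack(void);
    #else
    static inline void do_softirq_own_stack(void)
    {
            __do_softirq();         /* no dedicated stack: run softirqs inline */
    }
    #endif

    #endif /* __ASM_GENERIC_SOFTIRQ_STACK_H */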
2021-02-11 | softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig | Thomas Gleixner | 2 | -2/+1
To prepare for inlining do_softirq_own_stack(), replace __ARCH_HAS_DO_SOFTIRQ with a Kconfig switch and select it in the affected architectures. This allows the next step to move the function prototype and the inline stub into a separate asm-generic header file, which is required to avoid include recursion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210210002513.181713427@linutronix.de
2021-02-11 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | David S. Miller | 8 | -21/+23
2021-02-10 | crypto: powerpc/sha256 - remove unneeded semicolon | Yang Li | 1 | -1/+1
Eliminate the following coccicheck warning: ./arch/powerpc/crypto/sha256-spe-glue.c:132:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
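For readers unfamiliar with this class of coccicheck warning, the fix is simply dropping a stray ';' after a closing brace; an illustrative (not verbatim) before/after:

    /* before: coccicheck flags the trailing ';' */
    switch (ds) {
    default:
            break;
    };

    /* after */
    switch (ds) {
    default:
            break;
    }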
2021-02-10 | KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2 | Fabiano Rosas | 3 | -2/+13
These machines don't support running both MMU types at the same time, so remove the KVM_CAP_PPC_MMU_HASH_V3 capability when the host is using Radix MMU. [paulus@ozlabs.org - added defensive check on kvmppc_hv_ops->hash_v3_possible] Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
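A sketch of how the capability check might look with the defensive hook mentioned in the note above (surrounding code simplified, not the exact upstream hunk):

    case KVM_CAP_PPC_MMU_HASH_V3:
            r = !!(hv_enabled && kvmppc_hv_ops &&
                   kvmppc_hv_ops->hash_v3_possible &&
                   kvmppc_hv_ops->hash_v3_possible());
            break;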
2021-02-10 | KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path | Fabiano Rosas | 1 | -0/+4
The Facility Status and Control Register is a privileged SPR that defines the availability of some features in problem state. Since it can be written by the guest, we must restore it to the previous host value after guest exit. This restoration is currently done by taking the value from current->thread.fscr, which in the P9 path is not enough anymore because the guest could context switch the QEMU thread, causing the guest-current value to be saved into the thread struct. The above situation manifested when running a QEMU linked against a libc with System Call Vectored support, which causes scv instructions to be run by QEMU early during the guest boot (during SLOF), at which point the FSCR is 0 due to guest entry. After a few scv calls (1 to a couple hundred), the context switching happens and the QEMU thread runs with the guest value, resulting in a Facility Unavailable interrupt. This patch saves and restores the host value of FSCR in the inner guest entry loop in a way independent of current->thread.fscr. The old way of doing it is still kept in place because it works for the old entry path. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
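Conceptually the fix brackets the inner entry/exit with an explicit SPR save/restore that does not depend on current->thread.fscr; a simplified sketch:

    unsigned long host_fscr = mfspr(SPRN_FSCR);

    /* ... load the guest FSCR and run the guest ... */

    vcpu->arch.fscr = mfspr(SPRN_FSCR);     /* preserve the guest's value */
    mtspr(SPRN_FSCR, host_fscr);            /* restore the host's value */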
2021-02-10 | KVM: PPC: remove unneeded semicolon | Yang Li | 1 | -1/+1
Eliminate the following coccicheck warning: ./arch/powerpc/kvm/booke.c:701:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 | KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB | Nicholas Piggin | 1 | -3/+3
IH=6 may preserve hypervisor real-mode ERAT entries and is the recommended SLBIA hint for switching partitions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
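In kernel C this is usually emitted via the PPC_SLBIA() opcode macro; a hedged sketch of the idiom (placement in the exit path simplified):

    /* Clear the SLB on partition switch; IH=6 may preserve host real-mode ERAT entries */
    asm volatile("isync ; " PPC_SLBIA(6) " ; isync" ::: "memory");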
2021-02-10 | KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest | Nicholas Piggin | 1 | -1/+5
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 | KVM: PPC: Book3S HV: Fix radix guest SLB side channel | Nicholas Piggin | 1 | -8/+31
The slbmte instruction is legal in radix mode, including radix guest mode. This means radix guests can load the SLB with arbitrary data. KVM host does not clear the SLB when exiting a guest if it was a radix guest, which would allow a rogue radix guest to use the SLB as a side channel to communicate with other guests. Fix this by ensuring the SLB is cleared when coming out of a radix guest. Only the first 4 entries are a concern, because radix guests always run with LPCR[UPRT]=1, which limits the reach of slbmte. slbia is not used (except in a non-performance-critical path) because it can clear cached translations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 | KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support | Nicholas Piggin | 5 | -226/+32
This reverts much of commit c01015091a770 ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts"), which was required to run HPT guests on RPT hosts on early POWER9 CPUs without support for "mixed mode", which meant the host could not run with MMU on while guests were running. This code has some corner case bugs, e.g., when the guest hits a machine check or HMI the primary locks up waiting for secondaries to switch LPCR to host, which they never do. This could all be fixed in software, but most CPUs in production have mixed mode support, and those that don't are believed to be all in installations that don't use this capability. So simplify things and remove support. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 | KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR | Ravi Bangoria | 3 | -0/+23
Introduce KVM_CAP_PPC_DAWR1 which can be used by QEMU to query whether KVM supports 2nd DAWR or not. The capability is by default disabled even when the underlying CPU supports 2nd DAWR. QEMU needs to check and enable it manually to use the feature. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
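Because the capability defaults to off, userspace has to both query and explicitly enable it; a hypothetical userspace snippet (standard KVM ioctls, error handling elided):

    int has_dawr1 = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_DAWR1);

    if (has_dawr1 > 0) {
            struct kvm_enable_cap cap = { .cap = KVM_CAP_PPC_DAWR1 };

            ioctl(vm_fd, KVM_ENABLE_CAP, &cap);     /* opt in explicitly */
    }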
2021-02-10 | KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR | Ravi Bangoria | 7 | -1/+87
KVM code assumes a single DAWR everywhere. Add code to support the 2nd DAWR. DAWR is a hypervisor resource and thus the H_SET_MODE hcall is used to set/unset it. Introduce a new case, H_SET_MODE_RESOURCE_SET_DAWR1, for the 2nd DAWR. Also, KVM will support the 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 | KVM: PPC: Book3S HV: Rename current DAWR macros and variables | Ravi Bangoria | 5 | -30/+30
Power10 is introducing a second DAWR (Data Address Watchpoint Register). Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's uapi so changing it will break userspace. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10 | KVM: PPC: Book3S HV: Allow nested guest creation when L0 hv_guest_state > L1 | Ravi Bangoria | 2 | -12/+60
On powerpc, the L1 hypervisor relies on L0, via the H_ENTER_NESTED hcall, to load L2 guest state into the CPU. The L1 hypervisor prepares the L2 state in struct hv_guest_state and passes a pointer to it via the hcall. Using that pointer, L0 reads/writes that state directly from/to L1 memory. Thus L0 must be aware of the hv_guest_state layout of L1. Currently it uses the version field to achieve this, i.e. if L0's hv_guest_state.version != L1's hv_guest_state.version, L0 won't allow a nested KVM guest. This restriction can be loosened up a bit: L0 can be taught to understand older layouts of hv_guest_state if we restrict new members to being added only at the end, i.e. we can allow a nested guest even when L0's hv_guest_state.version > L1's hv_guest_state.version, though the other way around is not possible. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
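The relaxed rule can be pictured as "accept any L1 version up to the one L0 knows, and only copy the bytes that version defines"; a rough sketch under those assumptions (the size helper is hypothetical):

    if (l1_version > HV_GUEST_STATE_VERSION)
            return H_P2;                    /* L1 layout is newer than L0 understands */

    memset(&l2_state, 0, sizeof(l2_state)); /* zero the members L1 doesn't know about */
    kvm_vcpu_read_guest(vcpu, hv_ptr, &l2_state,
                        hv_guest_state_size(l1_version));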
2021-02-09 | KVM: Raise the maximum number of user memslots | Vitaly Kuznetsov | 1 | -1/+0
Current KVM_USER_MEM_SLOTS limits are arch specific (512 on Power, 509 on x86, 32 on s390, 16 on MIPS) but they don't really need to be. Memory slots are allocated dynamically in KVM when added, so the only real limitation is the 'id_to_index' array, which is 'short'. We don't have any other KVM_MEM_SLOTS_NUM/KVM_USER_MEM_SLOTS-sized statically defined structures. A low KVM_USER_MEM_SLOTS can be a limiting factor for some configurations. In particular, when QEMU tries to start a Windows guest with Hyper-V SynIC enabled and e.g. 256 vCPUs, the limit is hit, as SynIC requires two pages per vCPU and the guest is free to pick any GFN for each of them; this fragments memslots, as QEMU wants to have a separate memslot for each of these pages (which are supposed to act as 'overlay' pages). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210127175731.2020089-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-08 | powerpc/64s: Handle program checks in wrong endian during early boot | Michael Ellerman | 1 | -0/+45
There's a short window during boot where although the kernel is running little endian, any exceptions will cause the CPU to switch back to big endian. This situation persists until we call configure_exceptions(), which calls either the hypervisor or OPAL to configure the CPU so that exceptions will be taken in little endian (via HID0[HILE]). We don't intend to take exceptions during early boot, but one way we sometimes do is via a WARN/BUG etc. Those all boil down to a trap instruction, which will cause a program check exception. The first instruction of the program check handler is an mtsprg, which when executed in the wrong endian is an lhzu with a ~3GB displacement from r3. The content of r3 is random, so that becomes a load from some random location, and depending on the system (installed RAM etc.) can easily lead to a checkstop, or an infinitely recursive page fault. That prevents whatever the WARN/BUG was complaining about being printed to the console, and the user just sees a dead system. We can fix it by having a trampoline at the beginning of the program check handler that detects we are in the wrong endian, and flips us back to the correct endian. We can't flip MSR[LE] using mtmsr (alas), so we have to use rfid. That requires backing up SRR0/1 as well as a GPR. To do that we use SPRG0/2/3 (SPRG1 is already used for the paca). SPRG3 is user readable, but this trampoline is only active very early in boot, and SPRG3 will be reinitialised in vdso_getcpu_init() before userspace starts. With this trampoline in place we can survive a WARN early in boot and print a stack trace, which is eventually printed to the console once the console is up, eg:

    [83565.758545] kexec_core: Starting new kernel
    [ 0.000000] ------------[ cut here ]------------
    [ 0.000000] static_key_enable_cpuslocked(): static key '0xc000000000ea6160' used before call to jump_label_init()
    [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:166 static_key_enable_cpuslocked+0xfc/0x120
    [ 0.000000] Modules linked in:
    [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-gcc-8.2.0-dirty #618
    [ 0.000000] NIP: c0000000002fd46c LR: c0000000002fd468 CTR: c000000000170660
    [ 0.000000] REGS: c000000001227940 TRAP: 0700 Not tainted (5.10.0-gcc-8.2.0-dirty)
    [ 0.000000] MSR: 9000000002823003 <SF,HV,VEC,VSX,FP,ME,RI,LE> CR: 24882422 XER: 20040000
    [ 0.000000] CFAR: 0000000000000730 IRQMASK: 1
    [ 0.000000] GPR00: c0000000002fd468 c000000001227bd0 c000000001228300 0000000000000065
    [ 0.000000] GPR04: 0000000000000001 0000000000000065 c0000000010cf970 000000000000000d
    [ 0.000000] GPR08: 0000000000000000 0000000000000000 0000000000000000 c00000000122763f
    [ 0.000000] GPR12: 0000000000002000 c000000000f8a980 0000000000000000 0000000000000000
    [ 0.000000] GPR16: 0000000000000000 0000000000000000 c000000000f88c8e c000000000f88c9a
    [ 0.000000] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [ 0.000000] GPR24: 0000000000000000 c000000000dea3a8 0000000000000000 c000000000f35114
    [ 0.000000] GPR28: 0000002800000000 c000000000f88c9a c000000000f88c8e c000000000ea6160
    [ 0.000000] NIP [c0000000002fd46c] static_key_enable_cpuslocked+0xfc/0x120
    [ 0.000000] LR [c0000000002fd468] static_key_enable_cpuslocked+0xf8/0x120
    [ 0.000000] Call Trace:
    [ 0.000000] [c000000001227bd0] [c0000000002fd468] static_key_enable_cpuslocked+0xf8/0x120 (unreliable)
    [ 0.000000] [c000000001227c40] [c0000000002fd4c0] static_key_enable+0x30/0x50
    [ 0.000000] [c000000001227c70] [c000000000f6629c] early_page_poison_param+0x58/0x9c
    [ 0.000000] [c000000001227cb0] [c000000000f351b8] do_early_param+0xa4/0x10c
    [ 0.000000] [c000000001227d30] [c00000000011e020] parse_args+0x270/0x5e0
    [ 0.000000] [c000000001227e20] [c000000000f35864] parse_early_options+0x48/0x5c
    [ 0.000000] [c000000001227e40] [c000000000f358d0] parse_early_param+0x58/0x84
    [ 0.000000] [c000000001227e70] [c000000000f3a368] early_init_devtree+0xc4/0x490
    [ 0.000000] [c000000001227f10] [c000000000f3bca0] early_setup+0xc8/0x1c8
    [ 0.000000] [c000000001227f90] [000000000000c320] 0xc320
    [ 0.000000] Instruction dump:
    [ 0.000000] 4bfffddd 7c2004ac 39200001 913f0000 4bffffb8 7c651b78 3c82ffac 3c62ffc0
    [ 0.000000] 38841b00 3863f310 4bdf03a5 60000000 <0fe00000> 4bffff38 60000000 60000000
    [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x40/0x80 with crng_init=0
    [ 0.000000] ---[ end trace 0000000000000000 ]---
    [ 0.000000] dt-cpu-ftrs: setup for ISA 3000

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202130207.1303975-2-mpe@ellerman.id.au
2021-02-08 | powerpc/64: Make stack tracing work during very early boot | Michael Ellerman | 1 | -0/+3
If we try to stack trace very early during boot, either due to a WARN/BUG or manual dump_stack(), we will oops in valid_emergency_stack() when we try to dereference the paca_ptrs array. The fix is simple, we just return false if paca_ptrs isn't allocated yet. The stack pointer definitely isn't part of any emergency stack because we haven't allocated any yet. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202130207.1303975-1-mpe@ellerman.id.au
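A minimal sketch of the guard described above (signature and body abridged):

    static int valid_emergency_stack(unsigned long sp, struct task_struct *p,
                                     unsigned long nbytes)
    {
            /* Emergency stacks hang off paca_ptrs; none can exist before it is allocated */
            if (!paca_ptrs)
                    return 0;

            /* ... existing per-CPU emergency stack checks ... */
            return 0;
    }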
2021-02-08 | powerpc64/idle: Fix SP offsets when saving GPRs | Christopher M. Riedl | 1 | -65/+73
The idle entry/exit code saves/restores GPRs in the stack "red zone" (Protected Zone according to PowerPC64 ELF ABI v2). However, the offset used for the first GPR is incorrect and overwrites the back chain - the Protected Zone actually starts below the current SP. In practice this is probably not an issue, but it's still incorrect so fix it. Also expand the comments to explain why using the stack "red zone" instead of creating a new stackframe is appropriate here. Signed-off-by: Christopher M. Riedl <cmr@codefail.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210206072342.5067-1-cmr@codefail.de
2021-02-08 | powerpc/32s: Allow constant folding in mtsr()/mfsr() | Christophe Leroy | 1 | -2/+8
In the same way as we did in wrtee(), add an alternative using mtsr/mfsr instructions instead of mtsrin/mfsrin when the segment register can be determined at compile time. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9baed0ff9d76723ec90f1b567ddd4ac1ecc7a190.1612612022.git.christophe.leroy@csgroup.eu
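The usual trick is to key off __builtin_constant_p() and emit the immediate-operand form when the segment number is known at compile time; a simplified sketch (operand handling may differ from the real header):

    static inline void mtsr(u32 val, u32 idx)
    {
            if (__builtin_constant_p(idx))
                    asm volatile("mtsr %1, %0" : : "r" (val), "i" (idx >> 28));
            else
                    asm volatile("mtsrin %0, %1" : : "r" (val), "r" (idx));
    }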
2021-02-08 | powerpc/32s: mfsrin()/mtsrin() become mfsr()/mtsr() | Christophe Leroy | 5 | -9/+9
Function names should tell what the function does, not how. mfsrin() and mtsrin() read/write segment registers. They are called that way because they use the mfsrin and mtsrin instructions, but that doesn't matter to the caller. In preparation for the following patch, rename them to mfsr() and mtsr() to make it obvious that they manipulate segment registers, regardless of how they do it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f92d99f4349391b77766745900231aa880a0efb5.1612612022.git.christophe.leroy@csgroup.eu
2021-02-08 | powerpc/32s: Change mfsrin() into a static inline function | Christophe Leroy | 2 | -7/+8
mfsrin() is a macro. Change it into an inline function to avoid conflicts in KVM and make it easier to evolve. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/72c7b9879e2e2e6f5c27dadda6486386c2b50f23.1612612022.git.christophe.leroy@csgroup.eu
2021-02-08 | powerpc/uaccess: Perform barrier_nospec() in KUAP allowance helpers | Christophe Leroy | 2 | -11/+3
barrier_nospec() in the uaccess helpers is there to protect against speculative accesses around access_ok(). When using user_access_begin() sequences together with unsafe_get_user()-like macros, barrier_nospec() is called for every single read although we know the access_ok() is done only once. Since all user accesses must be granted by a call to either allow_read_from_user() or allow_read_write_user(), which will always happen after the access_ok() check, move the barrier_nospec() there. Reported-by: Christopher M. Riedl <cmr@codefail.de> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c72f014730823b413528e90ab6c4d3bcb79f8497.1612692067.git.christophe.leroy@csgroup.eu
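The shape of the change is to issue the speculation barrier once, at the point access is granted, instead of once per access; a simplified sketch of one helper:

    static __always_inline void allow_read_from_user(const void __user *from,
                                                     unsigned long size)
    {
            barrier_nospec();       /* once per access window, always after access_ok() */
            allow_user_access(NULL, from, size, KUAP_READ);
    }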
2021-02-08 | powerpc/sstep: Fix darn emulation | Sandipan Das | 1 | -1/+1
Commit 8813ff49607e ("powerpc/sstep: Check instruction validity against ISA version before emulation") introduced a proper way to skip unknown instructions. This makes sure that the same is used for the darn instruction when the range selection bits have a reserved value. Fixes: a23987ef267a ("powerpc: sstep: Add support for darn instruction") Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210204080744.135785-2-sandipan@linux.ibm.com
2021-02-08 | powerpc/sstep: Fix load-store and update emulation | Sandipan Das | 1 | -0/+14
The Power ISA says that the fixed-point load and update instructions must neither use R0 for the base address (RA) nor have the destination (RT) and the base address (RA) as the same register. Similarly, for fixed-point stores and floating-point loads and stores, the instruction is invalid when R0 is used as the base address (RA). This is applicable to the following instructions.
* Load Byte and Zero with Update (lbzu)
* Load Byte and Zero with Update Indexed (lbzux)
* Load Halfword and Zero with Update (lhzu)
* Load Halfword and Zero with Update Indexed (lhzux)
* Load Halfword Algebraic with Update (lhau)
* Load Halfword Algebraic with Update Indexed (lhaux)
* Load Word and Zero with Update (lwzu)
* Load Word and Zero with Update Indexed (lwzux)
* Load Word Algebraic with Update Indexed (lwaux)
* Load Doubleword with Update (ldu)
* Load Doubleword with Update Indexed (ldux)
* Load Floating Single with Update (lfsu)
* Load Floating Single with Update Indexed (lfsux)
* Load Floating Double with Update (lfdu)
* Load Floating Double with Update Indexed (lfdux)
* Store Byte with Update (stbu)
* Store Byte with Update Indexed (stbux)
* Store Halfword with Update (sthu)
* Store Halfword with Update Indexed (sthux)
* Store Word with Update (stwu)
* Store Word with Update Indexed (stwux)
* Store Doubleword with Update (stdu)
* Store Doubleword with Update Indexed (stdux)
* Store Floating Single with Update (stfsu)
* Store Floating Single with Update Indexed (stfsux)
* Store Floating Double with Update (stfdu)
* Store Floating Double with Update Indexed (stfdux)
E.g. the following behaviour is observed for an invalid load and update instruction having RA = RT. While a userspace program having an instruction word like 0xe9ce0001, i.e. ldu r14, 0(r14), runs without receiving a SIGILL on a Power system (observed on P8 and P9), the outcome of executing that instruction word varies and its behaviour can be considered to be undefined. Attaching an uprobe at that instruction's address results in emulation which currently performs the load as well as writes the effective address back to the base register. This might not match the outcome from hardware. To remove any inconsistencies, this adds additional checks for the aforementioned instructions to make sure that the emulation infrastructure treats them as unknown. The kernel can then fall back to executing such instructions on hardware. Fixes: 0016a4cf5582 ("powerpc: Emulate most Book I instructions in emulate_step()") Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210204080744.135785-1-sandipan@linux.ibm.com
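The added validity check amounts to rejecting these encodings before emulation; a hedged sketch (helper name hypothetical, covering only the fixed-point load case):

    /* Hypothetical helper: is this an invalid update-form encoding? */
    static bool invalid_ra_update_form(int ra, int rt, bool fixed_point_load)
    {
            if (ra == 0)                            /* RA = R0 is never valid here */
                    return true;
            if (fixed_point_load && ra == rt)       /* loads must not have RA == RT */
                    return true;
            return false;
    }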
2021-02-08 | powerpc/8xx: Fix software emulation interrupt | Christophe Leroy | 1 | -1/+1
For unimplemented instructions or unimplemented SPRs, the 8xx triggers a "Software Emulation Exception" (0x1000). That interrupt doesn't set reason bits in SRR1 as the "Program Check Exception" does. Go through emulation_assist_interrupt() to set REASON_ILLEGAL. Fixes: fbbcc3bb139e ("powerpc/8xx: Remove SoftwareEmulation()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ad782af87a222efc79cfb06079b0fd23d4224eaf.1612515180.git.christophe.leroy@csgroup.eu
2021-02-08 | powerpc/perf: Record counter overflow always if SAMPLE_IP is unset | Athira Rajeev | 1 | -4/+15
While sampling for marked events, currently we record the sample only if the SIAR valid bit of the Sampled Instruction Event Register (SIER) is set. The SIAR_VALID bit is used for fetching the instruction address from the Sampled Instruction Address Register (SIAR). But there are some use cases where the user is interested only in the PMU stats at each counter overflow and the exact IP of the overflow event is not required. Dropping SIAR-invalid samples will fail to record some of the counter overflows in such cases. An example of such a use case is dumping the PMU stats (event counts) after some regular amount of instructions/events from userspace (ex: via ptrace). Here the counter overflow is indicated to userspace via a signal handler, and captured by monitoring and enabling I/O signaling on the event file descriptor. In these cases, we expect to get a sample/overflow indication after each specified sample_period. The perf event attribute will not have PERF_SAMPLE_IP set in the sample_type if the exact IP of the overflow event is not requested. So while profiling, if SAMPLE_IP is not set, just record the counter overflow irrespective of the SIAR_VALID check. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> [mpe: Reflow comment and if formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612516492-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
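After the change the decision can be summarised as "only gate the sample on SIAR validity when the IP was actually requested"; roughly:

    /*
     * If PERF_SAMPLE_IP was not requested, record the overflow even when
     * SIAR is invalid: the precise instruction address is not needed.
     */
    if (event->attr.sample_type & PERF_SAMPLE_IP)
            record = siar_valid(regs);
    else
            record = 1;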
2021-02-08 | powerpc/pseries/dlpar: handle ibm,configure-connector delay status | Nathan Lynch | 1 | -4/+3
dlpar_configure_connector() has two problems in its handling of ibm,configure-connector's return status: 1. When the status is -2 (busy, call again), we call ibm,configure-connector again immediately without checking whether to schedule, which can result in monopolizing the CPU. 2. Extended delay status (9900..9905) goes completely unhandled, causing the configuration to unnecessarily terminate. Fix both of these issues by using rtas_busy_delay(). Fixes: ab519a011caa ("powerpc/pseries: Kernel DLPAR Infrastructure") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210107025900.410369-1-nathanl@linux.ibm.com
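rtas_busy_delay() already knows how to handle both the -2 and the 9900..9905 statuses (sleeping when asked to), so the call site reduces to the usual RTAS retry idiom; roughly (arguments illustrative):

    do {
            rc = rtas_call(cc_token, 2, 1, NULL, wa_addr, 0);
    } while (rtas_busy_delay(rc));  /* covers both -2 and extended delay */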
2021-02-08 | powerpc/64s: Implement ptep_clear_flush_young that does not flush TLBs | Nicholas Piggin | 1 | -3/+20
Similarly to the x86 commit b13b1d2d8692 ("x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB"), implement ptep_clear_flush_young that does not actually flush the TLB in the case the referenced bit is cleared. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-8-npiggin@gmail.com
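Conceptually the implementation clears the accessed bit but leaves any stale TLB entries to age out naturally; a sketch of the idea:

    #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
    static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
                                             unsigned long addr, pte_t *ptep)
    {
            /* Clear the referenced bit; skip the TLB flush, as on x86 */
            return ptep_test_and_clear_young(vma, addr, ptep);
    }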
2021-02-08 | powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask | Nicholas Piggin | 2 | -10/+23
serialize_against_pte_lookup() performs IPIs to all CPUs in mm_cpumask. Take this opportunity to try to trim the CPU out of mm_cpumask. This can reduce the cost of future serialize_against_pte_lookup() and/or the cost of future TLB flushes. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-7-npiggin@gmail.com
2021-02-08 | powerpc/64s/radix: occasionally attempt to trim mm_cpumask | Nicholas Piggin | 1 | -4/+56
A single-threaded process that is flushing its own address space is so far the only case where the mm_cpumask is attempted to be trimmed. This patch expands that to flush in other situations, multi-threaded processes and external sources. For now it's a relatively simple occasional trim attempt. The main aim is to add the mechanism, tweaking and tuning can come with more data. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-6-npiggin@gmail.com
2021-02-08 | powerpc/64s/radix: Allow mm_cpumask trimming from external sources | Nicholas Piggin | 1 | -10/+6
mm_cpumask trimming is currently restricted to be issued by the current thread of a single-threaded mm. This patch relaxes that and allows the mask to be trimmed from any context. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-5-npiggin@gmail.com
2021-02-08 | powerpc/64s/radix: Check for no TLB flush required | Nicholas Piggin | 1 | -13/+25
If there are no CPUs in mm_cpumask, no TLB flush is required at all. This patch adds a check for this case. Currently it's not tested for; in fact mm_is_thread_local() returns false if the current CPU is not in mm_cpumask, so it's treated as a global flush. This can come up in some cases like exec failure before the new mm has ever been switched to. This patch reduces the TLBIE instructions required to build a kernel from about 120,000 to 45,000. Another situation it could help is page reclaim, KSM, THP, etc. (i.e., async operations external to the process) where the process is sleeping and has all TLBs flushed out of all CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-4-npiggin@gmail.com
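The new early-out is essentially a check at the top of the flush-type decision; a simplified sketch (the enum name is assumed from the flush-type refactor listed below):

    /* No CPU has ever run this mm (e.g. exec failed before the first switch) */
    if (cpumask_empty(mm_cpumask(mm)))
            return FLUSH_TYPE_NONE;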
2021-02-08 | powerpc/64s/radix: refactor TLB flush type selection | Nicholas Piggin | 1 | -82/+94
The logic to decide what kind of TLB flush is required (local, global, or IPI) is spread multiple times over the several kinds of TLB flushes. Move it all into a single function which may issue IPIs if necessary, and also returns a flush type that is to be used. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-3-npiggin@gmail.com
2021-02-08 | powerpc/64s/radix: add warning and comments in mm_cpumask trim | Nicholas Piggin | 1 | -6/+21
Add a comment explaining part of the logic for mm_cpumask trimming, and add a (hopefully graceful) check and warning in case something gets it wrong. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-2-npiggin@gmail.com
2021-02-08 | powerpc/perf: Expose Performance Monitor Counter SPRs as part of extended regs | Athira Rajeev | 4 | -15/+39
Currently the Monitor Mode Control Registers and Sampling registers are part of extended regs. This patch adds support to include the Performance Monitor Counter Registers (PMC1 to PMC6) as part of extended registers. PMCs are saved in the perf interrupt handler as part of the per-cpu array 'pmcs' in struct cpu_hw_events. While capturing the register values for extended regs, fetch these saved PMC values. Simplify the PERF_REG_PMU_MASK_300/31 definition to include PMU SPRs MMCR0 to PMC6. Exclude the unsupported SPRs (MMCR3, SIER2, SIER3) from the extended mask value for CPU_FTR_ARCH_300 in the new definition. PERF_REG_EXTENDED_MAX is used to check if any index beyond the extended registers is requested in the sample. Have one PERF_REG_EXTENDED_MAX for CPU_FTR_ARCH_300/CPU_FTR_ARCH_31 since the perf_reg_validate function already checks the extended mask for the presence of any unsupported register. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612335337-1888-3-git-send-email-atrajeev@linux.vnet.ibm.com
2021-02-08 | powerpc/perf: Include PMCs as part of per-cpu cpuhw_events struct | Athira Rajeev | 1 | -6/+12
To support capturing of PMCs as part of extended registers, the values of SPRs PMC1 to PMC6 have to be saved at the start of the PMI interrupt handler. This is needed since we are resetting the overflown PMC before creating the sample, and hence directly reading SPRN_PMCx in 'perf_reg_value' will be capturing the modified value. To solve this, add a per-cpu array as part of structure cpu_hw_events and use this array to capture PMC values in the perf interrupt handler. The patch also refactors the interrupt handler code to use this per-cpu array instead of the current local array. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612335337-1888-2-git-send-email-atrajeev@linux.vnet.ibm.com
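The idea is to snapshot the PMCs once, at PMI entry, into cpu_hw_events so that later consumers (including perf_reg_value() for the extended regs) read the snapshot rather than the live SPRs; schematically (fields and loop simplified):

    struct cpu_hw_events {
            /* ... existing fields ... */
            unsigned long pmcs[MAX_HWEVENTS];       /* PMC1..PMC6 captured at PMI entry */
    };

    static void perf_event_interrupt(struct pt_regs *regs)
    {
            struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
            int i;

            for (i = 0; i < ppmu->n_counter; i++)
                    cpuhw->pmcs[i] = read_pmc(i + 1);       /* before any counter is reset */

            /* ... existing overflow handling, now using cpuhw->pmcs[] ... */
    }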
2021-02-08 | powerpc/pkeys: Remove unused code | Sandipan Das | 2 | -9/+0
This removes arch_supports_pkeys(), arch_usable_pkeys() and thread_pkey_regs_*() which are remnants from the following: commit 06bb53b33804 ("powerpc: store and restore the pkey state across context switches") commit 2cd4bd192ee9 ("powerpc/pkeys: Fix handling of pkey state across fork()") commit cf43d3b26452 ("powerpc: Enable pkey subsystem") arch_supports_pkeys() and arch_usable_pkeys() were unused since their introduction while thread_pkey_regs_*() became unused after the introduction of the following: commit d5fa30e6993f ("powerpc/book3s64/pkeys: Reset userspace AMR correctly on exec") commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode") Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202150050.75335-1-sandipan@linux.ibm.com
2021-02-08 | powerpc/44x: Fix a spelling mismach to mismatch in head_44x.S | Bhaskar Chowdhury | 1 | -2/+2
s/mismach/mismatch/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202093746.5198-1-unixbhaskar@gmail.com
2021-02-08 | powerpc: remove unneeded semicolons | Chengyang Fan | 16 | -29/+29
Remove superfluous semicolons after function definitions. Signed-off-by: Chengyang Fan <cy.fan@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210125095338.1719405-1-cy.fan@huawei.com
2021-02-08 | powerpc/akebono: Fix unmet dependency errors | Michael Ellerman | 2 | -7/+5
The AKEBONO config has various selects under it, including some with user-selectable dependencies, which means those dependencies can be disabled. This leads to warnings from Kconfig. This can be seen with eg:

    $ make allnoconfig
    $ ./scripts/config --file build~/.config -k -e CONFIG_44x -k -e CONFIG_PPC_47x -e CONFIG_AKEBONO
    $ make olddefconfig

    WARNING: unmet direct dependencies detected for ATA
      Depends on [n]: HAS_IOMEM [=y] && BLOCK [=n]
      Selected by [y]:
      - AKEBONO [=y] && PPC_47x [=y]

    WARNING: unmet direct dependencies detected for NETDEVICES
      Depends on [n]: NET [=n]
      Selected by [y]:
      - AKEBONO [=y] && PPC_47x [=y]

    WARNING: unmet direct dependencies detected for ETHERNET
      Depends on [n]: NETDEVICES [=y] && NET [=n]
      Selected by [y]:
      - AKEBONO [=y] && PPC_47x [=y]

    WARNING: unmet direct dependencies detected for MMC_SDHCI
      Depends on [n]: MMC [=n] && HAS_DMA [=y]
      Selected by [y]:
      - AKEBONO [=y] && PPC_47x [=y]

    WARNING: unmet direct dependencies detected for MMC_SDHCI_PLTFM
      Depends on [n]: MMC [=n] && MMC_SDHCI [=y]
      Selected by [y]:
      - AKEBONO [=y] && PPC_47x [=y]

The problem is that AKEBONO is using select to enable things that are not true dependencies, but rather things you probably want enabled in an AKEBONO kernel. That is what a defconfig is for. So drop those selects and instead move those symbols into the defconfig. This fixes all the kconfig warnings, and the result of make 44x/akebono_defconfig is the same before and after the patch. Reported-by: Yury Norov <yury.norov@gmail.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210201012503.940145-1-mpe@ellerman.id.au
2021-02-08 | powerpc/64s: runlatch interrupt handling in C | Nicholas Piggin | 2 | -18/+7
There is no need for this to be in asm, use the new interrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-42-npiggin@gmail.com
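With the wrapper infrastructure, the runlatch handling becomes plain C in a shared entry hook instead of per-handler asm; a hedged sketch:

    static inline void interrupt_async_enter_prepare(struct pt_regs *regs)
    {
            /* Turn the runlatch on when entering from an asynchronous interrupt */
            if (cpu_has_feature(CPU_FTR_CTRL) &&
                !test_thread_local_flags(_TLF_RUNLATCH))
                    __ppc64_runlatch_on();

            /* ... remaining common entry work ... */
    }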
2021-02-08 | powerpc/64s: move NMI soft-mask handling to C | Nicholas Piggin | 2 | -60/+25
Saving and restoring soft-mask state can now be done in C using the interrupt handler wrapper functions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-41-npiggin@gmail.com
2021-02-08 | powerpc: move NMI entry/exit code into wrapper | Nicholas Piggin | 4 | -46/+38
This moves the common NMI entry and exit code into the interrupt handler wrappers. This changes the behaviour of soft-NMI (watchdog) and HMI interrupts, and also MCE interrupts on 64e, by adding missing parts of the NMI entry to them. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-40-npiggin@gmail.com
2021-02-08 | powerpc/pseries/mce: restore msr before returning from handler | Nicholas Piggin | 1 | -4/+15
The pseries real-mode machine check handler can enable the MMU, and return from the handler with the MMU still enabled. This works, but the real-mode handler wrapper exit handlers want to rely on the CPU still being in real mode. So change the pseries handler to restore the MSR after it has finished its virtual mode tasks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612702361.lm7fqo56re.astroid@bobo.none
2021-02-08 | powerpc/64: entry cpu time accounting in C | Nicholas Piggin | 4 | -30/+6
There is no need for this to be in asm, use the new interrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-39-npiggin@gmail.com
2021-02-08 | powerpc/64: move account_stolen_time into its own function | Nicholas Piggin | 2 | -9/+15
This will be used by interrupt entry as well. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-38-npiggin@gmail.com
2021-02-08 | powerpc/64s: reconcile interrupts in C | Nicholas Piggin | 2 | -30/+11
There is no need for this to be in asm, use the new interrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-37-npiggin@gmail.com
2021-02-08 | powerpc/64s: move context tracking exit to interrupt exit path | Nicholas Piggin | 2 | -5/+47
The interrupt handler wrapper functions are not the ideal place to maintain context tracking because after they return, the low level exit code must then determine if there are interrupts to replay, or if the task should be preempted, etc. Those paths (e.g., schedule_user) include their own exception_enter/exit pairs to fix this up but it's a bit hacky (see schedule_user() comments). Ideally context tracking will go to user mode only when there are no more interrupts or context switches or other exit processing work to handle. 64e can not do this because it does not use the C interrupt exit code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-36-npiggin@gmail.com