summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2018-10-11vmlinux.lds.h: Move LSM_TABLE into INIT_DATAKees Cook1-2/+0
Since the struct lsm_info table is not an initcall, we can just move it into INIT_DATA like all the other tables. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: John Johansen <john.johansen@canonical.com> Reviewed-by: James Morris <james.morris@microsoft.com> Signed-off-by: James Morris <james.morris@microsoft.com>
2018-10-09Merge branch 'fixes' into nextMichael Ellerman3-4/+28
Merge our fixes branch. It has a few important fixes that are needed for futher testing and also some commits that will conflict with content in next.
2018-10-09KVM: PPC: Book3S HV: Nested guest entry via hypercallPaul Mackerras1-0/+1
This adds a new hypercall, H_ENTER_NESTED, which is used by a nested hypervisor to enter one of its nested guests. The hypercall supplies register values in two structs. Those values are copied by the level 0 (L0) hypervisor (the one which is running in hypervisor mode) into the vcpu struct of the L1 guest, and then the guest is run until an interrupt or error occurs which needs to be reported to L1 via the hypercall return value. Currently this assumes that the L0 and L1 hypervisors are the same endianness, and the structs passed as arguments are in native endianness. If they are of different endianness, the version number check will fail and the hcall will be rejected. Nested hypervisors do not support indep_threads_mode=N, so this adds code to print a warning message if the administrator has set indep_threads_mode=N, and treat it as Y. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-09KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu structPaul Mackerras1-2/+2
When the 'regs' field was added to struct kvm_vcpu_arch, the code was changed to use several of the fields inside regs (e.g., gpr, lr, etc.) but not the ccr field, because the ccr field in struct pt_regs is 64 bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is only 32 bits. This changes the code to use the regs.ccr field instead of cr, and changes the assembly code on 64-bit platforms to use 64-bit loads and stores instead of 32-bit ones. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-09powerpc: Turn off CPU_FTR_P9_TM_HV_ASSIST in non-hypervisor modePaul Mackerras1-2/+2
When doing nested virtualization, it is only necessary to do the transactional memory hypervisor assist at level 0, that is, when we are in hypervisor mode. Nested hypervisors can just use the TM facilities as architected. Therefore we should clear the CPU_FTR_P9_TM_HV_ASSIST bit when we are not in hypervisor mode, along with the CPU_FTR_HVMODE bit. Doing this will not change anything at this stage because the only code that tests CPU_FTR_P9_TM_HV_ASSIST is in HV KVM, which currently can only be used when when CPU_FTR_HVMODE is set. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-07Merge tag 'powerpc-4.19-4' of ↵Greg Kroah-Hartman1-0/+10
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Michael writes: "powerpc fixes for 4.19 #4 Four regression fixes. A fix for a change to lib/xz which broke our zImage loader when building with XZ compression. OK'ed by Herbert who merged the original patch. The recent fix we did to avoid patching __init text broke some 32-bit machines, fix that. Our show_user_instructions() could be tricked into printing kernel memory, add a check to avoid that. And a fix for a change to our NUMA initialisation logic, which causes crashes in some kdump configurations. Thanks to: Christophe Leroy, Hari Bathini, Jann Horn, Joel Stanley, Meelis Roos, Murilo Opsfelder Araujo, Srikar Dronamraju." * tag 'powerpc-4.19-4' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/numa: Skip onlining a offline node in kdump path powerpc: Don't print kernel instructions in show_user_instructions() powerpc/lib: fix book3s/32 boot failure due to code patching lib/xz: Put CRC32_POLY_LE in xz_private.h
2018-10-05powerpc: Don't print kernel instructions in show_user_instructions()Michael Ellerman1-0/+10
Recently we implemented show_user_instructions() which dumps the code around the NIP when a user space process dies with an unhandled signal. This was modelled on the x86 code, and we even went so far as to implement the exact same bug, namely that if the user process crashed with its NIP pointing into the kernel we will dump kernel text to dmesg. eg: bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1 bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6 bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048 This was discovered on x86 by Jann Horn and fixed in commit 342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode RIP"). Fix it by checking the adjusted NIP value (pc) and number of instructions against USER_DS, and bail if we fail the check, eg: bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1 bad-bctr[2969]: Bad NIP, not dumping instructions. Fixes: 88b0fe175735 ("powerpc: Add show_user_instructions()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04powerpc/64s/hash: Do not use PPC_INVALIDATE_ERAT on CPUs before POWER9Nicholas Piggin1-0/+7
PPC_INVALIDATE_ERAT is slbia IH=7 which is a new variant introduced with POWER9, and the result is undefined on earlier CPUs. Commits 7b9f71f974 ("powerpc/64s: POWER9 machine check handler") and d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") caused POWER7/8 code to use this instruction. Remove it. An ERAT flush can be made by invalidatig the SLB, but before POWER9 that requires a flush and rebolt. Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler") Fixes: d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04powerpc/time: Add set_state_oneshot_stopped decrementer callbackAnton Blanchard1-0/+1
If CONFIG_PPC_WATCHDOG is enabled we always cap the decrementer to 0x7fffffff: if (IS_ENABLED(CONFIG_PPC_WATCHDOG)) set_dec(0x7fffffff); else set_dec(decrementer_max); If there are no future events, we don't reprogram the decrementer after this and we end up with 0x7fffffff even on a large decrementer capable system. As suggested by Nick, add a set_state_oneshot_stopped callback so we program the decrementer with decrementer_max if there are no future events. Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04powerpc/time: Use clockevents_register_device(), fixing an issue with large ↵Anton Blanchard1-14/+3
decrementer We currently cap the decrementer clockevent at 4 seconds, even on systems with large decrementer support. Fix this by converting the code to use clockevents_register_device() which calculates the upper bound based on the max_delta passed in. Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc: Wire up memtestChristophe Leroy1-0/+3
Add call to early_memtest() so that kernel compiled with CONFIG_MEMTEST really perform memtest at startup when requested via 'memtest' boot parameter. Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/tm: Reformat commentsMichael Neuling1-28/+39
The comments in this file don't conform to the coding style so take them to "Comment Formatting Re-Education Camp". Suggested-by: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Neuling <mikey@neuling.org> [mpe: Reflow some comments and add full stops, fix spelling of Sergeant.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc: Remove duplicated include from pci_32.cYueHaibing1-1/+0
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/64s: consolidate MCE counter increment.Michal Suchanek1-3/+1
The code in machine_check_exception excludes 64s hvmode when incrementing the MCE counter only to call opal_machine_check to increment it specifically for this case. Remove the exclusion and special case. Fixes: a43c1590426c ("powerpc/pseries: Flush SLB contents on SLB MCE errors.") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/tm: Print 64-bits MSRBreno Leitao1-1/+1
On a kernel TM Bad thing program exception, the Machine State Register (MSR) is not being properly displayed. The exception code dumps a 32-bits value but MSR is a 64 bits register for all platforms that have HTM enabled. This patch dumps the MSR value as a 64-bits value instead of 32 bits. In order to do so, the 'reason' variable could not be used, since it trimmed MSR to 32-bits (int). Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/tm: Remove msr_tm_active()Breno Leitao1-12/+9
Currently msr_tm_active() is a wrapper around MSR_TM_ACTIVE() if CONFIG_PPC_TRANSACTIONAL_MEM is set, or it is just a function that returns false if CONFIG_PPC_TRANSACTIONAL_MEM is not set. This function is not necessary, since MSR_TM_ACTIVE() just do the same and could be used, removing the dualism and simplifying the code. This patchset remove every instance of msr_tm_active() and replaced it by MSR_TM_ACTIVE(). Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/ptrace: Add support for PTRACE_SYSEMUBreno Leitao1-0/+11
This is a patch that adds support for PTRACE_SYSEMU ptrace request in PowerPC architecture. When ptrace(PTRACE_SYSEMU, ...) request is called, it will be handled by the arch independent function ptrace_resume(), which will tag the task with the TIF_SYSCALL_EMU flag. This flag needs to be handled from a platform dependent point of view, which is what this patch does. This patch adds this task's flag as part of the _TIF_SYSCALL_DOTRACE, which is the MACRO that is used to trace syscalls at entrance/exit. Since TIF_SYSCALL_EMU is now part of _TIF_SYSCALL_DOTRACE, if the task has _TIF_SYSCALL_DOTRACE set, it will hit do_syscall_trace_enter() at syscall entrance and do_syscall_trace_leave() at syscall leave. do_syscall_trace_enter() needs to handle the TIF_SYSCALL_EMU flag properly, which will interrupt the syscall executing if TIF_SYSCALL_EMU is set. The output values should not be changed, i.e. the return value (r3) should contain the original syscall argument on exit. With this flag set, the syscall is not executed fundamentally, because do_syscall_trace_enter() is returning -1 which is bigger than NR_syscall, thus, skipping the syscall execution and exiting userspace. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc: Redefine TIF_32BITS thread flagBreno Leitao1-1/+1
Moving TIF_32BIT to use bit 20 instead of 4 in the task flag field. This change is making room for an upcoming new task macro (_TIF_SYSCALL_EMU) which is preferred to set a bit in the lower 16-bits part of the word. This upcoming flag macro will take part in a composed macro (_TIF_SYSCALL_DOTRACE) which will contain other flags as well, and it is preferred that the whole _TIF_SYSCALL_DOTRACE macro only sets the lower 16 bits of a word, so, it could be handled using immediate operations (as load immediate, add immediate, ...) where the immediate operand (SI) is limited to 16-bits. Another possible solution would be using the LOAD_REG_IMMEDIATE() macro to load a full 64-bits word immediate, but it takes 5 operations instead of one. Having TIF_32BITS being redefined to use an upper bit is not a problem since there is only one place in the assembly code where TIF_32BIT is being used, and it could be replaced with an operation with right shift (addis), since it is used alone, i.e. not being part of a composed macro, which has different bits set, and would require LOAD_REG_IMMEDIATE(). Tested on a 64 bits Big Endian machine running a 32 bits task. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/64: add stack protector supportChristophe Leroy2-0/+7
On PPC64, as register r13 points to the paca_struct at all time, this patch adds a copy of the canary there, which is copied at task_switch. That new canary is then used by using the following GCC options: -mstack-protector-guard=tls -mstack-protector-guard-reg=r13 -mstack-protector-guard-offset=offsetof(struct paca_struct, canary)) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/32: add stack protector supportChristophe Leroy2-0/+5
This functionality was tentatively added in the past (commit 6533b7c16ee5 ("powerpc: Initial stack protector (-fstack-protector) support")) but had to be reverted (commit f2574030b0e3 ("powerpc: Revert the initial stack protector support") because of GCC implementing it differently whether it had been built with libc support or not. Now, GCC offers the possibility to manually set the stack-protector mode (global or tls) regardless of libc support. This time, the patch selects HAVE_STACKPROTECTOR only if -mstack-protector-guard=tls is supported by GCC. On PPC32, as register r2 points to current task_struct at all time, the stack_canary located inside task_struct can be used directly by using the following GCC options: -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 -mstack-protector-guard-offset=offsetof(struct task_struct, stack_canary)) The protector is disabled for prom_init and bootx_init as it is too early to handle it properly. $ echo CORRUPT_STACK > /sys/kernel/debug/provoke-crash/DIRECT [ 134.943666] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: lkdtm_CORRUPT_STACK+0x64/0x64 [ 134.943666] [ 134.955414] CPU: 0 PID: 283 Comm: sh Not tainted 4.18.0-s3k-dev-12143-ga3272be41209 #835 [ 134.963380] Call Trace: [ 134.965860] [c6615d60] [c001f76c] panic+0x118/0x260 (unreliable) [ 134.971775] [c6615dc0] [c001f654] panic+0x0/0x260 [ 134.976435] [c6615dd0] [c032c368] lkdtm_CORRUPT_STACK_STRONG+0x0/0x64 [ 134.982769] [c6615e00] [ffffffff] 0xffffffff Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03powerpc/traps: merge unrecoverable_exception() and nonrecoverable_exception()Christophe Leroy2-12/+4
PPC32 uses nonrecoverable_exception() while PPC64 uses unrecoverable_exception(). Both functions are doing almost the same thing. This patch removes nonrecoverable_exception() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03Revert "convert SLB miss handlers to C" and subsequent commitsMichael Ellerman4-59/+192
This reverts commits: 5e46e29e6a97 ("powerpc/64s/hash: convert SLB miss handlers to C") 8fed04d0f6ae ("powerpc/64s/hash: remove user SLB data from the paca") 655deecf67b2 ("powerpc/64s/hash: SLB allocation status bitmaps") 2e1626744e8d ("powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup") 89ca4e126a3f ("powerpc/64s/hash: Add a SLB preload cache") This series had a few bugs, and the fixes are not all trivial. So revert most of it for now. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-29Merge tag 'powerpc-4.19-3' of ↵Greg Kroah-Hartman2-5/+19
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Michael writes: "powerpc fixes for 4.19 #3 A reasonably big batch of fixes due to me being away for a few weeks. A fix for the TM emulation support on Power9, which could result in corrupting the guest r11 when running under KVM. Two fixes to the TM code which could lead to userspace GPR corruption if we take an SLB miss at exactly the wrong time. Our dynamic patching code had a bug that meant we could patch freed __init text, which could lead to corrupting userspace memory. csum_ipv6_magic() didn't work on little endian platforms since we optimised it recently. A fix for an endian bug when reading a device tree property telling us how many storage keys the machine has available. Fix a crash seen on some configurations of PowerVM when migrating the partition from one machine to another. A fix for a regression in the setup of our CPU to NUMA node mapping in KVM guests. A fix to our selftest Makefiles to make them work since a recent change to the shared Makefile logic." * tag 'powerpc-4.19-3' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Fix Makefiles for headers_install change powerpc/numa: Use associativity if VPHN hcall is successful powerpc/tm: Avoid possible userspace r1 corruption on reclaim powerpc/tm: Fix userspace r13 corruption powerpc/pseries: Fix unitialized timer reset on migration powerpc/pkeys: Fix reading of ibm, processor-storage-keys property powerpc: fix csum_ipv6_magic() on little endian platforms powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again) powerpc: Avoid code patching freed init sections KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds
2018-09-25powerpc/tm: Avoid possible userspace r1 corruption on reclaimMichael Neuling1-1/+8
Current we store the userspace r1 to PACATMSCRATCH before finally saving it to the thread struct. In theory an exception could be taken here (like a machine check or SLB miss) that could write PACATMSCRATCH and hence corrupt the userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but others do. We've never actually seen this happen but it's theoretically possible. Either way, the code is fragile as it is. This patch saves r1 to the kernel stack (which can't fault) before we turn MSR[RI] back on. PACATMSCRATCH is still used but only with MSR[RI] off. We then copy r1 from the kernel stack to the thread struct once we have MSR[RI] back on. Suggested-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-25powerpc/tm: Fix userspace r13 corruptionMichael Neuling1-2/+9
When we treclaim we store the userspace checkpointed r13 to a scratch SPR and then later save the scratch SPR to the user thread struct. Unfortunately, this doesn't work as accessing the user thread struct can take an SLB fault and the SLB fault handler will write the same scratch SPRG that now contains the userspace r13. To fix this, we store r13 to the kernel stack (which can't fault) before we access the user thread struct. Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen as a random userspace segfault with r13 looking like a kernel address. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-21signal/powerpc: Use force_sig_fault where appropriateEric W. Biederman1-8/+1
Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21signal/powerpc: Simplify _exception_pkey by using force_sig_pkuerrEric W. Biederman1-9/+1
Call force_sig_pkuerr directly instead of rolling it by hand in _exception_pkey. Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21signal/powerpc: Specialize _exception_pkey for handling pkey exceptionsEric W. Biederman1-5/+5
Now that _exception no longer calls _exception_pkey it is no longer necessary to handle any signal with any si_code. All pkey exceptions are SIGSEGV with paired with SEGV_PKUERR. So just handle that case and remove the now unnecessary parameters from _exception_pkey. Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21signal/powerpc: Call force_sig_fault from _exceptionEric W. Biederman1-1/+4
The callers of _exception don't need the pkey exception logic because they are not processing a pkey exception. So just call exception_common directly and then call force_sig_fault to generate the appropriate siginfo and deliver the appropriate signal. Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21signal/powerpc: Factor the common exception code into exception_commonEric W. Biederman1-5/+13
It is brittle and wrong to populate si_pkey when there was not a pkey exception. The field does not exist for all si_codes and in some cases another field exists in the same memory location. So factor out the code that all exceptions handlers must run into exception_common, leaving the individual exception handlers to generate the signals themselves. Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-19signal: Simplify tracehook_report_syscall_exitEric W. Biederman1-5/+2
Replace user_single_step_siginfo with user_single_step_report that allocates siginfo structure on the stack and sends it. This allows tracehook_report_syscall_exit to become a simple if statement that calls user_single_step_report or ptrace_report_syscall depending on the value of step. Update the default helper function now called user_single_step_report to explicitly set si_code to SI_USER and to set si_uid and si_pid to 0. The default helper has always been doing this (using memset) but it was far from obvious. The powerpc helper can now just call force_sig_fault. The x86 helper can now just call send_sigtrap. Unfortunately the default implementation of user_single_step_report can not use force_sig_fault as it does not use a SIGTRAP si_code. So it has to carefully setup the siginfo and use use force_sig_info. The net result is code that is easier to understand and simpler to maintain. Ref: 85ec7fd9f8e5 ("ptrace: introduce user_single_step_siginfo() helper") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-19powerpc/fadump: re-register firmware-assisted dump if already registeredHari Bathini1-2/+2
Firmware-Assisted Dump (FADump) needs to be registered again after any memory hot add/remove operation to update the crash memory ranges. But currently, the kernel returns '-EEXIST' if we try to register without uregistering it first. This could expose the system to racing issues while unregistering and registering FADump from userspace during udev events. Spare the userspace of this and let it be taken care of in the kernel space for a simpler interface. Since this change, running 'echo 1 > /sys/kernel/fadump_registered' would result in re-regisering (unregistering and registering) FADump, if it was already registered. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/prom: Remove VLA in prom_check_platform_support()Suraj Jitindar Singh1-2/+5
In prom_check_platform_support() we retrieve and parse the "ibm,arch-vec-5-platform-support" property of the chosen node. Currently we use a variable length array however to avoid this use an array of constant length 8. This property is used to indicate the supported options of vector 5 bytes 23-26 of the ibm,architecture.vec node. Each of these options is a pair of bytes, thus for 4 options we have a max length of 8 bytes. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/pseries: Disable CPU hotplug across migrationsNathan Fontenot1-0/+2
When performing partition migrations all present CPUs must be online as all present CPUs must make the H_JOIN call as part of the migration process. Once all present CPUs make the H_JOIN call, one CPU is returned to make the rtas call to perform the migration to the destination system. During testing of migration and changing the SMT state we have found instances where CPUs are offlined, as part of the SMT state change, before they make the H_JOIN call. This results in a hung system where every CPU is either in H_JOIN or offline. To prevent this this patch disables CPU hotplug during the migration process. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/pseries: Remove prrn_work workqueueNathan Fontenot1-14/+3
When a PRRN event is received we are already running in a worker thread. Instead of spawning off another worker thread on the prrn_work workqueue to handle the PRRN event we can just call the PRRN handler routine directly. With this update we can also pass the scope variable for the PRRN event directly to the handler instead of it being a global variable. This patch fixes the following oops mnessage we are seeing in PRRN testing: Oops: Bad kernel stack pointer, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes, External 54 CPU: 7 PID: 18967 Comm: kworker/u96:0 Tainted: G X 4.4.126-94.22-default #1 Workqueue: pseries hotplug workque pseries_hp_work_fn task: c000000775367790 ti: c00000001ebd4000 task.ti: c00000070d140000 NIP: 0000000000000000 LR: 000000001fb3d050 CTR: 0000000000000000 REGS: c00000001ebd7d40 TRAP: 0700 Tainted: G X (4.4.126-94.22-default) MSR: 8000000102081000 <41,VEC,ME5 CR: 28000002 XER: 20040018 4 CFAR: 000000001fb3d084 40 419 1 3 GPR00: 000000000000000040000000000010007 000000001ffff400 000000041fffe200 GPR04: 000000000000008050000000000000000 000000001fb15fa8 0000000500000500 GPR08: 000000000001f40040000000000000001 0000000000000000 000005:5200040002 GPR12: 00000000000000005c000000007a05400 c0000000000e89f8 000000001ed9f668 GPR16: 000000001fbeff944000000001fbeff94 000000001fb545e4 0000006000000060 GPR20: ffffffffffffffff4ffffffffffffffff 0000000000000000 0000000000000000 GPR24: 00000000000000005400000001fb3c000 0000000000000000 000000001fb1b040 GPR28: 000000001fb240004000000001fb440d8 0000000000000008 0000000000000000 NIP [0000000000000000] 5 (null) LR [000000001fb3d050] 031fb3d050 Call Trace: 4 Instruction dump: 4 5:47 12 2 XXXXXXXX XXXXXXXX XXXXX4XX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXX5XX XXXXXXXX 60000000 60000000 60000000 60000000 ---[ end trace aa5627b04a7d9d6b ]--- 3NMI watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [kworker/27:0:13903] Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache binfmt_misc reiserfs vfat fat rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag af_packet xfs libcrc32c dm_service_time ibmveth(X) ses enclosure scsi_transport_sas rtc_generic btrfs xor raid6_pq sd_mod ibmvscsi(X) scsi_transport_srp ipr(X) libata sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: Yes, External CPU: 27 PID: 13903 Comm: kworker/27:0 Tainted: G D X 4.4.126-94.22-default #1 Workqueue: events prrn_work_fn task: c000000747cfa390 ti: c00000074712c000 task.ti: c00000074712c000 NIP: c0000000008002a8 LR: c000000000090770 CTR: 000000000032e088 REGS: c00000074712f7b0 TRAP: 0901 Tainted: G D X (4.4.126-94.22-default) MSR: 8000000100009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22482044 XER: 20040000 CFAR: c0000000008002c4 SOFTE: 1 GPR00: c000000000090770 c00000074712fa30 c000000000f09800 c000000000fa1928 6:02 GPR04: c000000775f5e000 fffffffffffffffe 0000000000000001 c000000000f42db8 GPR08: 0000000000000001 0000000080000007 0000000000000000 0000000000000000 GPR12: 8006210083180000 c000000007a14400 NIP [c0000000008002a8] _raw_spin_lock+0x68/0xd0 LR [c000000000090770] mobility_rtas_call+0x50/0x100 Call Trace: 59 5 [c00000074712fa60] [c000000000090770] mobility_rtas_call+0x50/0x100 [c00000074712faf0] [c000000000090b08] pseries_devicetree_update+0xf8/0x530 [c00000074712fc20] [c000000000031ba4] prrn_work_fn+0x34/0x50 [c00000074712fc40] [c0000000000e0390] process_one_work+0x1a0/0x4e0 [c00000074712fcd0] [c0000000000e0870] worker_thread+0x1a0/0x6105:57 2 [c00000074712fd80] [c0000000000e8b18] kthread+0x128/0x150 [c00000074712fe30] [c0000000000096f8] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 2c090000 40c20010 7d40192d 40c2fff0 7c2004ac 2fa90000 40de0018 5:540030 3 e8010010 ebe1fff8 7c0803a6 4e800020 <7c210b78> e92d0000 89290009 792affe3 Signed-off-by: John Allen <jallen@linux.ibm.com> Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc: consolidate -mno-sched-epilog into FTRACE flagsNicholas Piggin2-5/+5
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/64s/hash: Add a SLB preload cacheNicholas Piggin1-0/+7
When switching processes, currently all user SLBEs are cleared, and a few (exec_base, pc, and stack) are preloaded. In trivial testing with small apps, this tends to miss the heap and low 256MB segments, and it will also miss commonly accessed segments on large memory workloads. Add a simple round-robin preload cache that just inserts the last SLB miss into the head of the cache and preloads those at context switch time. Every 256 context switches, the oldest entry is removed from the cache to shrink the cache and require fewer slbmte if they are unused. Much more could go into this, including into the SLB entry reclaim side to track some LRU information etc, which would require a study of large memory workloads. But this is a simple thing we can do now that is an obvious win for common workloads. With the full series, process switching speed on the context_switch benchmark on POWER9/hash (with kernel speculation security masures disabled) increases from 140K/s to 178K/s (27%). POWER8 does not change much (within 1%), it's unclear why it does not see a big gain like POWER9. Booting to busybox init with 256MB segments has SLB misses go down from 945 to 69, and with 1T segments 900 to 21. These could almost all be eliminated by preloading a bit more carefully with ELF binary loading. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setupNicholas Piggin1-0/+9
This will be used by the SLB code in the next patch, but for now this sets the slb_addr_limit to the correct size for 32-bit tasks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/64s/hash: SLB allocation status bitmapsNicholas Piggin1-1/+1
Add 32-entry bitmaps to track the allocation status of the first 32 SLB entries, and whether they are user or kernel entries. These are used to allocate free SLB entries first, before resorting to the round robin allocator. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/64s/hash: remove user SLB data from the pacaNicholas Piggin2-31/+0
User SLB mappig data is copied into the PACA from the mm->context so it can be accessed by the SLB miss handlers. After the C conversion, SLB miss handlers now run with relocation on, and user SLB misses are able to take recursive kernel SLB misses, so the user SLB mapping data can be removed from the paca and accessed directly. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/64s/hash: convert SLB miss handlers to CNicholas Piggin1-160/+42
This patch moves SLB miss handlers completely to C, using the standard exception handler macros to set up the stack and branch to C. This can be done because the segment containing the kernel stack is always bolted, so accessing it with relocation on will not cause an SLB exception. Arbitrary kernel memory may not be accessed when handling kernel space SLB misses, so care should be taken there. However user SLB misses can access any kernel memory, which can be used to move some fields out of the paca (in later patches). User SLB misses could quite easily reconcile IRQs and set up a first class kernel environment and exit via ret_from_except, however that doesn't seem to be necessary at the moment, so we only do that if a bad fault is encountered. [ Credit to Aneesh for bug fixes, error checks, and improvements to bad address handling, etc ] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Since RFC: - Added MSR[RI] handling - Fixed up a register loss bug exposed by irq tracing (Aneesh) - Reject misses outside the defined kernel regions (Aneesh) - Added several more sanity checks and error handling (Aneesh), we may look at consolidating these tests and tightenig up the code but for a first pass we decided it's better to check carefully. Since v1: - Fixed SLB cache corruption (Aneesh) - Fixed untidy SLBE allocation "leak" in get_vsid error case - Now survives some stress testing on real hardware Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/64s/hash: avoid the POWER5 < DD2.1 slb invalidate workaround on POWER8/9Nicholas Piggin1-0/+2
I only have POWER8/9 to test, so just remove it for those. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powernv/pseries: consolidate code for mce early handling.Mahesh Salgaonkar1-127/+28
Now that other platforms also implements real mode mce handler, lets consolidate the code by sharing existing powernv machine check early code. Rename machine_check_powernv_early to machine_check_common_early and reuse the code. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/pseries: Flush SLB contents on SLB MCE errors.Mahesh Salgaonkar3-5/+135
On pseries, as of today system crashes if we get a machine check exceptions due to SLB errors. These are soft errors and can be fixed by flushing the SLBs so the kernel can continue to function instead of system crash. We do this in real mode before turning on MMU. Otherwise we would run into nested machine checks. This patch now fetches the rtas error log in real mode and flushes the SLBs on SLB/ERAT errors. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michal Suchanek <msuchanek@suse.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-19powerpc/iommu: Avoid derefence before pointer checkBreno Leitao1-1/+1
The tbl pointer is being derefenced by IOMMU_PAGE_SIZE prior the check if it is not NULL. Just moving the dereference code to after the check, where there will be guarantee that 'tbl' will not be NULL. Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-17powerpc/tm: Fix HTM documentationBreno Leitao2-11/+15
This patch simply fix part of the documentation on the HTM code. This fixes reference to old fields that were renamed in commit 000ec280e3dd ("powerpc: tm: Rename transct_(*) to ck(\1)_state") It also documents better the flow after commit eb5c3f1c8647 ("powerpc: Always save/restore checkpointed regs during treclaim/trecheckpoint"), where tm_recheckpoint can recheckpoint what is in ck{fp,vr}_state blindly. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-17KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workaroundsMichael Neuling1-2/+2
When we come into the softpatch handler (0x1500), we use r11 to store the HSRR0 for later use by the denorm handler. We also use the softpatch handler for the TM workarounds for POWER9. Unfortunately, in kvmppc_interrupt_hv we later store r11 out to the vcpu assuming it's still what we got from userspace. This causes r11 to be corrupted in the VCPU and hence when we restore the guest, we get a corrupted r11. We've seen this when running TM tests inside guests on P9. This fixes the problem by only touching r11 in the denorm case. Fixes: 4bb3c7a020 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9") Cc: <stable@vger.kernel.org> # 4.17+ Test-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-14powerpc/vdso: Correct call frame informationAlan Modra4-0/+4
Call Frame Information is used by gdb for back-traces and inserting breakpoints on function return for the "finish" command. This failed when inside __kernel_clock_gettime. More concerning than difficulty debugging is that CFI is also used by stack frame unwinding code to implement exceptions. If you have an app that needs to handle asynchronous exceptions for some reason, and you are unlucky enough to get one inside the VDSO time functions, your app will crash. What's wrong: There is control flow in __kernel_clock_gettime that reaches label 99 without saving lr in r12. CFI info however is interpreted by the unwinder without reference to control flow: It's a simple matter of "Execute all the CFI opcodes up to the current address". That means the unwinder thinks r12 contains the return address at label 99. Disabuse it of that notion by resetting CFI for the return address at label 99. Note that the ".cfi_restore lr" could have gone anywhere from the "mtlr r12" a few instructions earlier to the instruction at label 99. I put the CFI as late as possible, because in general that's best practice (and if possible grouped with other CFI in order to reduce the number of CFI opcodes executed when unwinding). Using r12 as the return address is perfectly fine after the "mtlr r12" since r12 on that code path still contains the return address. __get_datapage also has a CFI error. That function temporarily saves lr in r0, and reflects that fact with ".cfi_register lr,r0". A later use of r0 means the CFI at that point isn't correct, as r0 no longer contains the return address. Fix that too. Signed-off-by: Alan Modra <amodra@gmail.com> Tested-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-09-14powerpc/tm: Fix HFSCR bit for no suspend caseMichael Neuling1-6/+12
Currently on P9N DD2.1 we end up taking infinite TM facility unavailable exceptions on the first TM usage by userspace. In the special case of TM no suspend (P9N DD2.1), Linux is told TM is off via CPU dt-ftrs but told to (partially) use it via OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED. So HFSCR[TM] will be off from dt-ftrs but we need to turn it on for the no suspend case. This patch fixes this by enabling HFSCR TM in this case. Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-09-12KVM: PPC: Avoid marking DMA-mapped pages dirty in real modeAlexey Kardashevskiy1-25/+0
At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm() which in turn reads the old TCE and if it was a valid entry, marks the physical page dirty if it was mapped for writing. Since it is in real mode, realmode_pfn_to_page() is used instead of pfn_to_page() to get the page struct. However SetPageDirty() itself reads the compound page head and returns a virtual address for the head page struct and setting dirty bit for that kills the system. This adds additional dirty bit tracking into the MM/IOMMU API for use in the real mode. Note that this does not change how VFIO and KVM (in virtual mode) set this bit. The KVM (real mode) changes include: - use the lowest bit of the cached host phys address to carry the dirty bit; - mark pages dirty when they are unpinned which happens when the preregistered memory is released which always happens in virtual mode; - add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit; - change iommu_tce_xchg_rm() to take the kvm struct for the mm to use in the new mm_iommu_ua_mark_dirty_rm() helper; - move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only caller anyway) to reduce the real mode KVM and IOMMU knowledge across different subsystems. This removes realmode_pfn_to_page() as it is not used anymore. While we at it, remove some EXPORT_SYMBOL_GPL() as that code is for the real mode only and modules cannot call it anyway. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>