summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2018-08-07powerpc/traps: Show instructions on exceptionsMurilo Opsfelder Araujo1-0/+3
Call show_user_instructions() in arch/powerpc/kernel/traps.c to dump instructions at faulty location, useful to debugging. Before this patch, an unhandled signal message looked like: pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000] After this patch, it looks like: pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000] pandafault[10524]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe pandafault[10524]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040 Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc: Add show_user_instructions()Murilo Opsfelder Araujo2-0/+45
show_user_instructions() is a slightly modified version of show_instructions() that allows userspace instruction dump. This will be useful within show_signal_msg() to dump userspace instructions of the faulty location. Here is a sample of what show_user_instructions() outputs: pandafault[10850]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe pandafault[10850]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040 The current->comm and current->pid printed can serve as a glue that links the instructions dump to its originator, allowing messages to be interleaved in the logs. Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/traps: Print VMA for unhandled signalsMurilo Opsfelder Araujo1-2/+19
This adds VMA address in the message printed for unhandled signals, similarly to what other architectures, like x86, print. Before this patch, a page fault looked like: pandafault[61470]: unhandled signal 11 at 100007d0 nip 1000061c lr 7fff8d185100 code 2 After this patch, a page fault looks like: pandafault[6303]: segfault 11 at 100007d0 nip 1000061c lr 7fff93c55100 code 2 in pandafault[10000000+10000] Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/traps: Use %lx format in show_signal_msg()Murilo Opsfelder Araujo1-8/+3
Use %lx format to print registers. This avoids having two different formats and avoids checking for MSR_64BIT, improving readability of the function. Even though we could have used %px, which is functionally equivalent to %lx as per Documentation/core-api/printk-formats.rst, it is not semantically correct because the data printed are not pointers. And using %px requires casting data to (void *). Besides that, %lx matches the format used in show_regs(). Before this patch: pandafault[4808]: unhandled signal 11 at 0000000010000718 nip 0000000010000574 lr 00007fff935e7a6c code 2 After this patch: pandafault[4732]: unhandled signal 11 at 10000718 nip 10000574 lr 7fff86697a6c code 2 Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/traps: Use an explicit ratelimit state for show_signal_msg()Murilo Opsfelder Araujo1-5/+16
Replace printk_ratelimited() by printk() and a default rate limit burst to limit displaying unhandled signals messages. This will allow us to call print_vma_addr() in a future patch, which does not work with printk_ratelimited(). Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/traps: Print unhandled signals in a separate functionMurilo Opsfelder Araujo1-10/+16
Isolate the logic of printing unhandled signals out of _exception_pkey(). No functional change, only code rearrangement. Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Make rfi_flush_fallback a little more robustMichael Ellerman1-0/+6
Because rfi_flush_fallback runs immediately before the return to userspace it currently runs with the user r1 (stack pointer). This means if we oops in there we will report a bad kernel stack pointer in the exception entry path, eg: Bad kernel stack pointer 7ffff7150e40 at c0000000000023b4 Oops: Bad kernel stack pointer, sig: 6 [#1] LE SMP NR_CPUS=32 NUMA PowerNV Modules linked in: CPU: 0 PID: 1246 Comm: klogd Not tainted 4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3 #7 NIP: c0000000000023b4 LR: 0000000010053e00 CTR: 0000000000000040 REGS: c0000000fffe7d40 TRAP: 4100 Not tainted (4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3) MSR: 9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE> CR: 44000442 XER: 20000000 CFAR: c00000000000bac8 IRQMASK: c0000000f1e66a80 GPR00: 0000000002000000 00007ffff7150e40 00007fff93a99900 0000000000000020 ... NIP [c0000000000023b4] rfi_flush_fallback+0x34/0x80 LR [0000000010053e00] 0x10053e00 Although the NIP tells us where we were, and the TRAP number tells us what happened, it would still be nicer if we could report the actual exception rather than barfing about the stack pointer. We an do that fairly simply by loading the kernel stack pointer on entry and restoring the user value before returning. That way we see a regular oops such as: Unrecoverable exception 4100 at c00000000000239c Oops: Unrecoverable exception, sig: 6 [#1] LE SMP NR_CPUS=32 NUMA PowerNV Modules linked in: CPU: 0 PID: 1251 Comm: klogd Not tainted 4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty #40 NIP: c00000000000239c LR: 0000000010053e00 CTR: 0000000000000040 REGS: c0000000f1e17bb0 TRAP: 4100 Not tainted (4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty) MSR: 9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE> CR: 44000442 XER: 20000000 CFAR: c00000000000bac8 IRQMASK: 0 ... NIP [c00000000000239c] rfi_flush_fallback+0x3c/0x80 LR [0000000010053e00] 0x10053e00 Call Trace: [c0000000f1e17e30] [c00000000000b9e4] system_call+0x5c/0x70 (unreliable) Note this shouldn't make the kernel stack pointer vulnerable to a meltdown attack, because it should be flushed from the cache before we return to userspace. The user r1 value will be in the cache, because we load it in the return path, but that is harmless. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-08-07powerpc/powernv: Query firmware for count cache flush settingsMichael Ellerman1-0/+7
Look for fw-features properties to determine the appropriate settings for the count cache flush, and then call the generic powerpc code to set it up based on the security feature flags. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/pseries: Query hypervisor for count cache flush settingsMichael Ellerman2-0/+9
Use the existing hypercall to determine the appropriate settings for the count cache flush, and then call the generic powerpc code to set it up based on the security feature flags. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Add support for software count cache flushMichael Ellerman4-5/+154
Some CPU revisions support a mode where the count cache needs to be flushed by software on context switch. Additionally some revisions may have a hardware accelerated flush, in which case the software flush sequence can be shortened. If we detect the appropriate flag from firmware we patch a branch into _switch() which takes us to a count cache flush sequence. That sequence in turn may be patched to return early if we detect that the CPU supports accelerating the flush sequence in hardware. Add debugfs support for reporting the state of the flush, as well as runtime disabling it. And modify the spectre_v2 sysfs file to report the state of the software flush. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Add new security feature flags for count cache flushMichael Ellerman1-0/+6
Add security feature flags to indicate the need for software to flush the count cache on context switch, and for the presence of a hardware assisted count cache flush. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/asm: Add a patch_site macro & helpers for patching instructionsMichael Ellerman3-0/+36
Add a macro and some helper C functions for patching single asm instructions. The gas macro means we can do something like: 1: nop patch_site 1b, patch__foo Which is less visually distracting than defining a GLOBAL symbol at 1, and also doesn't pollute the symbol table which can confuse eg. perf. These are obviously similar to our existing feature sections, but are not automatically patched based on CPU/MMU features, rather they are designed to be manually patched by C code at some arbitrary point. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platformsDiana Craciun1-0/+10
Used barrier_nospec to sanitize the syscall table. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3EDiana Craciun3-2/+39
Implement the barrier_nospec as a isync;sync instruction sequence. The implementation uses the infrastructure built for BOOK3S 64. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64: Make meltdown reporting Book3S 64 specificDiana Craciun1-0/+2
In a subsequent patch we will enable building security.c for Book3E. However the NXP platforms are not vulnerable to Meltdown, so make the Meltdown vulnerability reporting PPC_BOOK3S_64 specific. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64: Call setup_barrier_nospec() from setup_arch()Michael Ellerman4-2/+6
Currently we require platform code to call setup_barrier_nospec(). But if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case then we can call it in setup_arch(). Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64: Add CONFIG_PPC_BARRIER_NOSPECMichael Ellerman7-10/+22
Add a config symbol to encode which platforms support the barrier_nospec speculation barrier. Currently this is just Book3S 64 but we will add Book3E in a future patch. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.Diana Craciun1-0/+2
NXP Book3E platforms are not vulnerable to speculative store bypass, so make the mitigations PPC_BOOK3S_64 specific. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64: Disable the speculation barrier from the command lineDiana Craciun1-1/+11
The speculation barrier can be disabled from the command line with the parameter: "nospectre_v1". Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Don't use __MASKABLE_EXCEPTION unnecessarilyMichael Ellerman1-11/+5
We only need to use __MASKABLE_EXCEPTION in one of the four cases for hardware interrupt, so use the helper macros in the other cases. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Drop unused loc parameter to MASKABLE_EXCEPTION macrosMichael Ellerman2-6/+6
We pass the "loc" (location) parameter to MASKABLE_EXCEPTION and friends, but it's not used, so drop it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Remove PSERIES naming from the MASKABLE macrosMichael Ellerman3-22/+18
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Drop _MASKABLE_RELON_EXCEPTION_PSERIES()Michael Ellerman2-9/+6
_MASKABLE_RELON_EXCEPTION_PSERIES() does nothing useful, update all callers to use __MASKABLE_RELON_EXCEPTION_PSERIES() directly. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Drop _MASKABLE_EXCEPTION_PSERIES()Michael Ellerman2-9/+6
_MASKABLE_EXCEPTION_PSERIES() does nothing useful, update all callers to use __MASKABLE_EXCEPTION_PSERIES() directly. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES to EXCEPTION_PROLOGMichael Ellerman1-8/+6
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIESMichael Ellerman1-4/+3
To just EXCEPTION_RELON_PROLOG(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES_1Michael Ellerman1-10/+10
The EXCEPTION_RELON_PROLOG_PSERIES_1() macro does the same job as EXCEPTION_PROLOG_2 (which we just recently created), except for "RELON" (relocation on) exceptions. So rename it as such. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Remove PSERIES from the NORI macrosMichael Ellerman2-10/+10
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES_1 to EXCEPTION_PROLOG_2Michael Ellerman2-12/+12
As with the other patches in this series, we are removing the "PSERIES" from the name as it's no longer meaningful. In this case it's not simply a case of removing the "PSERIES" as that would result in a clash with the existing EXCEPTION_PROLOG_1. Instead we name this one EXCEPTION_PROLOG_2, as it's usually used in sequence after 0 and 1. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES_OOL to STD_RELON_EXCEPTION_OOLMichael Ellerman2-2/+2
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES to STD_RELON_EXCEPTIONMichael Ellerman2-2/+2
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Rename STD_EXCEPTION_PSERIES_OOL to STD_EXCEPTION_OOLMichael Ellerman2-2/+2
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Rename STD_EXCEPTION_PSERIES to STD_EXCEPTIONMichael Ellerman2-2/+2
The "PSERIES" in STD_EXCEPTION_PSERIES is to differentiate the macros from the legacy iSeries versions, which are called STD_EXCEPTION_ISERIES. It is not anything to do with pseries vs powernv or powermac etc. We removed the legacy iSeries code in 2012, in commit 8ee3e0d69623x ("powerpc: Remove the main legacy iSerie platform code"). So remove "PSERIES" from the macros. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_RELON_PROLOG_PSERIES()Michael Ellerman1-2/+1
EXCEPTION_RELON_PROLOG_PSERIES() only has two users, STD_RELON_EXCEPTION_PSERIES() and STD_RELON_EXCEPTION_HV() both of which "call" SET_SCRATCH0(), so just move SET_SCRATCH0() into EXCEPTION_RELON_PROLOG_PSERIES(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES()Michael Ellerman1-2/+1
EXCEPTION_PROLOG_PSERIES() only has two users, STD_EXCEPTION_PSERIES() and STD_EXCEPTION_HV() both of which "call" SET_SCRATCH0(), so just move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/pasemi: Search for PCI root bus by compatible propertyDarren Stevens1-5/+6
Pasemi arch code finds the root of the PCI-e bus by searching the device-tree for a node called 'pxp'. But the root bus has a compatible property of 'pasemi,rootbus' so search for that instead. Signed-off-by: Darren Stevens <darren@stevens-zone.net> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/lib: Implement strlen() in assembly for PPC32Christophe Leroy3-1/+81
The generic implementation of strlen() reads strings byte per byte. This patch implements strlen() in assembly based on a read of entire words, in the same spirit as what some other arches and glibc do. On a 8xx the time spent in strlen is reduced by 3/4 for long strings. strlen() selftest on an 8xx provides the following values: Before the patch (ie with the generic strlen() in lib/string.c): len 256 : time = 1.195055 len 016 : time = 0.083745 len 008 : time = 0.046828 len 004 : time = 0.028390 After the patch: len 256 : time = 0.272185 ==> 78% improvment len 016 : time = 0.040632 ==> 51% improvment len 008 : time = 0.033060 ==> 29% improvment len 004 : time = 0.029149 ==> 2% degradation On a 832x: Before the patch: len 256 : time = 0.236125 len 016 : time = 0.018136 len 008 : time = 0.011000 len 004 : time = 0.007229 After the patch: len 256 : time = 0.094950 ==> 60% improvment len 016 : time = 0.013357 ==> 26% improvment len 008 : time = 0.010586 ==> 4% improvment len 004 : time = 0.008784 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/pseries: Defer the logging of rtas error to irq work queue.Mahesh Salgaonkar3-15/+51
rtas_log_buf is a buffer to hold RTAS event data that are communicated to kernel by hypervisor. This buffer is then used to pass RTAS event data to user through proc fs. This buffer is allocated from vmalloc (non-linear mapping) area. On Machine check interrupt, register r3 points to RTAS extended event log passed by hypervisor that contains the MCE event. The pseries machine check handler then logs this error into rtas_log_buf. The rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a page fault (vector 0x300) while accessing it. Since machine check interrupt handler runs in NMI context we can not afford to take any page fault. Page faults are not honored in NMI context and causes kernel panic. Apart from that, as Nick pointed out, pSeries_log_error() also takes a spin_lock while logging error which is not safe in NMI context. It may endup in deadlock if we get another MCE before releasing the lock. Fix this by deferring the logging of rtas error to irq work queue. Current implementation uses two different buffers to hold rtas error log depending on whether extended log is provided or not. This makes bit difficult to identify which buffer has valid data that needs to logged later in irq work. Simplify this using single buffer, one per paca, and copy rtas log to it irrespective of whether extended log is provided or not. Allocate this buffer below RMA region so that it can be accessed in real mode mce handler. Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt") Cc: stable@vger.kernel.org # v4.14+ Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/pseries: Avoid using the size greater than RTAS_ERROR_LOG_MAX.Mahesh Salgaonkar1-1/+1
The global mce data buffer that used to copy rtas error log is of 2048 (RTAS_ERROR_LOG_MAX) bytes in size. Before the copy we read extended_log_length from rtas error log header, then use max of extended_log_length and RTAS_ERROR_LOG_MAX as a size of data to be copied. Ideally the platform (phyp) will never send extended error log with size > 2048. But if that happens, then we have a risk of buffer overrun and corruption. Fix this by using min_t instead. Fixes: d368514c3097 ("powerpc: Fix corruption when grabbing FWNMI data") Reported-by: Michal Suchanek <msuchanek@suse.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/xive: Remove xive_kexec_teardown_cpu()Benjamin Herrenschmidt4-25/+2
It's identical to xive_teardown_cpu() so just use the latter Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/xive: Remove now useless pr_debug statementsBenjamin Herrenschmidt1-9/+1
Those overly verbose statement in the setup of the pool VP aren't particularly useful (esp. considering we don't actually use the pool, we configure it bcs HW requires it only). So remove them which improves the code readability. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s: free page table caches at exit_mmap timeNicholas Piggin1-2/+3
The kernel page table caches are tied to init_mm, so there is no more need for them after userspace is finished. destroy_context() gets called when we drop the last reference for an mm, which can be much later than the task exit due to other lazy mm references to it. We can free the page table cache pages on task exit because they only cache the userspace page tables and kernel threads should not access user space addresses. The mapping for kernel threads itself is maintained in init_mm and page table cache for that is attached to init_mm. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Merge change log additions from Aneesh] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/64s/radix: tlb do not flush on page size when fullmmNicholas Piggin1-0/+3
When the mm is being torn down there will be a full PID flush so there is no need to flush the TLB on page size changes. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc: Add a checkpatch wrapper with our preferred settingsMichael Ellerman1-0/+22
This makes it easy to run checkpatch with settings that I like. Usage is eg: $ ./arch/powerpc/tools/checkpatch.sh -g origin/master.. To check all commits since origin/master. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Russell Currey <ruscur@russell.cc>
2018-08-07powerpc/64: Disable irq restore warning for nowMichael Ellerman1-3/+10
We recently added a warning in arch_local_irq_restore() to check that the soft masking state matches reality. Unfortunately it trips in a few places, which are not entirely trivial to fix. The key problem is if we're doing function_graph tracing of restore_math(), the warning pops and then seems to recurse. It's not entirely clear because the system continuously oopses on all CPUs, with the output interleaved and unreadable. It's also been observed on a G5 coming out of idle. Until we can fix those cases disable the warning for now. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03powerpc/powernv: Fix concurrency issue with npu->mmio_atsd_usageReza Arbab1-2/+3
We've encountered a performance issue when multiple processors stress {get,put}_mmio_atsd_reg(). These functions contend for mmio_atsd_usage, an unsigned long used as a bitmask. The accesses to mmio_atsd_usage are done using test_and_set_bit_lock() and clear_bit_unlock(). As implemented, both of these will require a (successful) stwcx to that same cache line. What we end up with is thread A, attempting to unlock, being slowed by other threads repeatedly attempting to lock. A's stwcx instructions fail and retry because the memory reservation is lost every time a different thread beats it to the punch. There may be a long-term way to fix this at a larger scale, but for now resolve the immediate problem by gating our call to test_and_set_bit_lock() with one to test_bit(), which is obviously implemented without using a store. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03powerpc: Do not redefine NEED_DMA_MAP_STATEChristoph Hellwig1-3/+1
kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it from CONFIG_PPC using the same condition as an if guard. Signed-off-by: Christoph Hellwig <hch@lst.de> [mpe: Move it under PPC] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03powerpc/4xx: Fix error return path in ppc4xx_msi_probe()Guenter Roeck1-21/+30
An arbitrary error in ppc4xx_msi_probe() quite likely results in a crash similar to the following, seen after dma_alloc_coherent() returned an error. Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc001bff0 Oops: Kernel access of bad area, sig: 11 [#1] BE Canyonlands Modules linked in: CPU: 0 PID: 1 Comm: swapper Tainted: G W 4.18.0-rc6-00010-gff33d1030a6c #1 NIP: c001bff0 LR: c001c418 CTR: c01faa7c REGS: cf82db40 TRAP: 0300 Tainted: G W (4.18.0-rc6-00010-gff33d1030a6c) MSR: 00029000 <CE,EE,ME> CR: 28002024 XER: 00000000 DEAR: 00000000 ESR: 00000000 GPR00: c001c418 cf82dbf0 cf828000 cf8de400 00000000 00000000 000000c4 000000c4 GPR08: c0481ea4 00000000 00000000 000000c4 22002024 00000000 c00025e8 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c0492380 0000004a GPR24: 00029000 0000000c 00000000 cf8de410 c0494d60 c0494d60 cf8bebc0 00000001 NIP [c001bff0] ppc4xx_of_msi_remove+0x48/0xa0 LR [c001c418] ppc4xx_msi_probe+0x294/0x3b8 Call Trace: [cf82dbf0] [00029000] 0x29000 (unreliable) [cf82dc10] [c001c418] ppc4xx_msi_probe+0x294/0x3b8 [cf82dc70] [c0209fbc] platform_drv_probe+0x40/0x9c [cf82dc90] [c0208240] driver_probe_device+0x2a8/0x350 [cf82dcc0] [c0206204] bus_for_each_drv+0x60/0xac [cf82dcf0] [c0207e88] __device_attach+0xe8/0x160 [cf82dd20] [c02071e0] bus_probe_device+0xa0/0xbc [cf82dd40] [c02050c8] device_add+0x404/0x5c4 [cf82dd90] [c0288978] of_platform_device_create_pdata+0x88/0xd8 [cf82ddb0] [c0288b70] of_platform_bus_create+0x134/0x220 [cf82de10] [c0288bcc] of_platform_bus_create+0x190/0x220 [cf82de70] [c0288cf4] of_platform_bus_probe+0x98/0xec [cf82de90] [c0449650] __machine_initcall_canyonlands_ppc460ex_device_probe+0x38/0x54 [cf82dea0] [c0002404] do_one_initcall+0x40/0x188 [cf82df00] [c043daec] kernel_init_freeable+0x130/0x1d0 [cf82df30] [c0002600] kernel_init+0x18/0x104 [cf82df40] [c000c23c] ret_from_kernel_thread+0x14/0x1c Instruction dump: 90010024 813d0024 2f890000 83c30058 41bd0014 48000038 813d0024 7f89f800 409d002c 813e000c 57ea103a 3bff0001 <7c69502e> 2f830000 419effe0 4803b26d ---[ end trace 8cf551077ecfc42a ]--- Fix it up. Specifically, - Return valid error codes from ppc4xx_setup_pcieh_hw(), have it clean up after itself, and only access hardware after all possible error conditions have been handled. - Use devm_kzalloc() instead of kzalloc() in ppc4xx_msi_probe() Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-03powernv/cpuidle: Fix idle states all being marked invalidNicholas Piggin1-1/+2
Commit 9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into global structure") parses dt idle states into structs, but never marks them valid. This results in all idle states being lost. Fixes: 9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into global structure") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31powerpc/pseries: fix EEH recovery of some IOV devicesSam Bobroff1-8/+17
EEH recovery currently fails on pSeries for some IOV capable PCI devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide certain device tree properties for the device. (Found on an IOV capable device using the ipr driver.) Recovery fails in pci_enable_resources() at the check on r->parent, because r->flags is set and r->parent is not. This state is due to sriov_init() setting the start, end and flags members of the IOV BARs but the parent not being set later in pseries_pci_fixup_iov_resources(), because the "ibm,open-sriov-vf-bar-info" property is missing. Correct this by zeroing the resource flags for IOV BARs when they can't be configured (this is the same method used by sriov_init() and __pci_read_base()). VFs cleared this way can't be enabled later, because that requires another device tree property, "ibm,number-of-configurable-vfs" as well as support for the RTAS function "ibm_map_pes". These are all part of hypervisor support for IOV and it seems unlikely that a hypervisor would ever partially, but not fully, support it. (None are currently provided by QEMU/KVM.) Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Bryant G. Ly <bryantly@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>