summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2019-02-27powerpc/8xx: fix setting of pagetable for Abatron BDI debug tool.Christophe Leroy1-1/+2
[ Upstream commit fb0bdec51a4901b7dd088de0a1e365e1b9f5cd21 ] Commit 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer dereference") moved the loading of r6 earlier in the code. As some functions are called inbetween, r6 needs to be loaded again with the address of swapper_pg_dir in order to set PTE pointers for the Abatron BDI. Fixes: 8c8c10b90d88 ("powerpc/8xx: fix handling of early NULL pointer dereference") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.Mahesh Salgaonkar1-2/+8
[ Upstream commit 0db6896ff6332ba694f1e61b93ae3b2640317633 ] For fadump to work successfully there should not be any holes in reserved memory ranges where kernel has asked firmware to move the content of old kernel memory in event of crash. Now that fadump uses CMA for reserved area, this memory area is now not protected from hot-remove operations unless it is cma allocated. Hence, fadump service can fail to re-register after the hot-remove operation, if hot-removed memory belongs to fadump reserved region. To avoid this make sure that memory from fadump reserved area is not hot-removable if fadump is registered. However, if user still wants to remove that memory, he can do so by manually stopping fadump service before hot-remove operation. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-02-12powerpc/32: Add .data..Lubsan_data*/.data..Lubsan_type* sections explicitlyMathieu Malaterre1-0/+4
[ Upstream commit beba24ac59133cb36ecd03f9af9ccb11971ee20e ] When both `CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y` and `CONFIG_UBSAN=y` are set, link step typically produce numberous warnings about orphan section: + powerpc-linux-gnu-ld -EB -m elf32ppc -Bstatic --orphan-handling=warn --build-id --gc-sections -X -o .tmp_vmlinux1 -T ./arch/powerpc/kernel/vmlinux.lds --who le-archive built-in.a --no-whole-archive --start-group lib/lib.a --end-group powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data393' from `init/main.o' being placed in section `.data..Lubsan_data393'. powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data394' from `init/main.o' being placed in section `.data..Lubsan_data394'. ... powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_type11' from `init/main.o' being placed in section `.data..Lubsan_type11'. powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_type12' from `init/main.o' being placed in section `.data..Lubsan_type12'. ... This commit remove those warnings produced at W=1. Link: https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg135407.html Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-13powerpc/tm: Set MSR[TS] just prior to recheckpointBreno Leitao2-15/+49
commit e1c3743e1a20647c53b719dbf28b48f45d23f2cd upstream. On a signal handler return, the user could set a context with MSR[TS] bits set, and these bits would be copied to task regs->msr. At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, several __get_user() are called and then a recheckpoint is executed. This is a problem since a page fault (in kernel space) could happen when calling __get_user(). If it happens, the process MSR[TS] bits were already set, but recheckpoint was not executed, and SPRs are still invalid. The page fault can cause the current process to be de-scheduled, with MSR[TS] active and without tm_recheckpoint() being called. More importantly, without TEXASR[FS] bit set also. Since TEXASR might not have the FS bit set, and when the process is scheduled back, it will try to reclaim, which will be aborted because of the CPU is not in the suspended state, and, then, recheckpoint. This recheckpoint will restore thread->texasr into TEXASR SPR, which might be zero, hitting a BUG_ON(). kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0] pc: c000000000054550: restore_gprs+0xb0/0x180 lr: 0000000000000000 sp: c00000041f157950 msr: 8000000100021033 current = 0xc00000041f143000 paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01 pid = 1021, comm = kworker/11:1 kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) enter ? for help [c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0 [c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0 [c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990 [c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0 [c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610 [c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140 [c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c This patch simply delays the MSR[TS] set, so, if there is any page fault in the __get_user() section, it does not have regs->msr[TS] set, since the TM structures are still invalid, thus avoiding doing TM operations for in-kernel exceptions and possible process reschedule. With this patch, the MSR[TS] will only be set just before recheckpointing and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in invalid state. Other than that, if CONFIG_PREEMPT is set, there might be a preemption just after setting MSR[TS] and before tm_recheckpoint(), thus, this block must be atomic from a preemption perspective, thus, calling preempt_disable/enable() on this code. It is not possible to move tm_recheckpoint to happen earlier, because it is required to get the checkpointed registers from userspace, with __get_user(), thus, the only way to avoid this undesired behavior is delaying the MSR[TS] set. The 32-bits signal handler seems to be safe this current issue, but, it might be exposed to the preemption issue, thus, disabling preemption in this chunk of code. Changes from v2: * Run the critical section with preempt_disable. Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") Cc: stable@vger.kernel.org (v3.9+) Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing"Greg Kroah-Hartman2-29/+9
This reverts commit a9935a12768851762089fda8e5a9daaf0231808e which is commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream. It breaks the powerpc build, so drop it from the tree until a fix goes upstream. Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Breno Leitao <leitao@debian.org> Cc: Michal Suchánek <msuchanek@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13powerpc: Disable -Wbuiltin-requires-header when setjmp is usedJoel Stanley1-0/+3
commit aea447141c7e7824b81b49acd1bc785506fba46e upstream. The powerpc kernel uses setjmp which causes a warning when building with clang: In file included from arch/powerpc/xmon/xmon.c:51: ./arch/powerpc/include/asm/setjmp.h:15:13: error: declaration of built-in function 'setjmp' requires inclusion of the header <setjmp.h> [-Werror,-Wbuiltin-requires-header] extern long setjmp(long *); ^ ./arch/powerpc/include/asm/setjmp.h:16:13: error: declaration of built-in function 'longjmp' requires inclusion of the header <setjmp.h> [-Werror,-Wbuiltin-requires-header] extern void longjmp(long *, long); ^ This *is* the header and we're not using the built-in setjump but rather the one in arch/powerpc/kernel/misc.S. As the compiler warning does not make sense, it for the files where setjmp is used. Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> [mpe: Move subdir-ccflags in xmon/Makefile to not clobber -Werror] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13powerpc: consolidate -mno-sched-epilog into FTRACE flagsNicholas Piggin2-5/+5
commit 2a056f58fd33ccc6a0261b552b0f17e7fa4a12f3 upstream. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09powerpc/tm: Unset MSR[TS] if not recheckpointingBreno Leitao2-9/+29
commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream. There is a TM Bad Thing bug that can be caused when you return from a signal context in a suspended transaction but with ucontext MSR[TS] unset. This forces regs->msr[TS] to be set at syscall entrance (since the CPU state is transactional). It also calls treclaim() to flush the transaction state, which is done based on the live (mfmsr) MSR state. Since user context MSR[TS] is not set, then restore_tm_sigcontexts() is not called, thus, not executing recheckpoint, keeping the CPU state as not transactional. When calling rfid, SRR1 will have MSR[TS] set, but the CPU state is non transactional, causing the TM Bad Thing with the following stack: [ 33.862316] Bad kernel stack pointer 3fffd9dce3e0 at c00000000000c47c cpu 0x8: Vector: 700 (Program Check) at [c00000003ff7fd40] pc: c00000000000c47c: fast_exception_return+0xac/0xb4 lr: 00003fff865f442c sp: 3fffd9dce3e0 msr: 8000000102a03031 current = 0xc00000041f68b700 paca = 0xc00000000fb84800 softe: 0 irq_happened: 0x01 pid = 1721, comm = tm-signal-sigre Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) WARNING: exception is not recoverable, can't continue The same problem happens on 32-bits signal handler, and the fix is very similar, if tm_recheckpoint() is not executed, then regs->msr[TS] should be zeroed. This patch also fixes a sparse warning related to lack of indentation when CONFIG_PPC_TRANSACTIONAL_MEM is set. Fixes: 2b0a576d15e0e ("powerpc: Add new transactional memory state to the signal context") CC: Stable <stable@vger.kernel.org> # 3.10+ Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-09powerpc/fsl: Fix spectre_v2 mitigations reportingDiana Craciun1-1/+1
commit 7d8bad99ba5a22892f0cad6881289fdc3875a930 upstream. Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect: $ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2 "Mitigation: Software count cache flush" Which is wrong. Fix it to report vulnerable for now. Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19powerpc: Look for "stdout-path" when setting up legacy consolesBenjamin Herrenschmidt1-1/+5
commit bf3d6afbb234156749b640b6c50f714967a85964 upstream. Commit 78e5dfea84dc ("powerpc: dts: replace 'linux,stdout-path' with 'stdout-path'") broke the default console on a number of embedded PowerPC systems, because it failed to also update the code in arch/powerpc/kernel/legacy_serial.c to look for that property in addition to the old one. This fixes it. Fixes: 78e5dfea84dc ("powerpc: dts: replace 'linux,stdout-path' with 'stdout-path'") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-19powerpc/msi: Fix NULL pointer access in teardown codeRadu Rendec1-1/+6
commit 78e7b15e17ac175e7eed9e21c6f92d03d3b0a6fa upstream. The arch_teardown_msi_irqs() function assumes that controller ops pointers were already checked in arch_setup_msi_irqs(), but this assumption is wrong: arch_teardown_msi_irqs() can be called even when arch_setup_msi_irqs() returns an error (-ENOSYS). This can happen in the following scenario: - msi_capability_init() calls pci_msi_setup_msi_irqs() - pci_msi_setup_msi_irqs() returns -ENOSYS - msi_capability_init() notices the error and calls free_msi_irqs() - free_msi_irqs() calls pci_msi_teardown_msi_irqs() This is easier to see when CONFIG_PCI_MSI_IRQ_DOMAIN is not set and pci_msi_setup_msi_irqs() and pci_msi_teardown_msi_irqs() are just aliases to arch_setup_msi_irqs() and arch_teardown_msi_irqs(). The call to free_msi_irqs() upon pci_msi_setup_msi_irqs() failure seems legit, as it does additional cleanup; e.g. list_del(&entry->list) and kfree(entry) inside free_msi_irqs() do happen (MSI descriptors are allocated before pci_msi_setup_msi_irqs() is called and need to be cleaned up if that fails). Fixes: 6b2fd7efeb88 ("PCI/MSI/PPC: Remove arch_msi_check_device()") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Radu Rendec <radu.rendec@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-05powerpc/function_graph: Simplify with function_graph_enter()Steven Rostedt (VMware)1-13/+2
commit fe60522ec60082a1dd735691b82c64f65d4ad15e upstream. The function_graph_enter() function does the work of calling the function graph hook function and the management of the shadow stack, simplifying the work done in the architecture dependent prepare_ftrace_return(). Have powerpc use the new code, and remove the shadow stack management as well as having to set up the trace structure. This is needed to prepare for a fix of a design bug on how the curr_ret_stack is used. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Cc: stable@kernel.org Fixes: 03274a3ffb449 ("tracing/fgraph: Adjust fgraph depth before calling trace return callback") Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP"Christophe Leroy1-18/+27
commit cc4ebf5c0a3440ed0a32d25c55ebdb6ce5f3c0bc upstream. This reverts commit 4f94b2c7462d9720b2afa7e8e8d4c19446bb31ce. That commit was buggy, as it used rlwinm instead of rlwimi. Instead of fixing that bug, we revert the previous commit in order to reduce the dependency between L1 entries and L2 entries Fixes: 4f94b2c7462d9 ("powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21powerpc/eeh: Fix possible null deref in eeh_dump_dev_log()Sam Bobroff1-0/+5
[ Upstream commit f9bc28aedfb5bbd572d2d365f3095c1becd7209b ] If an error occurs during an unplug operation, it's possible for eeh_dump_dev_log() to be called when edev->pdn is null, which currently leads to dereferencing a null pointer. Handle this by skipping the error log for those devices. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21powerpc/64/module: REL32 relocation range checkNicholas Piggin1-1/+8
[ Upstream commit b851ba02a6f3075f0f99c60c4bc30a4af80cf428 ] The recent module relocation overflow crash demonstrated that we have no range checking on REL32 relative relocations. This patch implements a basic check, the same kernel that previously oopsed and rebooted now continues with some of these errors when loading the module: module_64: x_tables: REL32 527703503449812 out of range! Possibly other relocations (ADDR32, REL16, TOC16, etc.) should also have overflow checks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21powerpc/traps: restore recoverability of machine_check interruptsChristophe Leroy1-2/+7
[ Upstream commit daf00ae71dad8aa05965713c62558aeebf2df48e ] commit b96672dd840f ("powerpc: Machine check interrupt is a non- maskable interrupt") added a call to nmi_enter() at the beginning of machine check restart exception handler. Due to that, in_interrupt() always returns true regardless of the state before entering the exception, and die() panics even when the system was not already in interrupt. This patch calls nmi_exit() before calling die() in order to restore the interrupt state we had before calling nmi_enter() Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13powerpc/64s/hash: Do not use PPC_INVALIDATE_ERAT on CPUs before POWER9Nicholas Piggin1-0/+7
commit bc276ecba132caccb1fda5863a652c15def2b8c6 upstream. PPC_INVALIDATE_ERAT is slbia IH=7 which is a new variant introduced with POWER9, and the result is undefined on earlier CPUs. Commits 7b9f71f974 ("powerpc/64s: POWER9 machine check handler") and d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") caused POWER7/8 code to use this instruction. Remove it. An ERAT flush can be made by invalidatig the SLB, but before POWER9 that requires a flush and rebolt. Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler") Fixes: d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13powerpc/tm: Fix HFSCR bit for no suspend caseMichael Neuling1-6/+12
commit dd9a8c5a87395b6f05552c3b44e42fdc95760552 upstream. Currently on P9N DD2.1 we end up taking infinite TM facility unavailable exceptions on the first TM usage by userspace. In the special case of TM no suspend (P9N DD2.1), Linux is told TM is off via CPU dt-ftrs but told to (partially) use it via OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED. So HFSCR[TM] will be off from dt-ftrs but we need to turn it on for the no suspend case. This patch fixes this by enabling HFSCR TM in this case. Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13powerpc64/module elfv1: Set opd addresses after module relocationNaveen N. Rao2-5/+8
commit 59fe7eaf3598a89cbcd72e645b1d08afd76f7b29 upstream. module_frob_arch_sections() is called before the module is moved to its final location. The function descriptor section addresses we are setting here are thus invalid. Fix this by processing opd section during module_finalize() Fixes: 5633e85b2c313 ("powerpc64: Add .opd based function descriptor dereference") Cc: stable@vger.kernel.org # v4.16 Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-07Merge tag 'powerpc-4.19-4' of ↵Greg Kroah-Hartman1-0/+10
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Michael writes: "powerpc fixes for 4.19 #4 Four regression fixes. A fix for a change to lib/xz which broke our zImage loader when building with XZ compression. OK'ed by Herbert who merged the original patch. The recent fix we did to avoid patching __init text broke some 32-bit machines, fix that. Our show_user_instructions() could be tricked into printing kernel memory, add a check to avoid that. And a fix for a change to our NUMA initialisation logic, which causes crashes in some kdump configurations. Thanks to: Christophe Leroy, Hari Bathini, Jann Horn, Joel Stanley, Meelis Roos, Murilo Opsfelder Araujo, Srikar Dronamraju." * tag 'powerpc-4.19-4' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/numa: Skip onlining a offline node in kdump path powerpc: Don't print kernel instructions in show_user_instructions() powerpc/lib: fix book3s/32 boot failure due to code patching lib/xz: Put CRC32_POLY_LE in xz_private.h
2018-10-05powerpc: Don't print kernel instructions in show_user_instructions()Michael Ellerman1-0/+10
Recently we implemented show_user_instructions() which dumps the code around the NIP when a user space process dies with an unhandled signal. This was modelled on the x86 code, and we even went so far as to implement the exact same bug, namely that if the user process crashed with its NIP pointing into the kernel we will dump kernel text to dmesg. eg: bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1 bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6 bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048 This was discovered on x86 by Jann Horn and fixed in commit 342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on usermode RIP"). Fix it by checking the adjusted NIP value (pc) and number of instructions against USER_DS, and bail if we fail the check, eg: bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1 bad-bctr[2969]: Bad NIP, not dumping instructions. Fixes: 88b0fe175735 ("powerpc: Add show_user_instructions()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-29Merge tag 'powerpc-4.19-3' of ↵Greg Kroah-Hartman2-5/+19
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Michael writes: "powerpc fixes for 4.19 #3 A reasonably big batch of fixes due to me being away for a few weeks. A fix for the TM emulation support on Power9, which could result in corrupting the guest r11 when running under KVM. Two fixes to the TM code which could lead to userspace GPR corruption if we take an SLB miss at exactly the wrong time. Our dynamic patching code had a bug that meant we could patch freed __init text, which could lead to corrupting userspace memory. csum_ipv6_magic() didn't work on little endian platforms since we optimised it recently. A fix for an endian bug when reading a device tree property telling us how many storage keys the machine has available. Fix a crash seen on some configurations of PowerVM when migrating the partition from one machine to another. A fix for a regression in the setup of our CPU to NUMA node mapping in KVM guests. A fix to our selftest Makefiles to make them work since a recent change to the shared Makefile logic." * tag 'powerpc-4.19-3' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Fix Makefiles for headers_install change powerpc/numa: Use associativity if VPHN hcall is successful powerpc/tm: Avoid possible userspace r1 corruption on reclaim powerpc/tm: Fix userspace r13 corruption powerpc/pseries: Fix unitialized timer reset on migration powerpc/pkeys: Fix reading of ibm, processor-storage-keys property powerpc: fix csum_ipv6_magic() on little endian platforms powerpc/powernv/ioda2: Reduce upper limit for DMA window size (again) powerpc: Avoid code patching freed init sections KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds
2018-09-25powerpc/tm: Avoid possible userspace r1 corruption on reclaimMichael Neuling1-1/+8
Current we store the userspace r1 to PACATMSCRATCH before finally saving it to the thread struct. In theory an exception could be taken here (like a machine check or SLB miss) that could write PACATMSCRATCH and hence corrupt the userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but others do. We've never actually seen this happen but it's theoretically possible. Either way, the code is fragile as it is. This patch saves r1 to the kernel stack (which can't fault) before we turn MSR[RI] back on. PACATMSCRATCH is still used but only with MSR[RI] off. We then copy r1 from the kernel stack to the thread struct once we have MSR[RI] back on. Suggested-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-25powerpc/tm: Fix userspace r13 corruptionMichael Neuling1-2/+9
When we treclaim we store the userspace checkpointed r13 to a scratch SPR and then later save the scratch SPR to the user thread struct. Unfortunately, this doesn't work as accessing the user thread struct can take an SLB fault and the SLB fault handler will write the same scratch SPRG that now contains the userspace r13. To fix this, we store r13 to the kernel stack (which can't fault) before we access the user thread struct. Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen as a random userspace segfault with r13 looking like a kernel address. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-17KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workaroundsMichael Neuling1-2/+2
When we come into the softpatch handler (0x1500), we use r11 to store the HSRR0 for later use by the denorm handler. We also use the softpatch handler for the TM workarounds for POWER9. Unfortunately, in kvmppc_interrupt_hv we later store r11 out to the vcpu assuming it's still what we got from userspace. This causes r11 to be corrupted in the VCPU and hence when we restore the guest, we get a corrupted r11. We've seen this when running TM tests inside guests on P9. This fixes the problem by only touching r11 in the denorm case. Fixes: 4bb3c7a020 ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9") Cc: <stable@vger.kernel.org> # 4.17+ Test-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-09-12KVM: PPC: Avoid marking DMA-mapped pages dirty in real modeAlexey Kardashevskiy1-25/+0
At the moment the real mode handler of H_PUT_TCE calls iommu_tce_xchg_rm() which in turn reads the old TCE and if it was a valid entry, marks the physical page dirty if it was mapped for writing. Since it is in real mode, realmode_pfn_to_page() is used instead of pfn_to_page() to get the page struct. However SetPageDirty() itself reads the compound page head and returns a virtual address for the head page struct and setting dirty bit for that kills the system. This adds additional dirty bit tracking into the MM/IOMMU API for use in the real mode. Note that this does not change how VFIO and KVM (in virtual mode) set this bit. The KVM (real mode) changes include: - use the lowest bit of the cached host phys address to carry the dirty bit; - mark pages dirty when they are unpinned which happens when the preregistered memory is released which always happens in virtual mode; - add mm_iommu_ua_mark_dirty_rm() helper to set delayed dirty bit; - change iommu_tce_xchg_rm() to take the kvm struct for the mm to use in the new mm_iommu_ua_mark_dirty_rm() helper; - move iommu_tce_xchg_rm() to book3s_64_vio_hv.c (which is the only caller anyway) to reduce the real mode KVM and IOMMU knowledge across different subsystems. This removes realmode_pfn_to_page() as it is not used anymore. While we at it, remove some EXPORT_SYMBOL_GPL() as that code is for the real mode only and modules cannot call it anyway. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-08-24Merge tag 'powerpc-4.19-2' of ↵Linus Torvalds4-16/+26
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - An implementation for the newly added hv_ops->flush() for the OPAL hvc console driver backends, I forgot to apply this after merging the hvc driver changes before the merge window. - Enable all PCI bridges at boot on powernv, to avoid races when multiple children of a bridge try to enable it simultaneously. This is a workaround until the PCI core can be enhanced to fix the races. - A fix to query PowerVM for the correct system topology at boot before initialising sched domains, seen in some configurations to cause broken scheduling etc. - A fix for pte_access_permitted() on "nohash" platforms. - Two commits to fix SIGBUS when using remap_pfn_range() seen on Power9 due to a workaround when using the nest MMU (GPUs, accelerators). - Another fix to the VFIO code used by KVM, the previous fix had some bugs which caused guests to not start in some configurations. - A handful of other minor fixes. Thanks to: Aneesh Kumar K.V, Benjamin Herrenschmidt, Christophe Leroy, Hari Bathini, Luke Dashjr, Mahesh Salgaonkar, Nicholas Piggin, Paul Mackerras, Srikar Dronamraju. * tag 'powerpc-4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mce: Fix SLB rebolting during MCE recovery path. KVM: PPC: Book3S: Fix guest DMA when guest partially backed by THP pages powerpc/mm/radix: Only need the Nest MMU workaround for R -> RW transition powerpc/mm/books3s: Add new pte bit to mark pte temporarily invalid. powerpc/nohash: fix pte_access_permitted() powerpc/topology: Get topology for shared processors at boot powerpc64/ftrace: Include ftrace.h needed for enable/disable calls powerpc/powernv/pci: Work around races in PCI bridge enabling powerpc/fadump: cleanup crash memory ranges support powerpc/powernv: provide a console flush operation for opal hvc driver powerpc/traps: Avoid rate limit messages from show unhandled signals powerpc/64s: Fix PACA_IRQ_HARD_DIS accounting in idle_power4()
2018-08-21powerpc/topology: Get topology for shared processors at bootSrikar Dronamraju1-0/+5
On a shared LPAR, Phyp will not update the CPU associativity at boot time. Just after the boot system does recognize itself as a shared LPAR and trigger a request for correct CPU associativity. But by then the scheduler would have already created/destroyed its sched domains. This causes - Broken load balance across Nodes causing islands of cores. - Performance degradation esp if the system is lightly loaded - dmesg to wrongly report all CPUs to be in Node 0. - Messages in dmesg saying borken topology. - With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity node sched domain"), can cause rcu stalls at boot up. The sched_domains_numa_masks table which is used to generate cpumasks is only created at boot time just before creating sched domains and never updated. Hence, its better to get the topology correct before the sched domains are created. For example on 64 core Power 8 shared LPAR, dmesg reports Brought up 512 CPUs Node 0 CPUs: 0-511 Node 1 CPUs: Node 2 CPUs: Node 3 CPUs: Node 4 CPUs: Node 5 CPUs: Node 6 CPUs: Node 7 CPUs: Node 8 CPUs: Node 9 CPUs: Node 10 CPUs: Node 11 CPUs: ... BUG: arch topology borken the DIE domain not a subset of the NUMA domain BUG: arch topology borken the DIE domain not a subset of the NUMA domain numactl/lscpu output will still be correct with cores spreading across all nodes: Socket(s): 64 NUMA node(s): 12 Model: 2.0 (pvr 004d 0200) Model name: POWER8 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 64K L1i cache: 32K NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471 NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479 NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487 NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495 NUMA node4 CPU(s): 208-215,304-311,400-407,496-503 NUMA node5 CPU(s): 168-175,264-271,360-367,456-463 NUMA node6 CPU(s): 128-135,224-231,320-327,416-423 NUMA node7 CPU(s): 136-143,232-239,328-335,424-431 NUMA node8 CPU(s): 216-223,312-319,408-415,504-511 NUMA node9 CPU(s): 144-151,240-247,336-343,432-439 NUMA node10 CPU(s): 152-159,248-255,344-351,440-447 NUMA node11 CPU(s): 160-167,256-263,352-359,448-455 Currently on this LPAR, the scheduler detects 2 levels of Numa and created numa sched domains for all CPUs, but it finds a single DIE domain consisting of all CPUs. Hence it deletes all numa sched domains. To address this, detect the shared processor and update topology soon after CPUs are setup so that correct topology is updated just before scheduler creates sched domain. With the fix, dmesg reports: numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471 numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479 numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487 numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495 numa: Node 4 CPUs: 208-215 304-311 400-407 496-503 numa: Node 5 CPUs: 168-175 264-271 360-367 456-463 numa: Node 6 CPUs: 128-135 224-231 320-327 416-423 numa: Node 7 CPUs: 136-143 232-239 328-335 424-431 numa: Node 8 CPUs: 216-223 312-319 408-415 504-511 numa: Node 9 CPUs: 144-151 240-247 336-343 432-439 numa: Node 10 CPUs: 152-159 248-255 344-351 440-447 numa: Node 11 CPUs: 160-167 256-263 352-359 448-455 and lscpu also reports: Socket(s): 64 NUMA node(s): 12 Model: 2.0 (pvr 004d 0200) Model name: POWER8 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 64K L1i cache: 32K NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471 NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479 NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487 NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495 NUMA node4 CPU(s): 208-215,304-311,400-407,496-503 NUMA node5 CPU(s): 168-175,264-271,360-367,456-463 NUMA node6 CPU(s): 128-135,224-231,320-327,416-423 NUMA node7 CPU(s): 136-143,232-239,328-335,424-431 NUMA node8 CPU(s): 216-223,312-319,408-415,504-511 NUMA node9 CPU(s): 144-151,240-247,336-343,432-439 NUMA node10 CPU(s): 152-159,248-255,344-351,440-447 NUMA node11 CPU(s): 160-167,256-263,352-359,448-455 Reported-by: Manjunatha H R <manjuhr1@in.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> [mpe: Trim / format change log] Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-20powerpc/fadump: cleanup crash memory ranges supportHari Bathini1-7/+1
Commit 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array index overflow") changed crash memory ranges to a dynamic array that is reallocated on-demand with krealloc(). The relevant header for this call was not included. The kernel compiles though. But be cautious and add the header anyway. Also, memory allocation logic in fadump_add_crash_memory() takes care of memory allocation for crash memory ranges in all scenarios. Drop unnecessary memory allocation in fadump_setup_crash_memory_ranges(). Fixes: 1bd6a1c4b80a ("powerpc/fadump: handle crash memory ranges array index overflow") Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-20powerpc/traps: Avoid rate limit messages from show unhandled signalsMichael Ellerman1-7/+6
In the recent commit to add an explicit ratelimit state when showing unhandled signals, commit 35a52a10c3ac ("powerpc/traps: Use an explicit ratelimit state for show_signal_msg()"), I put the check of show_unhandled_signals and the ratelimit state before the call to unhandled_signal() so as to avoid unnecessarily calling the latter when show_unhandled_signals is false. However that causes us to check the ratelimit state on every call, so if we take a lot of *handled* signals that has the effect of making the ratelimit code print warnings that callbacks have been suppressed when they haven't. So rearrange the code so that we check show_unhandled_signals first, then call unhandled_signal() and finally check the ratelimit state. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
2018-08-17Merge tag 'powerpc-4.19-1' of ↵Linus Torvalds50-291/+617
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - A fix for a bug in our page table fragment allocator, where a page table page could be freed and reallocated for something else while still in use, leading to memory corruption etc. The fix reuses pt_mm in struct page (x86 only) for a powerpc only refcount. - Fixes to our pkey support. Several are user-visible changes, but bring us in to line with x86 behaviour and/or fix outright bugs. Thanks to Florian Weimer for reporting many of these. - A series to improve the hvc driver & related OPAL console code, which have been seen to cause hardlockups at times. The hvc driver changes in particular have been in linux-next for ~month. - Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y. - Remove Power8 DD1 and Power9 DD1 support, neither chip should be in use anywhere other than as a paper weight. - An optimised memcmp implementation using Power7-or-later VMX instructions - Support for barrier_nospec on some NXP CPUs. - Support for flushing the count cache on context switch on some IBM CPUs (controlled by firmware), as a Spectre v2 mitigation. - A series to enhance the information we print on unhandled signals to bring it into line with other arches, including showing the offending VMA and dumping the instructions around the fault. Thanks to: Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Alexey Spirkov, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Bartosz Golaszewski, Benjamin Herrenschmidt, Bharat Bhushan, Bjoern Noetel, Boqun Feng, Breno Leitao, Bryant G. Ly, Camelia Groza, Christophe Leroy, Christoph Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt, Darren Stevens, Dave Young, David Gibson, Diana Craciun, Finn Thain, Florian Weimer, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand, Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel Stanley, Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues, Michael Hanselmann, Michael Neuling, Michael Schmitz, Mukesh Ojha, Murilo Opsfelder Araujo, Nicholas Piggin, Parth Y Shah, Paul Mackerras, Paul Menzel, Ram Pai, Randy Dunlap, Rashmica Gupta, Reza Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff, Scott Wood, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat Rao, zhong jiang" * tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (234 commits) powerpc/mm/book3s/radix: Add mapping statistics powerpc/uaccess: Enable get_user(u64, *p) on 32-bit powerpc/mm/hash: Remove unnecessary do { } while(0) loop powerpc/64s: move machine check SLB flushing to mm/slb.c powerpc/powernv/idle: Fix build error powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range powerpc/mm: remove warning about ‘type’ being set powerpc/32: Include setup.h header file to fix warnings powerpc: Move `path` variable inside DEBUG_PROM powerpc/powermac: Make some functions static powerpc/powermac: Remove variable x that's never read cxl: remove a dead branch powerpc/powermac: Add missing include of header pmac.h powerpc/kexec: Use common error handling code in setup_new_fdt() powerpc/xmon: Add address lookup for percpu symbols powerpc/mm: remove huge_pte_offset_and_shift() prototype powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled powerpc/pseries: Fix endianness while restoring of r3 in MCE handler. powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements powerpc/fadump: handle crash memory ranges array index overflow ...
2018-08-16Merge tag 'pci-v4.19-changes' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: - Decode AER errors with names similar to "lspci" (Tyler Baicar) - Expose AER statistics in sysfs (Rajat Jain) - Clear AER status bits selectively based on the type of recovery (Oza Pawandeep) - Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru Gagniuc) - Don't clear AER status bits if we're using the "Firmware-First" strategy where firmware owns the registers (Alexandru Gagniuc) - Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy Shevchenko) - Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas) - Defer DPC event handling to work queue (Keith Busch) - Use threaded IRQ for DPC bottom half (Keith Busch) - Print AER status while handling DPC events (Keith Busch) - Work around IDT switch ACS Source Validation erratum (James Puthukattukaran) - Emit diagnostics for all cases of PCIe Link downtraining (Links operating slower than they're capable of) (Alexandru Gagniuc) - Skip VFs when configuring Max Payload Size (Myron Stowe) - Reduce Root Port Max Payload Size if necessary when hot-adding a device below it (Myron Stowe) - Simplify SHPC existence/permission checks (Bjorn Helgaas) - Remove hotplug sample skeleton driver (Lukas Wunner) - Convert pciehp to threaded IRQ handling (Lukas Wunner) - Improve pciehp tolerance of missed events and initially unstable links (Lukas Wunner) - Clear spurious pciehp events on resume (Lukas Wunner) - Add pciehp runtime PM support, including for Thunderbolt controllers (Lukas Wunner) - Support interrupts from pciehp bridges in D3hot (Lukas Wunner) - Mark fall-through switch cases before enabling -Wimplicit-fallthrough (Gustavo A. R. Silva) - Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig) - Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied (Heiner Kallweit) - Unify PCI and DMA direction #defines (Shunyong Yang) - Add PCI_DEVICE_DATA() macro (Andy Shevchenko) - Check for VPD completion before checking for timeout (Bert Kenward) - Limit Netronome NFP5000 config space size to work around erratum (Jakub Kicinski) - Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit) - Document ACPI description of PCI host bridges (Bjorn Helgaas) - Add "pci=disable_acs_redir=" parameter to disable ACS redirection for peer-to-peer DMA support (we don't have the peer-to-peer support yet; this is just one piece) (Logan Gunthorpe) - Clean up devm_of_pci_get_host_bridge_resources() resource allocation (Jan Kiszka) - Fixup resizable BARs after suspend/resume (Christian König) - Make "pci=earlydump" generic (Sinan Kaya) - Fix ROM BAR access routines to stay in bounds and check for signature correctly (Rex Zhu) - Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer) - Expand documentation for pci_add_dma_alias() (Logan Gunthorpe) - To avoid bus errors, enable PASID only if entire path supports End-End TLP prefixes (Sinan Kaya) - Unify slot and bus reset functions and remove hotplug knowledge from callers (Sinan Kaya) - Add Function-Level Reset quirks for Intel and Samsung NVMe devices to fix guest reboot issues (Alex Williamson) - Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller (Bjorn Helgaas) - Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt) - Remove Aardvark outbound window configuration (Evan Wang) - Fix Aardvark bridge window sizing issue (Zachary Zhang) - Convert Aardvark to use pci_host_probe() to reduce code duplication (Thomas Petazzoni) - Correct the Cadence cdns_pcie_writel() signature (Alan Douglas) - Add Cadence support for optional generic PHYs (Alan Douglas) - Add Cadence power management ops (Alan Douglas) - Remove redundant variable from Cadence driver (Colin Ian King) - Add Kirin MSI support (Xiaowei Song) - Drop unnecessary root_bus_nr setting from exynos, imx6, keystone, armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn Guo) - Move link notification settings from DesignWare core to individual drivers (Gustavo Pimentel) - Add endpoint library MSI-X interfaces (Gustavo Pimentel) - Correct signature of endpoint library IRQ interfaces (Gustavo Pimentel) - Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel) - Add endpoint library MSI-X test support (Gustavo Pimentel) - Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation (Jia-Ju Bai) - Add more devices to Broadcom PAXC quirk (Ray Jui) - Work around corrupted Broadcom PAXC config space to enable SMMU and GICv3 ITS (Ray Jui) - Disable MSI parsing to work around broken Broadcom PAXC logic in some devices (Ray Jui) - Hide unconfigured functions to work around a Broadcom PAXC defect (Ray Jui) - Lower iproc log level to reduce console output during boot (Ray Jui) - Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi) - Fix mobiveil missing include file (Lorenzo Pieralisi) - Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi) - Fix mvebu I/O space remapping issues (Thomas Petazzoni) - Use generic pci_host_bridge in mvebu instead of ARM-specific API (Thomas Petazzoni) - Whitelist VMD devices with fast interrupt handlers to avoid sharing vectors with slow handlers (Keith Busch) * tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits) PCI/AER: Don't clear AER bits if error handling is Firmware-First PCI: Limit config space size for Netronome NFP5000 PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips PCI/VPD: Check for VPD access completion before checking for timeout PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry PCI: Match Root Port's MPS to endpoint's MPSS as necessary PCI: Skip MPS logic for Virtual Functions (VFs) PCI: Add function 1 DMA alias quirk for Marvell 88SS9183 PCI: Check for PCIe Link downtraining PCI: Add ACS Redirect disable quirk for Intel Sunrise Point PCI: Add device-specific ACS Redirect disable infrastructure PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support PCI: Allow specifying devices using a base bus and path of devfns PCI: Make specifying PCI devices in kernel parameters reusable PCI: Hide ACS quirk declarations inside PCI core PCI: Delay after FLR of Intel DC P3700 NVMe PCI: Disable Samsung SM961/PM961 NVMe before FLR PCI: Export pcie_has_flr() PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers() ...
2018-08-15Merge branch 'pci/misc'Bjorn Helgaas1-3/+0
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough (Gustavo A. R. Silva) - Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig) - Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied (Heiner Kallweit) - Unify PCI and DMA direction #defines (Shunyong Yang) - Add PCI_DEVICE_DATA() macro (Andy Shevchenko) - Check for VPD completion before checking for timeout (Bert Kenward) - Limit Netronome NFP5000 config space size to work around erratum (Jakub Kicinski) * pci/misc: PCI: Limit config space size for Netronome NFP5000 PCI/VPD: Check for VPD access completion before checking for timeout PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry PCI: Unify PCI and normal DMA direction definitions PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core PCI: Mark fall-through switch cases before enabling -Wimplicit-fallthrough # Conflicts: # drivers/pci/hotplug/pciehp_ctrl.c
2018-08-15Merge tag 'kbuild-v4.19' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - verify depmod is installed before modules_install - support build salt in case build ids must be unique between builds - allow users to specify additional host compiler flags via HOST*FLAGS, and rename internal variables to KBUILD_HOST*FLAGS - update buildtar script to drop vax support, add arm64 support - update builddeb script for better debarch support - document the pit-fall of if_changed usage - fix parallel build of UML with O= option - make 'samples' target depend on headers_install to fix build errors - remove deprecated host-progs variable - add a new coccinelle script for refcount_t vs atomic_t check - improve double-test coccinelle script - misc cleanups and fixes * tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits) coccicheck: return proper error code on fail Coccinelle: doubletest: reduce side effect false positives kbuild: remove deprecated host-progs variable kbuild: make samples really depend on headers_install um: clean up archheaders recipe kbuild: add %asm-generic to no-dot-config-targets um: fix parallel building with O= option scripts: Add Python 3 support to tracing/draw_functrace.py builddeb: Add automatic support for sh{3,4}{,eb} architectures builddeb: Add automatic support for riscv* architectures builddeb: Add automatic support for m68k architecture builddeb: Add automatic support for or1k architecture builddeb: Add automatic support for sparc64 architecture builddeb: Add automatic support for mips{,64}r6{,el} architectures builddeb: Add automatic support for mips64el architecture builddeb: Add automatic support for ppc64 and powerpcspe architectures builddeb: Introduce functions to simplify kconfig tests in set_debarch builddeb: Drop check for 32-bit s390 builddeb: Change architecture detection fallback to use dpkg-architecture builddeb: Skip architecture detection when KBUILD_DEBARCH is set ...
2018-08-14powerpc/64s: Fix PACA_IRQ_HARD_DIS accounting in idle_power4()Nicholas Piggin1-2/+14
When idle_power4() hard disables interrupts then finds a soft pending interrupt, it returns with interrupts hard disabled but without PACA_IRQ_HARD_DIS set. Commit 9b81c0211c ("powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely") added a warning for that condition (since disabled). Fix this by adding the PACA_IRQ_HARD_DIS for that case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-13Merge branch 'perf-core-for-linus' of ↵Linus Torvalds4-192/+58
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf update from Thomas Gleixner: "The perf crowd presents: Kernel updates: - Removal of jprobes - Cleanup and consolidatation the handling of kprobes - Cleanup and consolidation of hardware breakpoints - The usual pile of fixes and updates to PMUs and event descriptors Tooling updates: - Updates and improvements all over the place. Nothing outstanding, just the (good) boring incremental grump work" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) perf trace: Do not require --no-syscalls to suppress strace like output perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h perf tools: Allow overriding MAX_NR_CPUS at compile time perf bpf: Show better message when failing to load an object perf list: Unify metric group description format with PMU event description perf vendor events arm64: Update ThunderX2 implementation defined pmu core events perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet perf cs-etm: Fix start tracing packet handling perf build: Fix installation directory for eBPF perf c2c report: Fix crash for empty browser perf tests: Fix indexing when invoking subtests perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg perf trace beauty: Do not print NULL strarray entries perf beauty: Add a generator for IPPROTO_ socket's protocol constants tools include uapi: Grab a copy of linux/in.h perf tests: Fix complex event name parsing perf evlist: Fix error out while applying initial delay and LBR ...
2018-08-13Merge branch 'fixes' into nextMichael Ellerman1-0/+2
Merge our fixes branch from the 4.18 cycle to resolve some minor conflicts.
2018-08-10powerpc/64s: move machine check SLB flushing to mm/slb.cNicholas Piggin1-17/+9
The machine check code that flushes and restores bolted segments in real mode belongs in mm/slb.c. This will also be used by pseries machine check and idle code in future changes. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10powerpc/32: Include setup.h header file to fix warningsMathieu Malaterre1-0/+2
Make sure to include setup.h to provide the following prototypes: - irqstack_early_init - setup_power_save - initialize_cache_info Fix the following warnings (treated as error in W=1): arch/powerpc/kernel/setup_32.c:198:13: error: no previous prototype for ‘irqstack_early_init’ arch/powerpc/kernel/setup_32.c:238:13: error: no previous prototype for ‘setup_power_save’ arch/powerpc/kernel/setup_32.c:253:13: error: no previous prototype for ‘initialize_cache_info’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10powerpc: Move `path` variable inside DEBUG_PROMMathieu Malaterre1-2/+7
Add gcc attribute unused for two variables. Fix warnings treated as errors with W=1: arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not used [-Werror=unused-but-set-variable] Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10powerpc/kexec: Use common error handling code in setup_new_fdt()Markus Elfring1-16/+12
Add a jump target so that a bit of exception handling can be better reused at the end of this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabledChristophe Leroy1-4/+3
The symbol memcpy_nocache_branch defined in order to allow patching of memset function once cache is enabled leads to confusing reports by perf tool. Using the new patch_site functionality solves this issue. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segementsHari Bathini1-9/+36
With dynamic memory allocation support for crash memory ranges array, there is no hard limit on the no. of crash memory ranges kernel could export, but program headers count could overflow in the /proc/vmcore ELF file while exporting each memory range as PT_LOAD segment. Reduce the likelihood of a such scenario, by folding adjacent crash memory ranges which minimizes the total number of PT_LOAD segments. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-10powerpc/fadump: handle crash memory ranges array index overflowHari Bathini1-14/+77
Crash memory ranges is an array of memory ranges of the crashing kernel to be exported as a dump via /proc/vmcore file. The size of the array is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since commit 142b45a72e22 ("memblock: Add array resizing support"). On large memory systems with a few DLPAR operations, the memblock memory regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such systems, registering fadump results in crash or other system failures like below: task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000 NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180 REGS: c00000000b73b570 TRAP: 0300 Tainted: G L X (4.4.140+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22004484 XER: 20000000 CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0 ... NIP [c000000000047df4] smp_send_reschedule+0x24/0x80 LR [c0000000000f9e58] resched_curr+0x138/0x160 Call Trace: resched_curr+0x138/0x160 (unreliable) check_preempt_curr+0xc8/0xf0 ttwu_do_wakeup+0x38/0x150 try_to_wake_up+0x224/0x4d0 __wake_up_common+0x94/0x100 ep_poll_callback+0xac/0x1c0 __wake_up_common+0x94/0x100 __wake_up_sync_key+0x70/0xa0 sock_def_readable+0x58/0xa0 unix_stream_sendmsg+0x2dc/0x4c0 sock_sendmsg+0x68/0xa0 ___sys_sendmsg+0x2cc/0x2e0 __sys_sendmsg+0x5c/0xc0 SyS_socketcall+0x36c/0x3f0 system_call+0x3c/0x100 as array index overflow is not checked for while setting up crash memory ranges causing memory corruption. To resolve this issue, dynamically allocate memory for crash memory ranges and resize it incrementally, in units of pagesize, on hitting array size limit. Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD program headers.") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> [mpe: Just use PAGE_SIZE directly, fixup variable placement] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/Makefiles: Convert ifeq to ifdef where possibleRodrigo R. Galvao1-6/+8
In Makefiles if we're testing a CONFIG_FOO symbol for equality with 'y' we can instead just use ifdef. The latter reads easily, so convert to it where possible. Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com> Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/traps: Show instructions on exceptionsMurilo Opsfelder Araujo1-0/+3
Call show_user_instructions() in arch/powerpc/kernel/traps.c to dump instructions at faulty location, useful to debugging. Before this patch, an unhandled signal message looked like: pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000] After this patch, it looks like: pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000] pandafault[10524]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe pandafault[10524]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040 Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc: Add show_user_instructions()Murilo Opsfelder Araujo1-0/+32
show_user_instructions() is a slightly modified version of show_instructions() that allows userspace instruction dump. This will be useful within show_signal_msg() to dump userspace instructions of the faulty location. Here is a sample of what show_user_instructions() outputs: pandafault[10850]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe pandafault[10850]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040 The current->comm and current->pid printed can serve as a glue that links the instructions dump to its originator, allowing messages to be interleaved in the logs. Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/traps: Print VMA for unhandled signalsMurilo Opsfelder Araujo1-2/+19
This adds VMA address in the message printed for unhandled signals, similarly to what other architectures, like x86, print. Before this patch, a page fault looked like: pandafault[61470]: unhandled signal 11 at 100007d0 nip 1000061c lr 7fff8d185100 code 2 After this patch, a page fault looks like: pandafault[6303]: segfault 11 at 100007d0 nip 1000061c lr 7fff93c55100 code 2 in pandafault[10000000+10000] Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/traps: Use %lx format in show_signal_msg()Murilo Opsfelder Araujo1-8/+3
Use %lx format to print registers. This avoids having two different formats and avoids checking for MSR_64BIT, improving readability of the function. Even though we could have used %px, which is functionally equivalent to %lx as per Documentation/core-api/printk-formats.rst, it is not semantically correct because the data printed are not pointers. And using %px requires casting data to (void *). Besides that, %lx matches the format used in show_regs(). Before this patch: pandafault[4808]: unhandled signal 11 at 0000000010000718 nip 0000000010000574 lr 00007fff935e7a6c code 2 After this patch: pandafault[4732]: unhandled signal 11 at 10000718 nip 10000574 lr 7fff86697a6c code 2 Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07powerpc/traps: Use an explicit ratelimit state for show_signal_msg()Murilo Opsfelder Araujo1-5/+16
Replace printk_ratelimited() by printk() and a default rate limit burst to limit displaying unhandled signals messages. This will allow us to call print_vma_addr() in a future patch, which does not work with printk_ratelimited(). Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>