path: root/arch/powerpc/kernel
2018-10-14  powerpc/64s/hash: Add a SLB preload cache  (Nicholas Piggin; 1 file, -0/+7)

    When switching processes, currently all user SLBEs are cleared, and a
    few (exec_base, pc, and stack) are preloaded. In trivial testing with
    small apps, this tends to miss the heap and low 256MB segments, and it
    will also miss commonly accessed segments on large memory workloads.

    Add a simple round-robin preload cache that just inserts the last SLB
    miss into the head of the cache and preloads those at context switch
    time. Every 256 context switches, the oldest entry is removed from the
    cache to shrink the cache and require fewer slbmte if they are unused.

    Much more could go into this, including into the SLB entry reclaim
    side to track some LRU information etc, which would require a study of
    large memory workloads. But this is a simple thing we can do now that
    is an obvious win for common workloads.

    With the full series, process switching speed on the context_switch
    benchmark on POWER9/hash (with kernel speculation security measures
    disabled) increases from 140K/s to 178K/s (27%). POWER8 does not
    change much (within 1%); it's unclear why it does not see a big gain
    like POWER9.

    Booting to busybox init with 256MB segments has SLB misses go down
    from 945 to 69, and with 1T segments from 900 to 21. These could
    almost all be eliminated by preloading a bit more carefully with ELF
    binary loading.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
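    The scheme above amounts to a tiny FIFO keyed by effective segment ID.
    A minimal sketch, with illustrative names and sizes (the real cache
    lives in thread_struct and the patch may differ in detail):

        /* Hypothetical miniature of the round-robin preload cache. */
        #define SLB_PRELOAD_NR  16
        #define SID_SHIFT       28      /* 256MB segment */

        struct slb_preload_cache {
                unsigned char nr;       /* valid entries */
                unsigned char tail;     /* oldest entry, aged out first */
                unsigned long esid[SLB_PRELOAD_NR];
        };

        /* On an SLB miss, push the faulting ESID at the head. */
        static void preload_add(struct slb_preload_cache *c, unsigned long ea)
        {
                int idx = (c->tail + c->nr) % SLB_PRELOAD_NR;

                c->esid[idx] = ea >> SID_SHIFT;
                if (c->nr < SLB_PRELOAD_NR)
                        c->nr++;
                else
                        c->tail = (c->tail + 1) % SLB_PRELOAD_NR;
        }

        /* Every 256 context switches, retire the oldest entry so stale
         * segments stop costing slbmte instructions at switch time. */
        static void preload_age(struct slb_preload_cache *c)
        {
                if (c->nr) {
                        c->tail = (c->tail + 1) % SLB_PRELOAD_NR;
                        c->nr--;
                }
        }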
2018-10-14  powerpc/64s/hash: Provide arch_setup_exec() hooks for hash slice setup  (Nicholas Piggin; 1 file, -0/+9)

    This will be used by the SLB code in the next patch, but for now this
    sets the slb_addr_limit to the correct size for 32-bit tasks.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc/64s/hash: Add SLB allocation status bitmaps  (Nicholas Piggin; 1 file, -1/+1)

    Add 32-entry bitmaps to track the allocation status of the first 32
    SLB entries, and whether they are user or kernel entries. These are
    used to allocate free SLB entries first, before resorting to the
    round-robin allocator.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
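    With 32 tracked entries, a plain u32 per bitmap is enough. A sketch of
    the allocation path under that assumption (names and the round-robin
    fallback are illustrative):

        static u32 slb_used_bitmap;     /* entries 0..31 currently valid */
        static u32 slb_kern_bitmap;     /* which valid entries are kernel */

        static int alloc_slb_index(bool kernel)
        {
                int index;

                if (slb_used_bitmap != 0xffffffffU) {
                        /* A tracked entry is free: take the lowest one. */
                        index = ffz(slb_used_bitmap);
                        slb_used_bitmap |= 1U << index;
                        if (kernel)
                                slb_kern_bitmap |= 1U << index;
                        return index;
                }

                /* All 32 tracked entries in use: fall back to the
                 * existing round-robin victim selection (not shown). */
                return slb_round_robin_index();
        }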
2018-10-14  powerpc/64s/hash: Convert SLB miss handlers to C  (Nicholas Piggin; 1 file, -160/+42)

    This patch moves the SLB miss handlers completely to C, using the
    standard exception handler macros to set up the stack and branch to C.

    This can be done because the segment containing the kernel stack is
    always bolted, so accessing it with relocation on will not cause an
    SLB exception.

    Arbitrary kernel memory must not be accessed when handling kernel
    space SLB misses, so care should be taken there. However user SLB
    misses can access any kernel memory, which can be used to move some
    fields out of the paca (in later patches).

    User SLB misses could quite easily reconcile IRQs and set up a first
    class kernel environment and exit via ret_from_except; however, that
    doesn't seem to be necessary at the moment, so we only do that if a
    bad fault is encountered.

    [ Credit to Aneesh for bug fixes, error checks, and improvements to
      bad address handling, etc ]

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    [mpe: Disallow tracing for all of slb.c for now.]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc/64: Interrupts save PPR on stack rather than thread_struct  (Nicholas Piggin; 4 files, -14/+9)

    PPR is the odd register out when it comes to interrupt handling: it is
    saved in current->thread.ppr while all others are saved on the stack.

    The difficulty with this is that accessing thread.ppr can cause an SLB
    fault, but the conversion of the SLB fault handler to C had assumed
    the normal exception entry handlers would not cause an SLB fault.

    Fix this by allocating room in the interrupt stack to save PPR.

    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc/ptrace: Don't use sizeof(struct pt_regs) in ptrace code  (Michael Ellerman; 1 file, -7/+7)

    Now that we've split the user & kernel versions of pt_regs we need to
    be more careful in the ptrace code.

    For now we've ensured the location of the fields in both structs is
    the same, so most of the ptrace code doesn't need updating.

    But there are a few places where we use sizeof(pt_regs), and these
    will be wrong as soon as we increase the size of the kernel structure.
    So flip them all to use sizeof(user_pt_regs).

    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc: Split user/kernel definitions of struct pt_regs  (Michael Ellerman; 1 file, -0/+39)

    We use a shared definition for struct pt_regs in uapi/asm/ptrace.h.
    That means the layout of the structure is ABI, ie. we can't change it.

    That would be fine if it was only used to describe the user-visible
    register state of a process, but it's also the struct we use in the
    kernel to describe the registers saved in an interrupt frame.

    We'd like more flexibility in the content (and possibly layout) of the
    kernel version of the struct, but currently that's not possible.

    So split the definition into a user-visible definition which remains
    unchanged, and a kernel internal one.

    At the moment they're still identical, and we check that at build
    time. That's because we have code (in ptrace etc.) that assumes that
    they are the same. We will fix that code in future patches, and then
    we can break the strict symmetry between the two structs.

    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
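    The build-time check can be done with one BUILD_BUG_ON per field; a
    sketch of the idea (macro and function names illustrative):

        #define CHECK_REG(_pt, _upt)                                    \
                BUILD_BUG_ON(offsetof(struct pt_regs, _pt) !=           \
                             offsetof(struct user_pt_regs, _upt))

        static inline void pt_regs_check(void)
        {
                CHECK_REG(gpr[0], gpr[0]);
                CHECK_REG(nip, nip);
                CHECK_REG(msr, msr);
                /* ... one check per user-visible field ... */
                BUILD_BUG_ON(sizeof(struct user_pt_regs) >
                             sizeof(struct pt_regs));
        }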
2018-10-14  powerpc/prom_init: Make "default_colors" const  (Benjamin Herrenschmidt; 1 file, -1/+1)

    It's never modified.

    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc/prom_init: Make "fake_elf" const  (Benjamin Herrenschmidt; 1 file, -1/+1)

    It is never modified.

    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc/prom_init: Make of_workarounds static  (Benjamin Herrenschmidt; 1 file, -1/+1)

    It's not used anywhere else.

    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc/8xx: change name of a few page flags to avoid confusion  (Christophe Leroy; 1 file, -3/+3)

    _PAGE_PRIVILEGED corresponds to the SH bit, which doesn't protect
    against user access but only disables ASID verification on kernel
    accesses. User access is controlled with the _PMD_USER flag. Name it
    _PAGE_SH instead of _PAGE_PRIVILEGED.

    _PAGE_HUGE corresponds to the SPS bit, which doesn't really tell that
    it is a huge page, only that it is not a 4k page. Name it _PAGE_SPS
    instead of _PAGE_HUGE.

    Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc: handover page flags with a pgprot_t parameter  (Christophe Leroy; 3 files, -6/+6)

    In order to avoid multiple conversions, hand a pgprot_t directly to
    map_kernel_page(), as already done for radix. Do the same for
    __ioremap_caller() and __ioremap_at().

    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc/mm: properly set PAGE_KERNEL flags in ioremap()  (Christophe Leroy; 2 files, -4/+4)

    Set PAGE_KERNEL directly in the caller and do not rely on a hack
    adding PAGE_KERNEL flags when _PAGE_PRESENT is not set.

    As already done for PPC64, use pgprot_cache() helpers instead of
    _PAGE_XXX flags in PPC32 ioremap() derived functions.

    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14  powerpc: don't use ioremap_prot() nor __ioremap() unless really needed.  (Christophe Leroy; 2 files, -2/+2)

    In many places, ioremap_prot() and __ioremap() can be replaced with
    higher level functions like ioremap(), ioremap_coherent(),
    ioremap_cache(), ioremap_wc() ...

    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/rtas: Fix a potential race between CPU-Offline & Migration  (Gautham R. Shenoy; 1 file, -0/+9)

    Live Partition Migrations require all the present CPUs to execute the
    H_JOIN call, and hence rtas_ibm_suspend_me() onlines any offline CPUs
    before initiating the migration for this purpose.

    The commit 85a88cabad57 ("powerpc/pseries: Disable CPU hotplug across
    migrations") disables any CPU-hotplug operations once all the offline
    CPUs are brought online, to prevent any further state change. Once
    CPU-hotplug is disabled, the code assumes that all the CPUs are
    online.

    However, there is a minor window in rtas_ibm_suspend_me(), between
    onlining the offline CPUs and disabling CPU-hotplug, in which a
    concurrent CPU-offline operation initiated by userspace can succeed,
    thereby nullifying the aforementioned assumption. In this unlikely
    case these offlined CPUs will not call H_JOIN, resulting in a system
    hang.

    Fix this by verifying that all the present CPUs are actually online
    after CPU-hotplug has been disabled, failing which we restore the
    state of the offline CPUs in rtas_ibm_suspend_me() and return -EBUSY.

    Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
    Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
    Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
    Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
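    The verification is a single cpumask comparison once hotplug is
    frozen; a sketch of the shape of the check (function name is
    illustrative; error handling and the re-offlining of CPUs are elided):

        /* After cpu_hotplug_disable(), no further transitions can start,
         * so present == online is a stable property we can test. */
        static int verify_all_cpus_online(void)
        {
                cpu_hotplug_disable();

                if (!cpumask_equal(cpu_present_mask, cpu_online_mask)) {
                        /* Raced against a concurrent userspace offline. */
                        cpu_hotplug_enable();
                        return -EBUSY;
                }
                return 0;
        }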
2018-10-13  powerpc/cacheinfo: Report the correct shared_cpu_map on big-cores  (Gautham R. Shenoy; 1 file, -2/+35)

    Currently on POWER9 SMT8 core systems, in sysfs, we report the
    shared_cpu_map for L1 caches (both data and instruction) to be the
    cpu-ids of all the threads in the SMT8 core. This is incorrect, since
    on POWER9 SMT8 cores there are two groups of threads, each of which
    shares its own L1 cache.

    This patch addresses this by reporting the shared_cpu_map correctly in
    sysfs for L1 caches.

    Before the patch:
        /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map : 000000ff
        /sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_map : 000000ff
        /sys/devices/system/cpu/cpu1/cache/index0/shared_cpu_map : 000000ff
        /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map : 000000ff

    After the patch:
        /sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_map : 00000055
        /sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_map : 00000055
        /sys/devices/system/cpu/cpu1/cache/index0/shared_cpu_map : 000000aa
        /sys/devices/system/cpu/cpu1/cache/index1/shared_cpu_map : 000000aa

    Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc: Use cpu_smallcore_sibling_mask at SMT level on bigcores  (Gautham R. Shenoy; 1 file, -1/+18)

    POWER9 SMT8 cores consist of two groups of threads, where the threads
    in each group share an L1 cache. The scheduler is not aware of this
    distinction, as the current sched-domain hierarchy has all the threads
    of the core defined at the SMT domain:

        SMT   [Thread siblings of the SMT8 core]
        DIE   [CPUs in the same die]
        NUMA  [All the CPUs in the system]

    Due to this, we can observe run-to-run variance when we run a
    multi-threaded benchmark bound to a single core, based on how the
    scheduler spreads the software threads across the two groups in the
    core.

    We fix this in this patch by defining each group of threads which
    share an L1 cache to be the SMT level. The group of threads in the
    SMT8 core is defined to be the CACHE level. The sched-domain hierarchy
    after this patch will be:

        SMT   [Thread siblings in the core that share L1 cache]
        CACHE [Thread siblings that are in the SMT8 core]
        DIE   [CPUs in the same die]
        NUMA  [All the CPUs in the system]

    Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
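    One way to express this is to keep the generic topology table but swap
    in the small-core mask at the SMT level when big cores are detected; a
    simplified sketch (the hook name and table details are illustrative,
    not necessarily what the patch does):

        static const struct cpumask *smallcore_smt_mask(int cpu)
        {
                /* Only the threads sharing this CPU's L1. */
                return cpu_smallcore_mask(cpu);
        }

        static struct sched_domain_topology_level powerpc_topology[] = {
                { cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
                { cpu_cpu_mask, SD_INIT_NAME(DIE) },
                { NULL, },
        };

        void __init fixup_topology(void)        /* hypothetical hook */
        {
                if (has_big_cores)
                        powerpc_topology[0].mask = smallcore_smt_mask;
        }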
2018-10-13  powerpc: Detect the presence of big-cores via "ibm,thread-groups"  (Gautham R. Shenoy; 1 file, -0/+222)

    On IBM POWER9, the device tree exposes a property array identified by
    "ibm,thread-groups" which indicates which groups of threads share a
    particular set of resources. As of today we only have one form of
    grouping, identifying the group of threads in the core that share the
    L1 cache, translation cache and instruction data flow.

    This patch adds helper functions to parse the contents of
    "ibm,thread-groups" and populate a per-cpu variable to cache
    information about the siblings of each CPU that share the L1,
    translation cache and instruction data flow.

    It also defines a new global variable named "has_big_cores" which
    indicates if the cores on this configuration have multiple groups of
    threads that share an L1 cache.

    For each online CPU, it maintains a cpu_smallcore_mask, which
    indicates the online siblings which share the L1 cache with it.

    Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
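    The property is an array of 32-bit cells, roughly:
    [ property-id, nr-groups, threads-per-group, thread0, thread1, ... ].
    A sketch of a parser under that layout (struct and limit names are
    illustrative):

        #include <linux/of.h>

        #define MAX_THREAD_LIST_SIZE    8

        struct thread_groups {
                unsigned int property;          /* 1 == shares L1 etc. */
                unsigned int nr_groups;
                unsigned int threads_per_group;
                unsigned int thread_list[MAX_THREAD_LIST_SIZE];
        };

        static int parse_thread_groups(struct device_node *dn,
                                       struct thread_groups *tg)
        {
                u32 cells[3 + MAX_THREAD_LIST_SIZE];
                unsigned int total;
                int i, ret;

                /* Read the three-cell header first... */
                ret = of_property_read_u32_array(dn, "ibm,thread-groups",
                                                 cells, 3);
                if (ret)
                        return ret;

                tg->property = cells[0];
                tg->nr_groups = cells[1];
                tg->threads_per_group = cells[2];
                total = tg->nr_groups * tg->threads_per_group;
                if (total > MAX_THREAD_LIST_SIZE)
                        return -EINVAL;

                /* ...then the whole array, including the thread list. */
                ret = of_property_read_u32_array(dn, "ibm,thread-groups",
                                                 cells, 3 + total);
                if (ret)
                        return ret;

                for (i = 0; i < total; i++)
                        tg->thread_list[i] = cells[3 + i];
                return 0;
        }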
2018-10-13  powerpc: Use SWITCH_FRAME_SIZE for prom and rtas entry  (Joel Stanley; 2 files, -14/+5)

    Commit 6c1719942e19 ("powerpc/of: Remove useless register save/restore
    when calling OF back") removed the saving of srr0 and srr1 when
    calling into OpenFirmware. Commit e31aa453bbc4 ("powerpc: Use
    LOAD_REG_IMMEDIATE only for constants on 64-bit") did the same for
    rtas.

    This means we don't need to save the extra stack space and can use the
    common SWITCH_FRAME_SIZE. There were already no users of _SRR0 and
    _SRR1 so we can remove them too.

    Link: https://github.com/linuxppc/linux/issues/83
    Signed-off-by: Joel Stanley <joel@jms.id.au>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/pseries/mobility: Extend start/stop topology update scope  (Michael Bringmann; 1 file, -2/+0)

    The powerpc mobility code may receive RTAS requests to perform PRRN
    (Platform Resource Reassignment Notification) topology changes at any
    time, including during LPAR migration operations. In some
    configurations where the affinity of CPUs or memory is being changed
    on that platform, the PRRN requests may apply or refer to outdated
    information prior to the complete update of the device-tree.

    This patch changes the duration for which topology updates are
    suppressed during LPAR migrations: instead of covering just the
    rtas_ibm_suspend_me() / 'ibm,suspend-me' call(s), suppression now
    covers the entire migration_store() operation, allowing all changes
    to the device-tree to be applied prior to accepting and applying any
    PRRN requests.

    For tracking purposes, pr_info notices are added to the functions
    start_topology_update() and stop_topology_update() of 'numa.c'.

    Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
    Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Cleanup control flow in eeh_handle_normal_event()  (Sam Bobroff; 1 file, -102/+94)

    Rather than mixing "if (state)" blocks and gotos, convert entirely to
    "if (state)" blocks to make the state machine behaviour clearer.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Cleanup eeh_ops.wait_state()  (Sam Bobroff; 3 files, -6/+56)

    The wait_state member of eeh_ops does not need to be platform
    dependent; it's just logic around eeh_ops.get_state(). Therefore,
    merge the two (slightly different!) platform versions into a new
    function, eeh_wait_state(), and remove the eeh_ops member.

    While doing this, also correct:
     * The wait logic, so that it never waits longer than max_wait.
     * The wait logic, so that it never waits less than
       EEH_STATE_MIN_WAIT_TIME.
     * One call site where the result is treated like a bit field before
       it's checked for negative error values.
     * In pseries_eeh_get_state(), rename the "state" parameter to "delay"
       because that's what it is.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
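    The corrected wait loop clamps each sleep between the minimum interval
    and the remaining budget; a simplified sketch (the exact return-value
    handling in the patch may differ):

        int eeh_wait_state(struct eeh_pe *pe, int max_wait)
        {
                int ret, mwait;

                while (1) {
                        ret = eeh_ops->get_state(pe, &mwait);
                        if (ret != EEH_STATE_UNAVAILABLE)
                                return ret;

                        if (max_wait <= 0)
                                return -ETIMEDOUT;

                        /* Never sleep less than the minimum, nor past
                         * the remaining budget. */
                        if (mwait < EEH_STATE_MIN_WAIT_TIME)
                                mwait = EEH_STATE_MIN_WAIT_TIME;
                        if (mwait > max_wait)
                                mwait = max_wait;

                        max_wait -= mwait;
                        msleep(mwait);
                }
        }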
2018-10-13  powerpc/eeh: Cleanup eeh_pe_state_mark()  (Sam Bobroff; 3 files, -48/+40)

    Currently, eeh_pe_state_mark() marks a PE (and its children) with a
    state and then performs additional processing if that state included
    EEH_PE_ISOLATED.

    The state parameter is always a constant at the call site, so
    rearrange eeh_pe_state_mark() into two functions and just call the
    appropriate one at each site.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Cleanup unnecessary eeh_pe_state_mark_with_cfg()  (Sam Bobroff; 2 files, -24/+4)

    The function eeh_pe_state_mark_with_cfg() just performs the work of
    eeh_pe_state_mark() and then, conditionally, the work of
    eeh_pe_state_clear(). However it is only ever called with a constant
    state such that the condition is always true, so replace it by direct
    calls.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Cleanup logic in eeh_rmv_from_parent_pe()  (Sam Bobroff; 1 file, -2/+2)

    Move the call to eeh_dev_to_pe() up, so that later it's clear that
    "pe" isn't NULL.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Cleanup field names in eeh_rmv_data  (Sam Bobroff; 1 file, -10/+11)

    Change the name of the fields in eeh_rmv_data to clarify their usage.

    Change "edev_list" to "removed_vf_list" because it does not contain
    generic edevs, but rather only edevs that contain virtual functions
    (which need to be removed during recovery).

    Similarly, change "removed" to "removed_dev_count" because it is a
    count of any removed devices, not just those in the above list.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Cleanup list_head field names  (Sam Bobroff; 3 files, -13/+10)

    Instances of struct eeh_pe are placed in a tree structure using the
    fields "child_list" and "child", so place these next to each other in
    the definition.

    The field "child" is a list entry, so remove the unnecessary and
    misleading use of the list initializer, LIST_HEAD(), on it.

    The eeh_dev struct contains two list entry fields, called "list" and
    "rmv_list". Rename them to "entry" and "rmv_entry" and, as above, stop
    initializing them with LIST_HEAD().

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Cleanup eeh_add_virt_device()  (Sam Bobroff; 1 file, -4/+3)

    Remove the unnecessary cast through void * on the first parameter and
    remove the unused second parameter (always NULL).

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Cleanup unused field in eeh_dev  (Sam Bobroff; 1 file, -1/+0)

    The 'bus' member of struct eeh_dev is assigned to once but never used,
    so remove it.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Cleanup EEH_POSTPONED_PROBE  (Sam Bobroff; 1 file, -12/+6)

    Currently a flag, EEH_POSTPONED_PROBE, is used to prevent an incorrect
    message "EEH: No capable adapters found" from being displayed during
    the boot of powernv systems.

    It is necessary because, on powernv, the call to eeh_probe_devices()
    made from eeh_init() is too early and EEH can't yet be enabled. A
    second call is made later from eeh_pnv_post_init(), which succeeds.
    (On pseries, the first call succeeds because PCI devices are set up
    early enough and no second call is made.)

    This can be simplified by moving the early call to eeh_probe_devices()
    from eeh_init() (where it's seen by both platforms) to
    pSeries_final_fixup(), so that each platform only calls
    eeh_probe_devices() once, at a point where it can succeed. This is
    slightly later in the boot sequence, but still early enough, and it is
    now in the same place in the sequence for both platforms (the
    pcibios_fixup hook).

    The display of the message can be cleaned up as well, by moving it
    into eeh_probe_devices().

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Fix use of EEH_PE_KEEP on wrong field  (Sam Bobroff; 1 file, -1/+1)

    eeh_add_to_parent_pe() sometimes removes the EEH_PE_KEEP flag, but it
    incorrectly removes it from pe->type, instead of pe->state.

    However, rather than clearing it from the correct field, remove it:
    inspection of the code shows that it can't ever have had any effect
    (even if it had been cleared from the correct field), because the
    field is never tested after it is cleared by the statement in
    question.

    The clear statement was added by commit 807a827d4e74 ("powerpc/eeh:
    Keep PE during hotplug"), but it didn't explain why it was necessary.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Fix null deref for devices removed during EEH  (Sam Bobroff; 1 file, -0/+4)

    If a device is removed during EEH processing (either by a driver's
    handler or as part of recovery), it can lead to a null dereference in
    eeh_pe_report_edev(). To handle this, skip devices that have been
    removed.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/eeh: Fix possible null deref in eeh_dump_dev_log()  (Sam Bobroff; 1 file, -0/+5)

    If an error occurs during an unplug operation, it's possible for
    eeh_dump_dev_log() to be called when edev->pdn is null, which
    currently leads to dereferencing a null pointer. Handle this by
    skipping the error log for those devices.

    Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/rtasd: Improve unknown error logging  (Oliver O'Halloran; 1 file, -2/+4)

    Currently when we get an unknown RTAS event it prints the type as
    "Unknown" and no other useful information. Add the raw type code to
    the log message so that we have something to work off.

    Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
    Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc: Disable -Wbuiltin-requires-header when setjmp is used  (Joel Stanley; 1 file, -0/+3)

    The powerpc kernel uses setjmp, which causes a warning when building
    with clang:

        In file included from arch/powerpc/xmon/xmon.c:51:
        ./arch/powerpc/include/asm/setjmp.h:15:13: error: declaration of
        built-in function 'setjmp' requires inclusion of the header
        <setjmp.h> [-Werror,-Wbuiltin-requires-header]
        extern long setjmp(long *);
                    ^
        ./arch/powerpc/include/asm/setjmp.h:16:13: error: declaration of
        built-in function 'longjmp' requires inclusion of the header
        <setjmp.h> [-Werror,-Wbuiltin-requires-header]
        extern void longjmp(long *, long);
                    ^

    This *is* the header, and we're not using the built-in setjmp but
    rather the one in arch/powerpc/kernel/misc.S. As the compiler warning
    does not make sense, disable it for the files where setjmp is used.

    Signed-off-by: Joel Stanley <joel@jms.id.au>
    Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
    [mpe: Move subdir-ccflags in xmon/Makefile to not clobber -Werror]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/process: Constify the number of insns printed by show instructions functions  (Christophe Leroy; 1 file, -7/+6)

    The instructions_to_print variable is assigned the value 16 and there
    is no way to change it. This patch replaces it with a constant.

    Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/process: Fix interleaved output in show_user_instructions()  (Christophe Leroy; 1 file, -18/+19)

    When two processes crash at the same time, we sometimes encounter
    interleaving in the middle of a line:

        init[1]: segfault (11) at 0 nip 0 lr 0 code 1
        init[1]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        init[74]: segfault (11) at 10a74 nip 1000c198 lr 100078c8 code 1 in sh[10000000+14000]
        XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        init[1]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        init[74]: code: 90010024 bf61000c 91490a7c 3fa01002 3be00000 7d3e4b78 3bbd0c20 3b600000
        init[74]: code: 3b9d0040 7c7fe02e 2f830000 419e0028 <89230000> 2f890000 41be001c 4b7f6e79

    This patch fixes it by preparing complete lines in a buffer and
    printing each one at once.

    Fixes: 88b0fe1757359 ("powerpc: Add show_user_instructions()")
    Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    [mpe: Use seq_buf_printf() not seq_buf_puts() which doesn't NULL terminate]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
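    The shape of the fix with seq_buf, per mpe's note (the buffer size and
    helper name here are illustrative):

        #include <linux/seq_buf.h>

        static void show_insn_line(unsigned int *instrs, int n)
        {
                char buf[96];   /* whole line, built before printing */
                struct seq_buf s;
                int i;

                seq_buf_init(&s, buf, sizeof(buf));
                seq_buf_printf(&s, "%s[%d]: code: ",
                               current->comm, current->pid);
                for (i = 0; i < n; i++)
                        seq_buf_printf(&s, "%08x ", instrs[i]);

                /* One printk per line: no mid-line interleaving. */
                pr_info("%s\n", buf);
        }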
2018-10-13  powerpc/process: Add missing include of stacktrace.h  (Christophe Leroy; 1 file, -0/+1)

    As spotted by sparse:

        arch/powerpc/kernel/process.c:1302:6: warning: symbol
        'show_user_instructions' was not declared. Should it be static?

    Fixes: 88b0fe1757359 ("powerpc: Add show_user_instructions()")
    Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    [mpe: Split out of larger patch]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/process: Fix sparse address space warnings  (Christophe Leroy; 1 file, -2/+2)

    This patch fixes the following warnings, which are leftovers from when
    __get_user() was replaced by probe_kernel_address():

        arch/powerpc/kernel/process.c:1287:22: warning: incorrect type in argument 2 (different address spaces)
        arch/powerpc/kernel/process.c:1287:22:    expected void const *src
        arch/powerpc/kernel/process.c:1287:22:    got unsigned int [noderef] <asn:1>*<noident>
        arch/powerpc/kernel/process.c:1319:21: warning: incorrect type in argument 2 (different address spaces)
        arch/powerpc/kernel/process.c:1319:21:    expected void const *src
        arch/powerpc/kernel/process.c:1319:21:    got unsigned int [noderef] <asn:1>*<noident>

    Fixes: 7b051f665c32d ("powerpc: Use probe_kernel_address in show_instructions")
    Reviewed-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    [mpe: Split out of larger patch]
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13  powerpc/64: properly initialise the stackprotector canary on SMP.  (Christophe Leroy; 1 file, -0/+8)

    Commit 06ec27aea9fc ("powerpc/64: add stack protector support")
    doesn't initialise the stack canary in the pacas of SMP secondary
    CPUs, leading to the following false positive report from the stack
    protector:

        smp: Bringing up secondary CPUs ...
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __schedule+0x978/0xa80
        CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-rc7-next-20181010-autotest-autotest #1
        Call Trace:
        [c000001fed5b3bf0] [c000000000a0ef3c] dump_stack+0xb0/0xf4 (unreliable)
        [c000001fed5b3c30] [c0000000000f9d68] panic+0x140/0x308
        [c000001fed5b3cc0] [c0000000000f9844] __stack_chk_fail+0x24/0x30
        [c000001fed5b3d20] [c000000000a2c3a8] __schedule+0x978/0xa80
        [c000001fed5b3e00] [c000000000a2c9b4] schedule_idle+0x34/0x60
        [c000001fed5b3e30] [c00000000013d344] do_idle+0x224/0x3d0
        [c000001fed5b3ec0] [c00000000013d6e0] cpu_startup_entry+0x30/0x50
        [c000001fed5b3ef0] [c000000000047f34] start_secondary+0x4d4/0x520
        [c000001fed5b3f90] [c00000000000b370] start_secondary_prolog+0x10/0x14

    This patch properly initialises the stack_canary of the secondary idle
    tasks.

    Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
    Fixes: 06ec27aea9fc ("powerpc/64: add stack protector support")
    Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
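    On ppc64 the canary is read through the paca (r13), so each
    secondary's paca must be seeded from its idle task before the CPU
    starts scheduling; a sketch of the fix's shape (hook placement
    illustrative):

        static void cpu_idle_thread_init(unsigned int cpu,
                                         struct task_struct *idle)
        {
        #ifdef CONFIG_STACKPROTECTOR
                /* Match the canary the idle task was created with,
                 * otherwise the first __schedule() trips the check. */
                paca_ptrs[cpu]->canary = idle->stack_canary;
        #endif
                /* ... existing per-cpu idle setup ... */
        }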
2018-10-11  vmlinux.lds.h: Move LSM_TABLE into INIT_DATA  (Kees Cook; 1 file, -2/+0)

    Since the struct lsm_info table is not an initcall, we can just move
    it into INIT_DATA like all the other tables.

    Signed-off-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
    Reviewed-by: John Johansen <john.johansen@canonical.com>
    Reviewed-by: James Morris <james.morris@microsoft.com>
    Signed-off-by: James Morris <james.morris@microsoft.com>
2018-10-09  Merge branch 'fixes' into next  (Michael Ellerman; 3 files, -4/+28)

    Merge our fixes branch. It has a few important fixes that are needed
    for further testing and also some commits that will conflict with
    content in next.
2018-10-09  KVM: PPC: Book3S HV: Nested guest entry via hypercall  (Paul Mackerras; 1 file, -0/+1)

    This adds a new hypercall, H_ENTER_NESTED, which is used by a nested
    hypervisor to enter one of its nested guests. The hypercall supplies
    register values in two structs. Those values are copied by the level 0
    (L0) hypervisor (the one which is running in hypervisor mode) into the
    vcpu struct of the L1 guest, and then the guest is run until an
    interrupt or error occurs which needs to be reported to L1 via the
    hypercall return value.

    Currently this assumes that the L0 and L1 hypervisors are the same
    endianness, and the structs passed as arguments are in native
    endianness. If they are of different endianness, the version number
    check will fail and the hcall will be rejected.

    Nested hypervisors do not support indep_threads_mode=N, so this adds
    code to print a warning message if the administrator has set
    indep_threads_mode=N, and treat it as Y.

    Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
    Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-09  KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct  (Paul Mackerras; 1 file, -2/+2)

    When the 'regs' field was added to struct kvm_vcpu_arch, the code was
    changed to use several of the fields inside regs (e.g., gpr, lr, etc.)
    but not the ccr field, because the ccr field in struct pt_regs is 64
    bits on 64-bit platforms, but the cr field in kvm_vcpu_arch is only 32
    bits.

    This changes the code to use the regs.ccr field instead of cr, and
    changes the assembly code on 64-bit platforms to use 64-bit loads and
    stores instead of 32-bit ones.

    Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
    Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-09  powerpc: Turn off CPU_FTR_P9_TM_HV_ASSIST in non-hypervisor mode  (Paul Mackerras; 1 file, -2/+2)

    When doing nested virtualization, it is only necessary to do the
    transactional memory hypervisor assist at level 0, that is, when we
    are in hypervisor mode. Nested hypervisors can just use the TM
    facilities as architected.

    Therefore we should clear the CPU_FTR_P9_TM_HV_ASSIST bit when we are
    not in hypervisor mode, along with the CPU_FTR_HVMODE bit.

    Doing this will not change anything at this stage, because the only
    code that tests CPU_FTR_P9_TM_HV_ASSIST is in HV KVM, which currently
    can only be used when CPU_FTR_HVMODE is set.

    Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
    Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-07  Merge tag 'powerpc-4.19-4' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Greg Kroah-Hartman; 1 file, -0/+10)

    Michael writes:

    "powerpc fixes for 4.19 #4

     Four regression fixes.

     A fix for a change to lib/xz which broke our zImage loader when
     building with XZ compression. OK'ed by Herbert who merged the
     original patch.

     The recent fix we did to avoid patching __init text broke some 32-bit
     machines, fix that.

     Our show_user_instructions() could be tricked into printing kernel
     memory, add a check to avoid that.

     And a fix for a change to our NUMA initialisation logic, which causes
     crashes in some kdump configurations.

     Thanks to: Christophe Leroy, Hari Bathini, Jann Horn, Joel Stanley,
     Meelis Roos, Murilo Opsfelder Araujo, Srikar Dronamraju."

    * tag 'powerpc-4.19-4' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      powerpc/numa: Skip onlining a offline node in kdump path
      powerpc: Don't print kernel instructions in show_user_instructions()
      powerpc/lib: fix book3s/32 boot failure due to code patching
      lib/xz: Put CRC32_POLY_LE in xz_private.h
2018-10-05  powerpc: Don't print kernel instructions in show_user_instructions()  (Michael Ellerman; 1 file, -0/+10)

    Recently we implemented show_user_instructions() which dumps the code
    around the NIP when a user space process dies with an unhandled
    signal. This was modelled on the x86 code, and we even went so far as
    to implement the exact same bug, namely that if the user process
    crashed with its NIP pointing into the kernel we will dump kernel text
    to dmesg. eg:

        bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1
        bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6
        bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048

    This was discovered on x86 by Jann Horn and fixed in commit
    342db04ae712 ("x86/dumpstack: Don't dump kernel memory based on
    usermode RIP").

    Fix it by checking the adjusted NIP value (pc) and number of
    instructions against USER_DS, and bail if we fail the check, eg:

        bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1
        bad-bctr[2969]: Bad NIP, not dumping instructions.

    Fixes: 88b0fe175735 ("powerpc: Add show_user_instructions()")
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
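    The check is a single access_ok-style test before the dump loop; a
    sketch of its shape (the instruction-count constant is illustrative):

        /* Refuse to dump unless the whole window is user memory. */
        if (!__access_ok(pc, NR_INSNS * sizeof(int), USER_DS)) {
                pr_info("%s[%d]: Bad NIP, not dumping instructions.\n",
                        current->comm, current->pid);
                return;
        }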
2018-10-04  powerpc/64s/hash: Do not use PPC_INVALIDATE_ERAT on CPUs before POWER9  (Nicholas Piggin; 1 file, -0/+7)

    PPC_INVALIDATE_ERAT is slbia IH=7, which is a new variant introduced
    with POWER9, and the result is undefined on earlier CPUs.

    Commits 7b9f71f974 ("powerpc/64s: POWER9 machine check handler") and
    d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on
    POWER9") caused POWER7/8 code to use this instruction. Remove it. An
    ERAT flush can be made by invalidating the SLB, but before POWER9 that
    requires a flush and rebolt.

    Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler")
    Fixes: d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
    Cc: stable@vger.kernel.org # v4.11+
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-04  powerpc/time: Add set_state_oneshot_stopped decrementer callback  (Anton Blanchard; 1 file, -0/+1)

    If CONFIG_PPC_WATCHDOG is enabled we always cap the decrementer to
    0x7fffffff:

        if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
                set_dec(0x7fffffff);
        else
                set_dec(decrementer_max);

    If there are no future events, we don't reprogram the decrementer
    after this and we end up with 0x7fffffff even on a large decrementer
    capable system.

    As suggested by Nick, add a set_state_oneshot_stopped callback so we
    program the decrementer with decrementer_max if there are no future
    events.

    Signed-off-by: Anton Blanchard <anton@ozlabs.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
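    The callback can simply reuse the shutdown behaviour; a sketch
    (assuming the existing decrementer_set_next_event() helper):

        static int decrementer_shutdown(struct clock_event_device *dev)
        {
                /* Park the decrementer as far in the future as it goes. */
                decrementer_set_next_event(decrementer_max, dev);
                return 0;
        }

        static struct clock_event_device decrementer_clockevent = {
                /* ... */
                .set_state_shutdown        = decrementer_shutdown,
                .set_state_oneshot_stopped = decrementer_shutdown,
        };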
2018-10-04  powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer  (Anton Blanchard; 1 file, -14/+3)

    We currently cap the decrementer clockevent at 4 seconds, even on
    systems with large decrementer support. Fix this by converting the
    code to use clockevents_register_device(), which calculates the upper
    bound based on the max_delta passed in.

    Signed-off-by: Anton Blanchard <anton@ozlabs.org>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
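    A sketch of what registration with a computed bound can look like (the
    per-cpu variable name and minimum delta are illustrative, and the
    config-and-register variant shown here is the helper that takes the
    tick frequency plus min/max deltas in ticks and derives max_delta_ns
    itself; the patch may use a different entry point):

        static void register_decrementer_clockevent(int cpu)
        {
                struct clock_event_device *dec = &per_cpu(decrementers, cpu);

                dec->cpumask = cpumask_of(cpu);
                /* 2 ticks min; decrementer_max reflects the real width. */
                clockevents_config_and_register(dec, ppc_tb_freq,
                                                2, decrementer_max);
        }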