summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2017-11-07resource: Provide resource struct in resource walk callbackTom Lendacky1-3/+9
In preperation for a new function that will need additional resource information during the resource walk, update the resource walk callback to pass the resource structure. Since the current callback start and end arguments are pulled from the resource structure, the callback functions can obtain them from the resource structure directly. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: kvm@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lkml.kernel.org/r/20171020143059.3291-10-brijesh.singh@amd.com
2017-11-07powerpc/mm/hash: Remove stale comment.Michal Suchanek1-4/+0
In commit e6f81a92015b ("powerpc/mm/hash: Support 68 bit VA") the masking is folded into ASM_VSID_SCRAMBLE but the comment about masking is removed only from the firt use of ASM_VSID_SCRAMBLE. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-07powerpc/powernv/ioda: Remove explicit max window size checkAlexey Kardashevskiy1-1/+1
DMA windows can only have a size of power of two on IODA2 hardware and using memory_hotplug_max() to determine the upper limit won't work correcly if it returns not power of two value. This removes the check as the platform code does this check in pnv_pci_ioda2_setup_default_config() anyway; the other client is VFIO and that thing checks against locked_vm limit which prevents the userspace from locking too much memory. It is expected to impact DPDK on machines with non-power-of-two RAM size, mostly. KVM guests are less likely to be affected as usually guests get less than half of hosts RAM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-07Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar426-27/+469
Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfoShriya1-1/+1
The call to /proc/cpuinfo in turn calls cpufreq_quick_get() which returns the last frequency requested by the kernel, but may not reflect the actual frequency the processor is running at. This patch makes a call to cpufreq_get() instead which returns the current frequency reported by the hardware. Fixes: fb5153d05a7d ("powerpc: powernv: Implement ppc_md.get_proc_freq()") Signed-off-by: Shriya <shriyak@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1Nicholas Piggin1-12/+19
DD2.1 does not have to save MMCR0 for all state-loss idle states, only after deep idle states (like other PMU registers). Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1Nicholas Piggin1-0/+2
DD2.1 does not have to flush the ERAT after a state-loss idle. Performance testing was done on a DD2.1 using only the stop0 idle state (the shallowest state which supports state loss), using context_switch selftest configured to ping-poing between two threads on the same core and two different cores. Performance improvement for same core is 7.0%, different cores is 14.8%. Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc: add POWER9_DD20 featureNicholas Piggin3-1/+26
Cc: Michael Neuling <mikey@neuling.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc: Remove facility loadups on transactional {fp, vec, vsx} unavailableCyril Bur1-30/+0
After handling a transactional FP, Altivec or VSX unavailable exception. The return to userspace code will detect that the TIF_RESTORE_TM bit is set and call restore_tm_state(). restore_tm_state() will call restore_math() to ensure that the correct facilities are loaded. This means that all the loadup code in {fp,altivec,vsx}_unavailable_tm() is doing pointless work and can simply be removed. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc: Always save/restore checkpointed regs during treclaim/trecheckpointCyril Bur6-81/+35
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has lead to an (incorrect) assumption that it leaves the checkpointed state on the CPU. tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim(). These optimisations sit in what is by definition a slow path. If a process has to go through a reclaim/recheckpoint then its transaction will be doomed on returning to userspace. This mean that the process will be unable to complete its transaction and be forced to its failure handler. This is already an out if line case for userspace. Furthermore, the cost of copying 64 times 128 bits from registers isn't very long[0] (at all) on modern processors. As such it appears these optimisations have only served to increase code complexity and are unlikely to have had a measurable performance impact. Our transactional memory handling has been riddled with bugs. A cause of this has been difficulty in following the code flow, code complexity has not been our friend here. It makes sense to remove these optimisations in favour of a (hopefully) more stable implementation. This patch does mean that some times the assembly will needlessly save 'junk' registers which will subsequently get overwritten with the correct value by the C code which calls the assembly function. This small inefficiency is far outweighed by the reduction in complexity for general TM code, context switching paths, and transactional facility unavailable exception handler. 0: I tried to measure it once for other work and found that it was hiding in the noise of everything else I was working with. I find it exceedingly likely this will be the case here. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc: Force reload for recheckpoint during tm {fp, vec, vsx} unavailable ↵Cyril Bur2-7/+19
exception Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. tm_reclaim() has optimisations to not always save the FP/Altivec registers to the checkpointed save area. This was originally done because the caller might have information that the checkpointed registers aren't valid due to lazy save and restore. We've also been a little vague as to how tm_reclaim() leaves the FP/Altivec state since it doesn't necessarily always save it to the thread struct. This has lead to an (incorrect) assumption that it leaves the checkpointed state on the CPU. tm_recheckpoint() has similar optimisations in reverse. It may not always reload the checkpointed FP/Altivec registers from the thread struct before the trecheckpoint. It is therefore quite unclear where it expects to get the state from. This didn't help with the assumption made about tm_reclaim(). This patch is a minimal fix for ease of backporting. A more correct fix which removes the msr parameter to tm_reclaim() and tm_recheckpoint() altogether has been upstreamed to apply on top of this patch. Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc: Don't enable FP/Altivec if not checkpointedCyril Bur2-6/+20
Lazy save and restore of FP/Altivec means that a userspace process can be sent to userspace with FP or Altivec disabled and loaded only as required (by way of an FP/Altivec unavailable exception). Transactional Memory complicates this situation as a transaction could be started without FP/Altivec being loaded up. This causes the hardware to checkpoint incorrect registers. Handling FP/Altivec unavailable exceptions while a thread is transactional requires a reclaim and recheckpoint to ensure the CPU has correct state for both sets of registers. Lazy save and restore of FP/Altivec cannot be done if a process is transactional. If a facility was enabled it must remain enabled whenever a thread is transactional. Commit dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") ensures that the facilities are always enabled if a thread is transactional. A bug in the introduced code may cause it to inadvertently enable a facility that was (and should remain) disabled. The problem with this extraneous enablement is that the registers for the erroneously enabled facility have not been correctly recheckpointed - the recheckpointing code assumed the facility would remain disabled. Further compounding the issue, the transactional {fp,altivec,vsx} unavailable code has been incorrectly using the MSR to enable facilities. The presence of the {FP,VEC,VSX} bit in the regs->msr simply means if the registers are live on the CPU, not if the kernel should load them before returning to userspace. This has worked due to the bug mentioned above. This causes transactional threads which return to their failure handler to observe incorrect checkpointed registers. Perhaps an example will help illustrate the problem: A userspace process is running and uses both FP and Altivec registers. This process then continues to run for some time without touching either sets of registers. The kernel subsequently disables the facilities as part of lazy save and restore. The userspace process then performs a tbegin and the CPU checkpoints 'junk' FP and Altivec registers. The process then performs a floating point instruction triggering a fp unavailable exception in the kernel. The kernel then loads the FP registers - and only the FP registers. Since the thread is transactional it must perform a reclaim and recheckpoint to ensure both the checkpointed registers and the transactional registers are correct. It then (correctly) enables MSR[FP] for the process. Later (on exception exist) the kernel also (inadvertently) enables MSR[VEC]. The process is then returned to userspace. Since the act of loading the FP registers doomed the transaction we know CPU will fail the transaction, restore its checkpointed registers, and return the process to its failure handler. The problem is that we're now running with Altivec enabled and the 'junk' checkpointed registers are restored. The kernel had only recheckpointed FP. This patch solves this by only activating FP/Altivec if userspace was using them when it entered the kernel and not simply if the process is transactional. Fixes: dc16b553c949 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use") Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/powernv: Add OPAL_BUSY to opal_error_code()Cyril Bur1-0/+2
Also export opal_error_code() so that it can be used in modules Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/opal: Add opal_async_wait_response_interruptible() to opal-asyncCyril Bur2-4/+82
This patch adds an _interruptible version of opal_async_wait_response(). This is useful when a long running OPAL call is performed on behalf of a userspace thread, for example, the opal_flash_{read,write,erase} functions performed by the powernv-flash MTD driver. It is foreseeable that these functions would take upwards of two minutes causing the wait_event() to block long enough to cause hung task warnings. Furthermore, wait_event_interruptible() is preferable as otherwise there is no way for signals to stop the process which is going to be confusing in userspace. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powernv/opal-sensor: remove not needed lockStewart Smith1-13/+4
Parallel sensor reads could run out of async tokens due to opal_get_sensor_data grabbing tokens but then doing the sensor read behind a mutex, essentially serializing the (possibly asynchronous and relatively slow) sensor read. It turns out that the mutex isn't needed at all, not only should the OPAL interface allow concurrent reads, the implementation is certainly safe for that, and if any sensor we were reading from somewhere isn't, doing the mutual exclusion in the kernel is the wrong place to do it, OPAL should be doing it for the kernel. So, remove the mutex. Additionally, we shouldn't be printing out an error when we don't get a token as the only way this should happen is if we've been interrupted in down_interruptible() on the semaphore. Reported-by: Robert Lippert <rlippert@google.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/opal: Rework the opal-async interfaceCyril Bur1-41/+51
Future work will add an opal_async_wait_response_interruptible() which will call wait_event_interruptible(). This work requires extra token state to be tracked as wait_event_interruptible() can return and the caller could release the token before OPAL responds. Currently token state is tracked with two bitfields which are 64 bits big but may not need to be as OPAL informs Linux how many async tokens there are. It also uses an array indexed by token to store response messages for each token. The bitfields make it difficult to add more state and also provide a hard maximum as to how many tokens there can be - it is possible that OPAL will inform Linux that there are more than 64 tokens. Rather than add a bitfield to track the extra state, rework the internals slightly. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> [mpe: Fix __opal_async_get_token() when no tokens are free] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/opal: Make __opal_async_{get, release}_token() staticCyril Bur2-9/+3
There are no callers of both __opal_async_get_token() and __opal_async_release_token(). This patch also removes the possibility of "emergency through synchronous call to __opal_async_get_token()" as such it makes more sense to initialise opal_sync_sem for the maximum number of async tokens. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/opal: Fix EBUSY bug in acquiring tokensWilliam A. Kennington III1-3/+3
The current code checks the completion map to look for the first token that is complete. In some cases, a completion can come in but the token can still be on lease to the caller processing the completion. If this completed but unreleased token is the first token found in the bitmap by another tasks trying to acquire a token, then the __test_and_set_bit call will fail since the token will still be on lease. The acquisition will then fail with an EBUSY. This patch reorganizes the acquisition code to look at the opal_async_token_map for an unleased token. If the token has no lease it must have no outstanding completions so we should never see an EBUSY, unless we have leased out too many tokens. Since opal_async_get_token_inrerruptible is protected by a semaphore, we will practically never see EBUSY anymore. Fixes: 8d7248232208 ("powerpc/powernv: Infrastructure to support OPAL async completion") Signed-off-by: William A. Kennington III <wak@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/eeh: Stop using do_gettimeofday()Arnd Bergmann3-6/+6
This interface is inefficient and deprecated because of the y2038 overflow. ktime_get_seconds() is an appropriate replacement here, since it has sufficient granularity but is more efficient and uses monotonic time. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06bpf: take advantage of stack_depth tracking in powerpc JITSandipan Das2-9/+14
Take advantage of stack_depth tracking, originally introduced for x64, in powerpc JIT as well. Round up allocated stack by 16 bytes to make sure it stays aligned for functions called from JITed bpf program. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/tm: Don't check for WARN in TM Bad Thing handlingMichael Ellerman1-7/+2
Currently when we take a TM Bad Thing program check exception, we search the bug table to see if the program check was generated by a WARN/WARN_ON etc. That makes no sense, the WARN macros use trap instructions, which should never generate a TM Bad Thing exception. If they ever did that would be a bug and we should oops. We do have some hand-coded bugs in tm.S, using EMIT_BUG_ENTRY, but those are all BUGs not WARNs, and they all use trap instructions anyway. Almost certainly this check was incorrectly copied from the REASON_TRAP handling in the same function. Remove it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-By: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/mm: Add a CONFIG option to choose if radix is used by defaultMichael Ellerman2-2/+24
Currently if the hardware supports the radix MMU we will use it, *unless* "disable_radix" is passed on the kernel command line. However some users would like the reverse semantics. ie. The kernel uses the hash MMU by default, unless radix is explicitly requested on the command line. So add a CONFIG option to choose whether we use radix by default or not, and expand the disable_radix command line option to allow "disable_radix=no" which *enables* radix. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64Michael Ellerman26-72/+68
CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU found in "server" processors, from IBM mainly. Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's annoying to have two symbols that always have the same value, it's not quite annoying enough to bother removing one. However with the arrival of Power9, we now have the situation where CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the Radix MMU - *not* the "standard" MMU. So it is now actively confusing to use it, because it implies that code is disabled or inactive when the Radix MMU is in use, however that is not necessarily true. So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor formatting updates of some of the affected lines. This will be a pain for backports, but c'est la vie. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/64: Free up CPU_FTR_ICSWXMichael Ellerman2-4/+4
The last user of CPU_FTR_ICSWX was removed in commit 6ff4d3e96652 ("powerpc: Remove old unused icswx based coprocessor support"), so free the bit up for future use. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/mm/hash: Add pr_fmt() to hash_utils64.cAneesh Kumar K.V1-0/+1
Make the printks look a bit nicer by adding a prefix. Radix config now do radix-mmu: Page sizes from device-tree: radix-mmu: Page size shift = 12 AP=0x0 radix-mmu: Page size shift = 16 AP=0x5 radix-mmu: Page size shift = 21 AP=0x1 radix-mmu: Page size shift = 30 AP=0x2 This patch update hash config to do similar dmesg output. With the patch we have hash-mmu: Page sizes from device-tree: hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0 hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7 hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56 hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1 hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8 hash-mmu: base_shift=20: shift=20, sllp=0x0111, avpnm=0x00000000, tlbiel=0, penc=2 hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0 hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/ipic: Fix status get and status clearChristophe Leroy1-2/+2
IPIC Status is provided by register IPIC_SERSR and not by IPIC_SERMR which is the mask register. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/powernv: Reserve a hole which appears after enabling IOVAlexey Kardashevskiy2-3/+22
In order to make generic IOV code work, the physical function IOV BAR should start from offset of the first VF. Since M64 segments share PE number space across PHB, and some PEs may be in use at the time when IOV is enabled, the existing code shifts the IOV BAR to the index of the first PE/VF. This creates a hole in IOMEM space which can be potentially taken by some other device. This reserves a temporary hole on a parent and releases it when IOV is disabled; the temporary resources are stored in pci_dn to avoid kmalloc/free. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/pseries/vio: Dispose of virq mapping on vdevice unregisterTyrel Datwyler1-0/+2
When a vdevice is DLPAR removed from the system the vio subsystem doesn't bother unmapping the virq from the irq_domain. As a result we have a virq mapped to a hardware irq that is no longer valid for the irq_domain. A side effect is that we are left with /proc/irq/<irq#> affinity entries, and attempts to modify the smp_affinity of the irq will fail. In the following observed example the kernel log is spammed by ics_rtas_set_affinity errors after the removal of a VSCSI adapter. This is a result of irqbalance trying to adjust the affinity every 10 seconds. rpadlpar_io: slot U8408.E8E.10A7ACV-V5-C25 removed ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3 ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3 This patch fixes the issue by calling irq_dispose_mapping() on the virq of the viodev on unregister. Fixes: f2ab6219969f ("powerpc/pseries: Add PFO support to the VIO bus") Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/64s/radix: Fix process table entry cache invalidationNicholas Piggin3-6/+29
According to the architecture, the process table entry cache must be flushed with tlbie RIC=2. Currently the process table entry is set to invalid right before the PID is returned to the allocator, with no invalidation. This works on existing implementations that are known to not cache the process table entry for any except the current PIDR. It is architecturally correct and cleaner to invalidate with RIC=2 after clearing the process table entry and before the PID is returned to the allocator. This can be done in arch_exit_mmap that runs before the final flush, and to ensure the final flush (fullmm) is always a RIC=2 variant. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/64s/radix: Improve preempt handling in TLB codeNicholas Piggin1-25/+23
Preempt should be consistently disabled for mm_is_thread_local tests, so bring the rest of these under preempt_disable(). Preempt does not need to be disabled for the mm->context.id tests, which allows simplification and removal of gotos. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/powernv: Use FIXUP_ENDIAN_HV in OPAL returnNicholas Piggin1-2/+2
Close the recoverability gap for OPAL calls by using FIXUP_ENDIAN_HV in the return path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/book3s: Add an HV variant of FIXUP_ENDIAN that is recoverableNicholas Piggin1-0/+22
Add an HV variant of FIXUP_ENDIAN which uses HSRR[01] and does not clear MSR[RI], which improves recoverability. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/book3s: Use label for FIXUP_ENDIAN macro branchNicholas Piggin1-2/+3
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06powerpc/64: Fix latency tracing for lazy irq replayNicholas Piggin1-0/+9
When returning from an exception to a soft-enabled context, pending IRQs are replayed but IRQ tracing is not reset, so a number of them can get chained together into the same IRQ-disabled trace. Fix this by having __check_irq_replay re-set IRQ trace. This is conceptually where we respond to the next interrupt, so it fits the semantics of the IRQ tracer. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06KVM: PPC: Book3S HV: Handle host system reset in guest modeNicholas Piggin4-2/+11
If the host takes a system reset interrupt while a guest is running, the CPU must exit the guest before processing the host exception handler. After this patch, taking a sysrq+x with a CPU running in a guest gives a trace like this: cpu 0x27: Vector: 100 (System Reset) at [c000000fdf5776f0] pc: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv] lr: c008000010158b80: kvmppc_run_core+0x16b8/0x1ad0 [kvm_hv] sp: c000000fdf577850 msr: 9000000002803033 current = 0xc000000fdf4b1e00 paca = 0xc00000000fd4d680 softe: 3 irq_happened: 0x01 pid = 6608, comm = qemu-system-ppc Linux version 4.14.0-rc7-01489-g47e1893a404a-dirty #26 SMP [c000000fdf577a00] c008000010159dd4 kvmppc_vcpu_run_hv+0x3dc/0x12d0 [kvm_hv] [c000000fdf577b30] c0080000100a537c kvmppc_vcpu_run+0x44/0x60 [kvm] [c000000fdf577b60] c0080000100a1ae0 kvm_arch_vcpu_ioctl_run+0x118/0x310 [kvm] [c000000fdf577c00] c008000010093e98 kvm_vcpu_ioctl+0x530/0x7c0 [kvm] [c000000fdf577d50] c000000000357bf8 do_vfs_ioctl+0xd8/0x8c0 [c000000fdf577df0] c000000000358448 SyS_ioctl+0x68/0x100 [c000000fdf577e30] c00000000000b220 system_call+0x58/0x6c --- Exception: c01 (System Call) at 00007fff76868df0 SP (7fff7069baf0) is in userspace Fixes: e36d0a2ed5 ("powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-03Merge tag 'powerpc-4.14-6' of ↵Linus Torvalds5-13/+26
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 4.14. This is bigger than I like to send at rc7, but that's at least partly because I didn't send any fixes last week. If it wasn't for the IMC driver, which is new and getting heavy testing, the diffstat would look a bit better. I've also added ftrace on big endian to my test suite, so we shouldn't break that again in future. - A fix to the handling of misaligned paste instructions (P9 only), where a change to a #define has caused the check for the instruction to always fail. - The preempt handling was unbalanced in the radix THP flush (P9 only). Though we don't generally use preempt we want to keep it working as much as possible. - Two fixes for IMC (P9 only), one when booting with restricted number of CPUs and one in the error handling when initialisation fails due to firmware etc. - A revert to fix function_graph on big endian machines, and then a rework of the reverted patch to fix kprobes blacklist handling on big endian machines. Thanks to: Anju T Sudhakar, Guilherme G. Piccoli, Madhavan Srinivasan, Naveen N. Rao, Nicholas Piggin, Paul Mackerras" * tag 'powerpc-4.14-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/perf: Fix core-imc hotplug callback failure during imc initialization powerpc/kprobes: Dereference function pointers only if the address does not belong to kernel text Revert "powerpc64/elfv1: Only dereference function descriptor for non-text symbols" powerpc/64s/radix: Fix preempt imbalance in TLB flush powerpc: Fix check for copy/paste instructions in alignment handler powerpc/perf: Fix IMC allocation routine
2017-11-03powerpc/watchdog: Convert timers to use timer_setup()Kees Cook1-3/+2
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-03powerpc/perf: Fix core-imc hotplug callback failure during imc initializationMadhavan Srinivasan1-0/+14
Call trace observed during boot: nest_capp0_imc performance monitor hardware support registered nest_capp1_imc performance monitor hardware support registered core_imc memory allocation for cpu 56 failed Unable to handle kernel paging request for data at address 0xffa400010 Faulting instruction address: 0xc000000000bf3294 0:mon> e cpu 0x0: Vector: 300 (Data Access) at [c000000ff38ff8d0] pc: c000000000bf3294: mutex_lock+0x34/0x90 lr: c000000000bf3288: mutex_lock+0x28/0x90 sp: c000000ff38ffb50 msr: 9000000002009033 dar: ffa400010 dsisr: 80000 current = 0xc000000ff383de00 paca = 0xc000000007ae0000 softe: 0 irq_happened: 0x01 pid = 13, comm = cpuhp/0 Linux version 4.11.0-39.el7a.ppc64le (mockbuild@ppc-058.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Oct 3 07:42:44 EDT 2017 0:mon> t [c000000ff38ffb80] c0000000002ddfac perf_pmu_migrate_context+0xac/0x470 [c000000ff38ffc40] c00000000011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0 [c000000ff38ffc90] c000000000125758 cpuhp_invoke_callback+0x198/0x5d0 [c000000ff38ffd00] c00000000012782c cpuhp_thread_fun+0x8c/0x3d0 [c000000ff38ffd60] c0000000001678d0 smpboot_thread_fn+0x290/0x2a0 [c000000ff38ffdc0] c00000000015ee78 kthread+0x168/0x1b0 [c000000ff38ffe30] c00000000000b368 ret_from_kernel_thread+0x5c/0x74 While registering the cpuhoplug callbacks for core-imc, if we fails in the cpuhotplug online path for any random core (either because opal call to initialize the core-imc counters fails or because memory allocation fails for that core), ppc_core_imc_cpu_offline() will get invoked for other cpus who successfully returned from cpuhotplug online path. But in the ppc_core_imc_cpu_offline() path we are trying to migrate the event context, when core-imc counters are not even initialized. Thus creating the above stack dump. Add a check to see if core-imc counters are enabled or not in the cpuhotplug offline path before migrating the context to handle this failing scenario. Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support") Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-02Merge tag 'spdx_identifiers-4.14-rc8' of ↵Linus Torvalds418-0/+418
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull initial SPDX identifiers from Greg KH: "License cleanup: add SPDX license identifiers to some files Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: License cleanup: add SPDX license identifier to uapi header files with a license License cleanup: add SPDX license identifier to uapi header files with no license License cleanup: add SPDX GPL-2.0 license identifier to files with no license
2017-11-02License cleanup: add SPDX license identifier to uapi header files with a licenseGreg Kroah-Hartman25-0/+25
Many user space API headers have licensing information, which is either incomplete, badly formatted or just a shorthand for referring to the license under which the file is supposed to be. This makes it hard for compliance tools to determine the correct license. Update these files with an SPDX license identifier. The identifier was chosen based on the license information in the file. GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license identifier with the added 'WITH Linux-syscall-note' exception, which is the officially assigned exception identifier for the kernel syscall exception: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work". This exception makes it possible to include GPL headers into non GPL code, without confusing license compliance tools. Headers which have either explicit dual licensing or are just licensed under a non GPL license are updated with the corresponding SPDX identifier and the GPLv2 with syscall exception identifier. The format is: ((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE) SPDX license identifiers are a legally binding shorthand, which can be used instead of the full boiler plate text. The update does not remove existing license information as this has to be done on a case by case basis and the copyright holders might have to be consulted. This will happen in a separate step. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. See the previous patch in this series for the methodology of how this patch was researched. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02License cleanup: add SPDX license identifier to uapi header files with no ↵Greg Kroah-Hartman15-0/+15
license Many user space API headers are missing licensing information, which makes it hard for compliance tools to determine the correct license. By default are files without license information under the default license of the kernel, which is GPLV2. Marking them GPLV2 would exclude them from being included in non GPLV2 code, which is obviously not intended. The user space API headers fall under the syscall exception which is in the kernels COPYING file: NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work". otherwise syscall usage would not be possible. Update the files which contain no license information with an SPDX license identifier. The chosen identifier is 'GPL-2.0 WITH Linux-syscall-note' which is the officially assigned identifier for the Linux syscall exception. SPDX license identifiers are a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. See the previous patch in this series for the methodology of how this patch was researched. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman378-0/+378
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02Backmerge tag 'v4.14-rc7' into drm-nextDave Airlie21-82/+190
Linux 4.14-rc7 Requested by Ben Skeggs for nouveau to avoid major conflicts, and things were getting a bit conflicty already, esp around amdgpu reverts.
2017-11-01powerpc/kprobes: Dereference function pointers only if the address does not ↵Naveen N. Rao1-1/+6
belong to kernel text This makes the changes introduced in commit 83e840c770f2c5 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols") to be specific to the kprobe subsystem. We previously changed ppc_function_entry() to always check the provided address to confirm if it needed to be dereferenced. This is actually only an issue for kprobe blacklisted asm labels (through use of _ASM_NOKPROBE_SYMBOL) and can cause other issues with ftrace. Also, the additional checks are not really necessary for our other uses. As such, move this check to the kprobes subsystem. Fixes: 83e840c770f2 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-01Revert "powerpc64/elfv1: Only dereference function descriptor for non-text ↵Naveen N. Rao1-9/+1
symbols" This reverts commit 83e840c770f2c5 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols"). Chandan reported that on newer kernels, trying to enable function_graph tracer on ppc64 (BE) locks up the system with the following trace: Unable to handle kernel paging request for data at address 0x600000002fa30010 Faulting instruction address: 0xc0000000001f1300 Thread overran stack, or stack corrupted Oops: Kernel access of bad area, sig: 11 [#1] BE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries Modules linked in: CPU: 1 PID: 6586 Comm: bash Not tainted 4.14.0-rc3-00162-g6e51f1f-dirty #20 task: c000000625c07200 task.stack: c000000625c07310 NIP: c0000000001f1300 LR: c000000000121cac CTR: c000000000061af8 REGS: c000000625c088c0 TRAP: 0380 Not tainted (4.14.0-rc3-00162-g6e51f1f-dirty) MSR: 8000000000001032 <SF,ME,IR,DR,RI> CR: 28002848 XER: 00000000 CFAR: c0000000001f1320 SOFTE: 0 ... NIP [c0000000001f1300] .__is_insn_slot_addr+0x30/0x90 LR [c000000000121cac] .kernel_text_address+0x18c/0x1c0 Call Trace: [c000000625c08b40] [c0000000001bd040] .is_module_text_address+0x20/0x40 (unreliable) [c000000625c08bc0] [c000000000121cac] .kernel_text_address+0x18c/0x1c0 [c000000625c08c50] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c08cf0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c08d60] [c000000000121b40] .kernel_text_address+0x20/0x1c0 [c000000625c08df0] [c000000000061960] .prepare_ftrace_return+0x50/0x130 ... [c000000625c0ab30] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c0abd0] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c0ac40] [c000000000121b40] .kernel_text_address+0x20/0x1c0 [c000000625c0acd0] [c000000000061960] .prepare_ftrace_return+0x50/0x130 [c000000625c0ad70] [c000000000061b10] .ftrace_graph_caller+0x14/0x34 [c000000625c0ade0] [c000000000121b40] .kernel_text_address+0x20/0x1c0 This is because ftrace is using ppc_function_entry() for obtaining the address of return_to_handler() in prepare_ftrace_return(). The call to kernel_text_address() itself gets traced and we end up in a recursive loop. Fixes: 83e840c770f2 ("powerpc64/elfv1: Only dereference function descriptor for non-text symbols") Cc: stable@vger.kernel.org # v4.13+ Reported-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-01KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hostsPaul Mackerras5-17/+212
This patch removes the restriction that a radix host can only run radix guests, allowing us to run HPT (hashed page table) guests as well. This is useful because it provides a way to run old guest kernels that know about POWER8 but not POWER9. Unfortunately, POWER9 currently has a restriction that all threads in a given code must either all be in HPT mode, or all in radix mode. This means that when entering a HPT guest, we have to obtain control of all 4 threads in the core and get them to switch their LPIDR and LPCR registers, even if they are not going to run a guest. On guest exit we also have to get all threads to switch LPIDR and LPCR back to host values. To make this feasible, we require that KVM not be in the "independent threads" mode, and that the CPU cores be in single-threaded mode from the host kernel's perspective (only thread 0 online; threads 1, 2 and 3 offline). That allows us to use the same code as on POWER8 for obtaining control of the secondary threads. To manage the LPCR/LPIDR changes required, we extend the kvm_split_info struct to contain the information needed by the secondary threads. All threads perform a barrier synchronization (where all threads wait for every other thread to reach the synchronization point) on guest entry, both before and after loading LPCR and LPIDR. On guest exit, they all once again perform a barrier synchronization both before and after loading host values into LPCR and LPIDR. Finally, it is also currently necessary to flush the entire TLB every time we enter a HPT guest on a radix host. We do this on thread 0 with a loop of tlbiel instructions. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-01KVM: PPC: Book3S HV: Allow for running POWER9 host in single-threaded modePaul Mackerras3-33/+57
This patch allows for a mode on POWER9 hosts where we control all the threads of a core, much as we do on POWER8. The mode is controlled by a module parameter on the kvm_hv module, called "indep_threads_mode". The normal mode on POWER9 is the "independent threads" mode, with indep_threads_mode=Y, where the host is in SMT4 mode (or in fact any desired SMT mode) and each thread independently enters and exits from KVM guests without reference to what other threads in the core are doing. If indep_threads_mode is set to N at the point when a VM is started, KVM will expect every core that the guest runs on to be in single threaded mode (that is, threads 1, 2 and 3 offline), and will set the flag that prevents secondary threads from coming online. We can still use all four threads; the code that implements dynamic micro-threading on POWER8 will become active in over-commit situations and will allow up to three other VCPUs to be run on the secondary threads of the core whenever a VCPU is run. The reason for wanting this mode is that this will allow us to run HPT guests on a radix host on a POWER9 machine that does not support "mixed mode", that is, having some threads in a core be in HPT mode while other threads are in radix mode. It will also make it possible to implement a "strict threads" mode in future, if desired. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-01KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix hostPaul Mackerras5-37/+79
This sets up the machinery for switching a guest between HPT (hashed page table) and radix MMU modes, so that in future we can run a HPT guest on a radix host on POWER9 machines. * The KVM_PPC_CONFIGURE_V3_MMU ioctl can now specify either HPT or radix mode, on a radix host. * The KVM_CAP_PPC_MMU_HASH_V3 capability now returns 1 on POWER9 with HV KVM on a radix host. * The KVM_PPC_GET_SMMU_INFO returns information about the HPT MMU on a radix host. * The KVM_PPC_ALLOCATE_HTAB ioctl on a radix host will switch the guest to HPT mode and allocate a HPT. * For simplicity, we now allocate the rmap array for each memslot, even on a radix host, since it will be needed if the guest switches to HPT mode. * Since we cannot yet run a HPT guest on a radix host, the KVM_RUN ioctl will return an EINVAL error in that case. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-01KVM: PPC: Book3S HV: Unify dirty page map between HPT and radixPaul Mackerras7-113/+96
Currently, the HPT code in HV KVM maintains a dirty bit per guest page in the rmap array, whether or not dirty page tracking has been enabled for the memory slot. In contrast, the radix code maintains a dirty bit per guest page in memslot->dirty_bitmap, and only does so when dirty page tracking has been enabled. This changes the HPT code to maintain the dirty bits in the memslot dirty_bitmap like radix does. This results in slightly less code overall, and will mean that we do not lose the dirty bits when transitioning between HPT and radix mode in future. There is one minor change to behaviour as a result. With HPT, when dirty tracking was enabled for a memslot, we would previously clear all the dirty bits at that point (both in the HPT entries and in the rmap arrays), meaning that a KVM_GET_DIRTY_LOG ioctl immediately following would show no pages as dirty (assuming no vcpus have run in the meantime). With this change, the dirty bits on HPT entries are not cleared at the point where dirty tracking is enabled, so KVM_GET_DIRTY_LOG would show as dirty any guest pages that are resident in the HPT and dirty. This is consistent with what happens on radix. This also fixes a bug in the mark_pages_dirty() function for radix (in the sense that the function no longer exists). In the case where a large page of 64 normal pages or more is marked dirty, the addressing of the dirty bitmap was incorrect and could write past the end of the bitmap. Fortunately this case was never hit in practice because a 2MB large page is only 32 x 64kB pages, and we don't support backing the guest with 1GB huge pages at this point. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-01KVM: PPC: Book3S HV: Rename hpte_setup_done to mmu_readyPaul Mackerras3-33/+44
This renames the kvm->arch.hpte_setup_done field to mmu_ready because we will want to use it for radix guests too -- both for setting things up before vcpu execution, and for excluding vcpus from executing while MMU-related things get changed, such as in future switching the MMU from radix to HPT mode or vice-versa. This also moves the call to kvmppc_setup_partition_table() that was done in kvmppc_hv_setup_htab_rma() for HPT guests, and the setting of mmu_ready, into the caller in kvmppc_vcpu_run_hv(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>