summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2018-01-23powerpc/64s: Improve RFI L1-D cache flush fallbackNicholas Piggin3-54/+38
The fallback RFI flush is used when firmware does not provide a way to flush the cache. It's a "displacement flush" that evicts useful data by displacing it with an uninteresting buffer. The flush has to take care to work with implementation specific cache replacment policies, so the recipe has been in flux. The initial slow but conservative approach is to touch all lines of a congruence class, with dependencies between each load. It has since been determined that a linear pattern of loads without dependencies is sufficient, and is significantly faster. Measuring the speed of a null syscall with RFI fallback flush enabled gives the relative improvement: P8 - 1.83x P9 - 1.75x The flush also becomes simpler and more adaptable to different cache geometries. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-23signal/ptrace: Add force_sig_ptrace_errno_trap and use it where neededEric W. Biederman1-7/+2
There are so many places that build struct siginfo by hand that at least one of them is bound to get it wrong. A handful of cases in the kernel arguably did just that when using the errno field of siginfo to pass no errno values to userspace. The usage is limited to a single si_code so at least does not mess up anything else. Encapsulate this questionable pattern in a helper function so that the userspace ABI is preserved. Update all of the places that use this pattern to use the new helper function. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-23signal/powerpc: Remove unnecessary signal_code parameter of do_send_trapEric W. Biederman2-9/+9
signal_code is always TRAP_HWBKPT Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-01-22powerpc/pseries, ps3: panic flush kernel messages before halting systemNicholas Piggin1-0/+24
Platforms with a panic handler that halts the system can have problems getting kernel messages out, because the panic notifiers are called before kernel/panic.c does its flushing of printk buffers an console etc. This was attempted to be solved with commit a3b2cb30f252 ("powerpc: Do not call ppc_md.panic in fadump panic notifier"), but that wasn't the right approach and caused other problems, and was reverted by commit ab9dbf771ff9. Instead, the powernv shutdown paths have already had a similar problem, fixed by taking the message flushing sequence from kernel/panic.c. That's a little bit ugly, but while we have the code duplicated, it will work for this case as well. So have ppc panic handlers do the same flushing before they terminate. Without this patch, a qemu pseries_le_defconfig guest stops silently when issued the nmi command when xmon is off and no crash dumpers enabled. Afterwards, an oops is printed by each CPU as expected. Fixes: ab9dbf771ff9 ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21powerpc/tm: Fix endianness flip on trapGustavo Romero1-1/+7
Currently it's possible that a thread on PPC64 LE has its endianness flipped inadvertently to Big-Endian resulting in a crash once the process is back from the signal handler. If giveup_all() is called when regs->msr has the bits MSR.FP and MSR.VEC disabled (and hence MSR.VSX disabled too) it returns without calling check_if_tm_restore_required() which copies regs->msr to ckpt_regs->msr if the process caught a signal whilst in transactional mode. Then once in setup_tm_sigcontexts() MSR from ckpt_regs.msr is used, but since check_if_tm_restore_required() was not called previuosly, gp_regs[PT_MSR] gets a copy of invalid MSR bits as MSR in ckpt_regs was not updated from regs->msr and so is zeroed. Later when leaving the signal handler once in sys_rt_sigreturn() the TS bits of gp_regs[PT_MSR] are checked to determine if restore_tm_sigcontexts() must be called to pull in the correct MSR state into the user context. Because TS bits are zeroed restore_tm_sigcontexts() is never called and MSR restored from the user context on returning from the signal handler has the MSR.LE (the endianness bit) forced to zero (Big-Endian). That leads, for instance, to 'nop' being treated as an illegal instruction in the following sequence: tbegin. beq 1f trap tend. 1: nop on PPC64 LE machines and the process dies just after returning from the signal handler. PPC64 BE is also affected but in a subtle way since forcing Big-Endian on a BE machine does not change the endianness. This commit fixes the issue described above by ensuring that once in setup_tm_sigcontexts() the MSR used is from regs->msr instead of from ckpt_regs->msr and by ensuring that we pull in only the MSR.FP, MSR.VEC, and MSR.VSX bits from ckpt_regs->msr. The fix was tested both on LE and BE machines and no regression regarding the powerpc/tm selftests was observed. Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21powerpc: Expose TSCR via sysfsAnton Blanchard1-0/+8
The thread switch control register (TSCR) is a per core register that configures how the CPU shares resources between SMT threads. Exposing it via sysfs allows us to tune it at run time. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21powerpc: Use octal numbers for file permissionsRussell Currey6-13/+13
Symbolic macros are unintuitive and hard to read, whereas octal constants are much easier to interpret. Replace macros for the basic permission flags (user/group/other read/write/execute) with numeric constants instead, across the whole powerpc tree. Introducing a significant number of changes across the tree for no runtime benefit isn't exactly desirable, but so long as these macros are still used in the tree people will keep sending patches that add them. Not only are they hard to parse at a glance, there are multiple ways of coming to the same value (as you can see with 0444 and 0644 in this patch) which hurts readability. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21Merge branch 'fixes' into nextMichael Ellerman9-54/+343
Merge our fixes branch from the 4.15 cycle. Unusually the fixes branch saw some significant features merged, notably the RFI flush patches, so we want the code in next to be tested against that, to avoid any surprises when the two are merged. There's also some other work on the panic handling that was reverted in fixes and we now want to do properly in next, which would conflict. And we also fix a few other minor merge conflicts.
2018-01-21Merge branch 'topic/ppc-kvm' into nextMichael Ellerman1-28/+114
Merge the topic branch we share with kvm-ppc, this brings in two xive commits, one from Paul to rework HMI handling, and a minor cleanup to drop an unused flag.
2018-01-21powerpc: Enable support for ibm,drc-info devtree propertyMichael Bringmann1-0/+1
To: linuxppc-dev@lists.ozlabs.org From: Michael Bringmann <mwb@linux.vnet.ibm.com> Cc: Michael Bringmann <mwb@linux.vnet.ibm.com> Cc: nfont@linux.vnet.ibm.com Subject: [PATCH V6 4/4] powerpc: Enable support for ibm,drc-info devtree property prom_init.c: Enable support for new DRC device tree property "ibm,drc-info" in initial handshake between the Linux kernel and the front end processor. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-21powerpc/watchdog: improve watchdog commentsNicholas Piggin1-20/+38
The overview comments in the powerpc watchdog are out of date after several iterations and changes of the code. Bring them up to date. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-20powerpc/ptrace: Add memory protection key regsetThiago Jung Bauermann2-0/+73
The AMR/IAMR/UAMOR are part of the program context. Allow it to be accessed via ptrace and through core files. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-20powerpc: Deliver SEGV signal on pkey violationRam Pai1-1/+11
The value of the pkey, whose protection got violated, is made available in si_pkey field of the siginfo structure. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-20powerpc: Handle exceptions caused by pkey violationRam Pai1-1/+1
Handle Data and Instruction exceptions caused by memory protection-key. The CPU will detect the key fault if the HPTE is already programmed with the key. However if the HPTE is not hashed, a key fault will not be detected by the hardware. The software will detect pkey violation in such a case. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-20powerpc: store and restore the pkey state across context switchesRam Pai1-0/+7
Store and restore the AMR, IAMR and UAMOR register state of the task before scheduling out and after scheduling in, respectively. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19cxl: Add support for ASB_Notify on POWER9Christophe Lombard1-0/+1
The POWER9 core supports a new feature: ASB_Notify which requires the support of the Special Purpose Register: TIDR. The ASB_Notify command, generated by the AFU, will attempt to wake-up the host thread identified by the particular LPID:PID:TID. This patch assign a unique TIDR (thread id) for the current thread which will be used in the process element entry. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc: Add new kconfig CONFIG_PPC_IRQ_SOFT_MASK_DEBUGMadhavan Srinivasan2-4/+4
New Kconfig is added "CONFIG_PPC_IRQ_SOFT_MASK_DEBUG" to add WARN_ON to alert the invalid transitions. Also moved the code under the CONFIG_TRACE_IRQFLAGS in arch_local_irq_restore() to new Kconfig. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Fix name of CONFIG option in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64s: Add support to mask perf interrupts and replay themMadhavan Srinivasan3-3/+15
Two new bit mask field "IRQ_DISABLE_MASK_PMU" is introduced to support the masking of PMI and "IRQ_DISABLE_MASK_ALL" to aid interrupt masking checking. Couple of new irq #defs "PACA_IRQ_PMI" and "SOFTEN_VALUE_0xf0*" added to use in the exception code to check for PMI interrupts. In the masked_interrupt handler, for PMIs we reset the MSR[EE] and return. In the __check_irq_replay(), replay the PMI interrupt by calling performance_monitor_common handler. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64s: Add support to take additional parameter in MASKABLE_* macroMadhavan Srinivasan1-13/+19
To support addition of "bitmask" to MASKABLE_* macros, factor out the EXCPETION_PROLOG_1 macro. Make it explicit the interrupt masking supported by a gievn interrupt handler. Patch correspondingly extends the MASKABLE_* macros with an addition's parameter. "bitmask" parameter is passed to SOFTEN_TEST macro to decide on masking the interrupt. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Rename soft_enabled to irq_soft_maskMadhavan Srinivasan11-42/+29
Rename the paca->soft_enabled to paca->irq_soft_mask as it is no longer used as a flag for interrupt state, but a mask. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Change soft_enabled from flag to bitmaskMadhavan Srinivasan3-19/+32
"paca->soft_enabled" is used as a flag to mask some of interrupts. Currently supported flags values and their details: soft_enabled MSR[EE] 0 0 Disabled (PMI and HMI not masked) 1 1 Enabled "paca->soft_enabled" is initialized to 1 to make the interripts as enabled. arch_local_irq_disable() will toggle the value when interrupts needs to disbled. At this point, the interrupts are not actually disabled, instead, interrupt vector has code to check for the flag and mask it when it occurs. By "mask it", it update interrupt paca->irq_happened and return. arch_local_irq_restore() is called to re-enable interrupts, which checks and replays interrupts if any occured. Now, as mentioned, current logic doesnot mask "performance monitoring interrupts" and PMIs are implemented as NMI. But this patchset depends on local_irq_* for a successful local_* update. Meaning, mask all possible interrupts during local_* update and replay them after the update. So the idea here is to reserve the "paca->soft_enabled" logic. New values and details: soft_enabled MSR[EE] 1 0 Disabled (PMI and HMI not masked) 0 1 Enabled Reason for the this change is to create foundation for a third mask value "0x2" for "soft_enabled" to add support to mask PMIs. When ->soft_enabled is set to a value "3", PMI interrupts are mask and when set to a value of "1", PMI are not mask. With this patch also extends soft_enabled as interrupt disable mask. Current flags are renamed from IRQ_[EN?DIS}ABLED to IRQS_ENABLED and IRQS_DISABLED. Patch also fixes the ptrace call to force the user to see the softe value to be alway 1. Reason being, even though userspace has no business knowing about softe, it is part of pt_regs. Like-wise in signal context. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Implement and use soft_enabled_return APIMadhavan Srinivasan1-1/+1
Add a new wrapper function, soft_enabled_return(), added to return paca->soft_enabled value. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Move set_soft_enabled() and renameMadhavan Srinivasan3-14/+8
Move set_soft_enabled() from powerpc/kernel/irq.c to asm/hw_irq.c, to encourage updates to paca->soft_enabled done via these access function. Add "memory" clobber to hint compiler since paca->soft_enabled memory is the target here. Renaming it as soft_enabled_set() will make namespaces works better as prefix than a postfix when new soft_enabled manipulation functions are introduced. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc/64: Add #defines for paca->soft_enabled flagsMadhavan Srinivasan9-21/+29
Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when updating paca->soft_enabled. Replace the hardcoded values used when updating paca->soft_enabled with IRQ_(EN|DIS)ABLED #define. No logic change. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19powerpc: Hard wire PT_SOFTE value to 1 in ptrace & signalsMadhavan Srinivasan3-0/+23
We have always had softe in pt_regs, and accessible via PT_SOFTE, even though it is not userspace state. The value userspace sees should always be 1, because we should never be in userspace with interrupts soft disabled. In a subsequent patch we will be changing the semantics of the kernel softe value, so hard wire the value to 1 to retain the existing semantics. As far as we know nothing ever looks at it, but better safe than sorry. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Split out of larger patch, write change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-19KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless cededBenjamin Herrenschmidt1-0/+3
This works on top of the single escalation support. When in single escalation, with this change, we will keep the escalation interrupt disabled unless the VCPU is in H_CEDE (idle). In any other case, we know the VCPU will be rescheduled and thus there is no need to take escalation interrupts in the host whenever a guest interrupt fires. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-19KVM: PPC: Book3S HV: Don't use existing "prodded" flag for XIVE escalationsBenjamin Herrenschmidt1-0/+1
The prodded flag is only cleared at the beginning of H_CEDE, so every time we have an escalation, we will cause the *next* H_CEDE to return immediately. Instead use a dedicated "irq_pending" flag to indicate that a guest interrupt is pending for the VCPU. We don't reuse the existing exception bitmap so as to avoid expensive atomic ops. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-18powerpc/watchdog: remove arch_trigger_cpumask_backtraceNicholas Piggin1-22/+0
The powerpc NMI IPIs may not be recoverable if they are taken in some sections of code, and also there have been and still are issues with taking NMIs (in KVM guest code, in firmware, etc) which makes them a bit dangerous to use. Generic code like softlockup detector and rcu stall detectors really hammer on trigger_*_backtrace, which has lead to further problems because we've implemented it with the NMI. So stop providing NMI backtraces for now. Importantly, the powerpc code uses NMI IPIs in crash/debug, and the SMP hardlockup watchdog. So if the softlockup and rcu hang detection traces are not being printed because the CPU is stuck with interrupts off, then the hard lockup watchdog should get it with the NMI IPI. Fixes: 2104180a5369 ("powerpc/64s: implement arch-specific hardlockup watchdog") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-18powerpc/64s: Relax PACA address limitationsNicholas Piggin3-16/+27
Book3S PACA memory allocation is restricted by the RMA limit and also must not take SLB faults when accessed in virtual mode. Currently a fixed 256MB limit is used for this, which is imprecise and sub-optimal. Update the paca allocation limits to use use the ppc64_rma_size for RMA limit, and share the safe_stack_limit() that is currently used for stack allocations that must not take virtual mode faults. The safe_stack_limit() name is changed to ppc64_bolted_size() to match ppc64_rma_size and some comments are updated. We also need to use early_mmu_has_feature() because we are now calling this function prior to the jump label patching that enables mmu_has_feature(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change mmu_has_feature() to early_mmu_has_feature()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-18KVM: PPC: Book3S HV: Improve handling of debug-trigger HMIs on POWER9Paul Mackerras1-28/+114
Hypervisor maintenance interrupts (HMIs) are generated by various causes, signalled by bits in the hypervisor maintenance exception register (HMER). In most cases calling OPAL to handle the interrupt is the correct thing to do, but the "debug trigger" HMIs signalled by PPC bit 17 (bit 46) of HMER are used to invoke software workarounds for hardware bugs, and OPAL does not have any code to handle this cause. The debug trigger HMI is used in POWER9 DD2.0 and DD2.1 chips to work around a hardware bug in executing vector load instructions to cache inhibited memory. In POWER9 DD2.2 chips, it is generated when conditions are detected relating to threads being in TM (transactional memory) suspended mode when the core SMT configuration needs to be reconfigured. The kernel currently has code to detect the vector CI load condition, but only when the HMI occurs in the host, not when it occurs in a guest. If a HMI occurs in the guest, it is always passed to OPAL, and then we always re-sync the timebase, because the HMI cause might have been a timebase error, for which OPAL would re-sync the timebase, thus removing the timebase offset which KVM applied for the guest. Since we don't know what OPAL did, we don't know whether to subtract the timebase offset from the timebase, so instead we re-sync the timebase. This adds code to determine explicitly what the cause of a debug trigger HMI will be. This is based on a new device-tree property under the CPU nodes called ibm,hmi-special-triggers, if it is present, or otherwise based on the PVR (processor version register). The handling of debug trigger HMIs is pulled out into a separate function which can be called from the KVM guest exit code. If this function handles and clears the HMI, and no other HMI causes remain, then we skip calling OPAL and we proceed to subtract the guest timebase offset from the timebase. The overall handling for HMIs that occur in the host (i.e. not in a KVM guest) is largely unchanged, except that we now don't set the flag for the vector CI load workaround on DD2.2 processors. This also removes a BUG_ON in the KVM code. BUG_ON is generally not useful in KVM guest entry/exit code since it is difficult to handle the resulting trap gracefully. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-18powerpc/pci: Use of_irq_parse_and_map_pci() helperRob Herring1-8/+2
Instead of calling both of_irq_parse_pci() and irq_create_of_mapping(), call of_irq_parse_and_map_pci(), which does the same thing. This will allow making of_irq_parse_pci() a private, static function. This changes the logic slightly in that the fallback path will also be taken if irq_create_of_mapping() fails internally. Signed-off-by: Rob Herring <robh@kernel.org> [bhelgaas: fold in virq init from Stephen Rothwell <sfr@canb.auug.org.au>] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au>
2018-01-17powerpc/64: rtas avoid accessing paca in 32-bit modeNicholas Piggin1-6/+11
Commit 177ba7c647f3 ("powerpc/mm/radix: Limit paca allocation in radix") limited the paca allocation address to 1G on pSeries because RTAS return accesses the paca in 32-bit mode: On return from RTAS we access the paca variables and we have 64 bit disabled. This requires us to limit paca in 32 bit range. Fix this by setting ppc64_rma_size to first_memblock_size/1G range. Avoid this limit by switching to 64-bit mode before accessing any memory. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-17powerpc/64s: Improve local TLB flush for boot and MCE on POWER9Nicholas Piggin4-208/+2
There are several cases outside the normal address space management where a CPU's entire local TLB is to be flushed: 1. Booting the kernel, in case something has left stale entries in the TLB (e.g., kexec). 2. Machine check, to clean corrupted TLB entries. One other place where the TLB is flushed, is waking from deep idle states. The flush is a side-effect of calling ->cpu_restore with the intention of re-setting various SPRs. The flush itself is unnecessary because in the first case, the TLB should not acquire new corrupted TLB entries as part of sleep/wake (though they may be lost). This type of TLB flush is coded inflexibly, several times for each CPU type, and they have a number of problems with ISA v3.0B: - The current radix mode of the MMU is not taken into account, it is always done as a hash flushn For IS=2 (LPID-matching flush from host) and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if the R field does not match the current radix mode. - ISA v3.0B hash must flush the partition and process table caches as well. - ISA v3.0B radix must flush partition and process scoped translations, partition and process table caches, and also the page walk cache. So consolidate the flushing code and implement it in C and inline asm under the mm/ directory with the rest of the flush code. Add ISA v3.0B cases for radix and hash, and use the radix flush in radix environment. Provide a way for IS=2 (LPID flush) to specify the radix mode of the partition. Have KVM pass in the radix mode of the guest. Take out the flushes from early cputable/dt_cpu_ftrs detection hooks, and move it later in the boot process after, the MMU registers are set up and before relocation is first turned on. The TLB flush is no longer called when restoring from deep idle states. This was not be done as a separate step because booting secondaries uses the same cpu_restore as idle restore, which needs the TLB flush. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-17powerpc: System reset avoid interleaving oops using die synchronisationNicholas Piggin1-1/+1
The die() oops path contains a serializing lock to prevent oops messages from being interleaved. In the case of a system reset initiated oops (e.g., qemu nmi command), __die was being called which lacks that synchronisation and oops reports could be interleaved across CPUs. A recent patch 4388c9b3a6ee7 ("powerpc: Do not send system reset request through the oops path") changed this to __die to avoid the debugger() call, but there is no real harm to calling it twice if the first time fell through. So go back to using die() here. This was observed to fix the problem. Fixes: 4388c9b3a6ee7 ("powerpc: Do not send system reset request through the oops path") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-17powerpc/64s: Allow control of RFI flush via debugfsMichael Ellerman1-0/+30
Expose the state of the RFI flush (enabled/disabled) via debugfs, and allow it to be enabled/disabled at runtime. eg: $ cat /sys/kernel/debug/powerpc/rfi_flush 1 $ echo 0 > /sys/kernel/debug/powerpc/rfi_flush $ cat /sys/kernel/debug/powerpc/rfi_flush 0 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-01-17powerpc/64s: Wire up cpu_show_meltdown()Michael Ellerman1-0/+8
The recent commit 87590ce6e373 ("sysfs/cpu: Add vulnerability folder") added a generic folder and set of files for reporting information on CPU vulnerabilities. One of those was for meltdown: /sys/devices/system/cpu/vulnerabilities/meltdown This commit wires up that file for 64-bit Book3S powerpc. For now we default to "Vulnerable" unless the RFI flush is enabled. That may not actually be true on all hardware, further patches will refine the reporting based on the CPU/platform etc. But for now we default to being pessimists. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc: Use the TRAP macro whenever comparing a trap numberBenjamin Herrenschmidt2-2/+2
Trap numbers can have extra bits at the bottom that need to be filtered out. There are a few cases where we don't do that. It's possible that we got lucky but better safe than sorry. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc: Remove useless EXC_COMMON_HVBenjamin Herrenschmidt1-1/+1
The only difference between EXC_COMMON_HV and EXC_COMMON is that the former adds "2" to the trap number which is supposed to represent the fact that this is an "HV" interrupt which uses HSRR0/1. However KVM is the only one who cares and it has its own separate macros. In fact, we only have one user of EXC_COMMON_HV and it's for an unknown interrupt case. All the other ones already using EXC_COMMON. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc: Cosmetic cleanup of cpuinfo_opBenjamin Herrenschmidt1-4/+4
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc: Make newline in cpuinfo unconditionalBenjamin Herrenschmidt1-3/+0
We used to not put the newline between the CPU part and the summary part on UP kernels. This is a rather pointless ifdef so take it out. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAPChristophe Leroy1-27/+18
When CONFIG_SWAP is set, the TLB miss handlers have to also take into account _PAGE_ACCESSED flag. At the moment it is done by anding _PAGE_ACCESSED into _PAGE_PRESENT using 3 instructions. This patch uses APG for handling _PAGE_ACCESSED, allowing to just copy _PAGE_ACCESSED bit into APG field, hence reducing the action to a single instruction. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc/8xx: Remove _PAGE_USER and handle user access at PMD levelChristophe Leroy1-36/+10
As Linux kernel separates KERNEL and USER address spaces, there is therefore no need to flag USER access at page level. Today, the 8xx TLB handlers already handle user access in the L1 entry through Access Protection Groups, it is then natural to move the user access handling at PMD level once _PAGE_NA allows to handle PAGE_NONE protection without _PAGE_USER In the mean time, as we free up one bit in the PTE, we can use it to include SPS (page size flag) in the PTE and avoid handling it at every TLB miss hence removing special handling based on compiled page size. For _PAGE_EXEC, we rework it to use PP PTE bits, avoiding the copy of _PAGE_EXEC bit into the L1 entry. Unfortunatly we are not able to put it at the correct location as it conflicts with NA/RO/RW bits for data entries. Upper bits of APG in L1 entry overlap with PMD base address. In order to avoid having to filter that out, we set up all groups so that upper bits can have any value. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc/mm: extend _PAGE_PRIVILEGED to all CPUsChristophe Leroy1-3/+3
commit ac29c64089b74 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED") introduced _PAGE_PRIVILEGED for BOOK3S/64 This patch generalises _PAGE_PRIVILEGED for all CPUs, allowing to have either _PAGE_PRIVILEGED or _PAGE_USER or both. PPC_8xx has a _PAGE_SHARED flag which is set for and only for all non user pages. Lets rename it _PAGE_PRIVILEGED to remove confusion as it has nothing to do with Linux shared pages. On BookE, there's a _PAGE_BAP_SR which has to be set for kernel pages: defining _PAGE_PRIVILEGED as _PAGE_BAP_SR will make this generic Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc/8xx: remove unused _PAGE_WRITETHRUChristophe Leroy1-5/+0
_PAGE_WRITETHRU is only used in: * AMIGA_Z2RAM block driver which is never activated on powerPC * Video/FB driver which is for PPC_PMAC Therefore, no need to spend time in 8xx TLB miss handlers for handling it. And by removing it, we free up bit 20 which then avoids having to clear it on each TLB miss. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc/8xx: Only perform perf counting when perf is in use.Christophe Leroy2-20/+37
In TLB miss handlers, updating the perf counter is only useful when performing a perf analysis. As it has a noticeable overhead, let's only do it when needed. In order to do so, the exit of the miss handlers will be patched when starting/stopping 'perf': the first register restore instruction of each exit point will be replaced by a jump to the counting code. Once this is done, CONFIG_PPC_8xx_PERF_EVENT becomes useless as this feature doesn't add any overhead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc/8xx: remove EXCEPTION_PROLOG/EPILOG_0 and change r3 to r12Christophe Leroy1-38/+40
EXCEPTION_PROLOG_0 and EXCEPTION_EPILOG_0 were added some time ago in order to regroup the two mtspr/mfspr to SCRATCH0 and SCRATCH1 and the mfcr/mtcr in order to ease entry and exit of function not using the full EXCEPTION_PROLOG. Since then, the mfcr/mtcr has been taken out, hence just leaving the two mtspr/mfspr in the macro. In order to improve readability of the exception functions, we remove those two macros and copy back the two mtspr/mfspr instead. As r10 and r11 are used for SCRATCH0 and SCRATCH1, lets also use r12 for SCRATCH2. It will also improve the readability/maintenance. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc/8xx: Remove CPU6 ERRATA WorkaroundChristophe Leroy1-42/+12
CPU6 ERRATA affects only MPC860 revisions prior to C.0. Manufacturing of those revisiosn was stopped in 1999-2000. Therefore, it has been almost 20 years since this ERRATA has been fixed in the silicon. This patch removes the workaround for that ERRATA. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powernv/kdump: Fix cases where the kdump kernel can get HMI'sBalbir Singh2-1/+30
Certain HMI's such as malfunction error propagate through all threads/core on the system. If a thread was offline prior to us crashing the system and jumping to the kdump kernel, bad things happen when it wakes up due to an HMI in the kdump kernel. There are several possible ways to solve this problem 1. Put the offline cores in a state such that they are not woken up for machine check and HMI errors. This does not work, since we might need to wake up offline threads to handle TB errors 2. Ignore HMI errors, setup HMEER to mask HMI errors, but this still leads the window open for any MCEs and masking them for the duration of the dump might be a concern 3. Wake up offline CPUs, as in send them to crash_ipi_callback (not wake them up as in mark them online as seen by the hotplug). kexec does a wake_online_cpus() call, this patch does something similar, but instead sends an IPI and forces them to crash_ipi_callback() This patch takes approach #3. Care is taken to enable this only for powenv platforms via crash_wake_offline (a global value set at setup time). The crash code sends out IPI's to all CPU's which then move to crash_ipi_callback and kexec_smp_wait(). Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc/crash: Remove the test for cpu_online in the IPI callbackBalbir Singh1-3/+0
Our check was extra cautious, we've audited crash_send_ipi and it sends an IPI only to online CPU's. Removal of this check should have not functional impact on crash kdump. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc: make use of for_each_node_by_type() instead of open-coding itDmitry Torokhov1-2/+7
Instead of manually coding the loop with of_find_node_by_type(), let's switch to the standard macro for iterating over nodes with given type. Also fixed a couple of refcount leaks in the aforementioned loops. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>