summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2014-12-15powernv/powerpc: Add winkle support for offline cpusShreyas B. Prabhu13-12/+281
Winkle is a deep idle state supported in power8 chips. A core enters winkle when all the threads of the core enter winkle. In this state power supply to the entire chiplet i.e core, private L2 and private L3 is turned off. As a result it gives higher powersavings compared to sleep. But entering winkle results in a total hypervisor state loss. Hence the hypervisor context has to be preserved before entering winkle and restored upon wake up. Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake up. It can be programmed to restore the register contests of a few specific registers. This patch uses PORE to restore register state wherever possible and uses stack to save and restore rest of the necessary registers. With hypervisor state restore things fall under three categories- per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly we add a paca variable subcore_sibling_mask. Using this and the core_idle_state we can distingush first thread in core and subcore. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15powernv/cpuidle: Redesign idle states managementShreyas B. Prabhu10-57/+294
Deep idle states like sleep and winkle are per core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks like fastsleep hardware bug workaround and hypervisor core state save which have to be done only by the last thread of the core entering deep idle state and similarly tasks like timebase resync, hypervisor core register restore that have to be done only by the first thread waking up from these state. The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only is suboptimal, but can cause functionality issues when subcores and kvm is involved. This patch adds the necessary infrastructure to track idle states of threads in a per-core structure. It uses this info to perform tasks like fastsleep workaround and timebase resync only once per core. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15powerpc/powernv: Enable Offline CPUs to enter deep idle statesShreyas B. Prabhu4-1/+65
The secondary threads should enter deep idle states so as to gain maximum powersavings when the entire core is offline. To do so the offline path must be made aware of the available deepest idle state. Hence probe the device tree for the possible idle states in powernv core code and expose the deepest idle state through flags. Since the device tree is probed by the cpuidle driver as well, move the parameters required to discover the idle states into an appropriate common place to both the driver and the powernv core code. Another point is that fastsleep idle state may require workarounds in the kernel to function properly. This workaround is introduced in the subsequent patches. However neither the cpuidle driver or the hotplug path need be bothered about this workaround. They will be taken care of by the core powernv code. Originally-by: Srivatsa S. Bhat <srivatsa@mit.edu> Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Reviewed-by: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-15powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle modePaul Mackerras2-1/+19
Currently, when going idle, we set the flag indicating that we are in nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap (or sleep or rvwinkle) instruction, all with the MMU on. This is bad for two reasons: (a) the architecture specifies that those instructions must be executed with the MMU off, and in fact with only the SF, HV, ME and possibly RI bits set, and (b) this introduces a race, because as soon as we set the flag, another thread can switch the MMU to a guest context. If the race is lost, this thread will typically start looping on relocation-on ISIs at 0xc...4400. This fixes it by setting the MSR as required by the architecture before setting the flag or executing the nap/sleep/rvwinkle instruction. Cc: stable@vger.kernel.org [ shreyas@linux.vnet.ibm.com: Edited to handle LE ] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-14i2c: Driver to expose PowerNV platform i2c bussesNeelesh Gupta3-0/+42
The patch exposes the available i2c busses on the PowerNV platform to the kernel and implements the bus driver to support i2c and smbus commands. The driver uses the platform device infrastructure to probe the busses on the platform and registers them with the i2c driver framework. Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> (I2C part, excluding the bindings) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12powerpc: add little endian flag to syscall_get_arch()Richard Guy Briggs1-1/+5
Since both ppc and ppc64 have LE variants which are now reported by uname, add that flag (__AUDIT_ARCH_LE) to syscall_get_arch() and add AUDIT_ARCH_PPC64LE variant. Without this, perf trace and auditctl fail. Mainline kernel reports ppc64le (per a058801) but there is no matching AUDIT_ARCH_PPC64LE. Since 32-bit PPC LE is not supported by audit, don't advertise it in AUDIT_ARCH_PPC* variants. See: https://www.redhat.com/archives/linux-audit/2014-August/msg00082.html https://www.redhat.com/archives/linux-audit/2014-December/msg00004.html Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12power/perf/hv-24x7: Use kmem_cache_free() instead of kfreeSukadev Bhattiprolu1-1/+1
Use kmem_cache_free() to free a buffer allocated with kmem_cache_alloc(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12powerpc/perf/hv-24x7: Use per-cpu page buffersukadev@linux.vnet.ibm.com1-12/+9
The 24x7 counters are continuously running and not updated on an interrupt. So we record the event counts when stopping the event or deleting it. But to "read" a single counter in 24x7, we allocate a page and pass it into the hypervisor (The HV returns the page full of counters from which we extract the specific counter for this event). We allocate a page using GFP_USER and when deleting the event, we end up with the following warning because we are blocking in interrupt context. [ 698.641709] BUG: scheduling while atomic: swapper/0/0/0x10010000 We could use GFP_ATOMIC but that could result in failures. Pre-allocate a buffer so we don't have to allocate in interrupt context. Further as Michael Ellerman suggested, use Per-CPU buffer so we only need to allocate once per CPU. Cc: stable@vger.kernel.org Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-09powerpc: Secondary CPUs must set cpu_callin_map after setting active and onlineAnton Blanchard1-1/+8
I have a busy ppc64le KVM box where guests sometimes hit the infamous "kernel BUG at kernel/smpboot.c:134!" issue during boot: BUG_ON(td->cpu != smp_processor_id()); Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops output confirms it: CPU: 0 Comm: watchdog/130 The problem is that we aren't ensuring the CPU active and online bits are set before allowing the master to continue on. The master unparks the secondary CPUs kthreads and the scheduler looks for a CPU to run on. It calls select_task_rq and realises the suggested CPU is not in the cpus_allowed mask. It then ends up in select_fallback_rq, and since the active and online bits aren't set we choose some other CPU to run on. Cc: stable@vger.kernel.org Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-08powerpc/powernv: Return to cpu offline loop when finished in KVM guestPaul Mackerras5-25/+68
When a secondary hardware thread has finished running a KVM guest, we currently put that thread into nap mode using a nap instruction in the KVM code. This changes the code so that instead of doing a nap instruction directly, we instead cause the call to power7_nap() that put the thread into nap mode to return. The reason for doing this is to avoid having the KVM code having to know what low-power mode to put the thread into. In the case of a secondary thread used to run a KVM guest, the thread will be offline from the point of view of the host kernel, and the relevant power7_nap() call is the one in pnv_smp_cpu_disable(). In this case we don't want to clear pending IPIs in the offline loop in that function, since that might cause us to miss the wakeup for the next time the thread needs to run a guest. To tell whether or not to clear the interrupt, we use the SRR1 value returned from power7_nap(), and check if it indicates an external interrupt. We arrange that the return from power7_nap() when we have finished running a guest returns 0, so pending interrupts don't get flushed in that case. Note that it is important a secondary thread that has finished executing in the guest, or that didn't have a guest to run, should not return to power7_nap's caller while the kvm_hstate.hwthread_req flag in the PACA is non-zero, because the return from power7_nap will reenable the MMU, and the MMU might still be in guest context. In this situation we spin at low priority in real mode waiting for hwthread_req to become zero. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-05powerpc/book3s: Fix partial invalidation of TLBs in MCE code.Mahesh Salgaonkar1-2/+2
The existing MCE code calls flush_tlb hook with IS=0 (single page) resulting in partial invalidation of TLBs which is not right. This patch fixes that by passing IS=0xc00 to invalidate whole TLB for successful recovery from TLB and ERAT errors. Cc: stable@vger.kernel.org Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-05powerpc/mm: don't do tlbie for updatepp request with NO HPTE faultAneesh Kumar K.V15-55/+85
upatepp can get called for a nohpte fault when we find from the linux page table that the translation was hashed before. In that case we are sure that there is no existing translation, hence we could avoid doing tlbie. We could possibly race with a parallel fault filling the TLB. But that should be ok because updatepp is only ever relaxing permissions. We also look at linux pte permission bits when filling hash pte permission bits. We also hold the linux pte busy bits while inserting/updating a hashpte entry, hence a paralle update of linux pte is not possible. On the other hand mprotect involves ptep_modify_prot_start which cause a hpte invalidate and not updatepp. Performance number: We use randbox_access_bench written by Anton. Kernel with THP disabled and smaller hash page table size. 86.60% random_access_b [kernel.kallsyms] [k] .native_hpte_updatepp 2.10% random_access_b random_access_bench [.] doit 1.99% random_access_b [kernel.kallsyms] [k] .do_raw_spin_lock 1.85% random_access_b [kernel.kallsyms] [k] .native_hpte_insert 1.26% random_access_b [kernel.kallsyms] [k] .native_flush_hash_range 1.18% random_access_b [kernel.kallsyms] [k] .__delay 0.69% random_access_b [kernel.kallsyms] [k] .native_hpte_remove 0.37% random_access_b [kernel.kallsyms] [k] .clear_user_page 0.34% random_access_b [kernel.kallsyms] [k] .__hash_page_64K 0.32% random_access_b [kernel.kallsyms] [k] fast_exception_return 0.30% random_access_b [kernel.kallsyms] [k] .hash_page_mm With Fix: 27.54% random_access_b random_access_bench [.] doit 22.90% random_access_b [kernel.kallsyms] [k] .native_hpte_insert 5.76% random_access_b [kernel.kallsyms] [k] .native_hpte_remove 5.20% random_access_b [kernel.kallsyms] [k] fast_exception_return 5.12% random_access_b [kernel.kallsyms] [k] .__hash_page_64K 4.80% random_access_b [kernel.kallsyms] [k] .hash_page_mm 3.31% random_access_b [kernel.kallsyms] [k] data_access_common 1.84% random_access_b [kernel.kallsyms] [k] .trace_hardirqs_on_caller Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-02powerpc/xmon: Cleanup the breakpoint flagsMichael Ellerman1-10/+9
Drop BP_IABR_TE, which though used, does not do anything useful. Rename BP_IABR to BP_CIABR. Renumber the flags. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-02powerpc/xmon: Enable HW instruction breakpoint on POWER8Anshuman Khandual1-7/+51
This patch enables support for hardware instruction breakpoint in xmon on POWER8 platform with the help of a new register called the CIABR (Completed Instruction Address Breakpoint Register). With this patch, a single hardware instruction breakpoint can be added and cleared during any active xmon debug session. The hardware based instruction breakpoint mechanism works correctly with the existing TRAP based instruction breakpoint available on xmon. There are no powerpc CPU with CPU_FTR_IABR feature any more. This patch has re-purposed all the existing IABR related code to work with CIABR register based HW instruction breakpoint. This has one odd feature, which is that when we hit a breakpoint xmon doesn't tell us we have hit the breakpoint. This is because xmon is expecting bp->address == regs->nip. Because CIABR fires on completition regs->nip points to the instruction after the breakpoint. We could fix that, but it would then confuse other parts of the xmon code which think we need to emulate the instruction. [mpe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
2014-12-02Merge remote-tracking branch 'benh/next' into nextMichael Ellerman18-284/+162
Merge updates collected & acked by Ben. A few EEH patches from Gavin, some mm updates from Aneesh and a few odds and ends.
2014-12-02powerpc/mm/thp: Use tlbiel if possibleAneesh Kumar K.V7-13/+37
If we know that user address space has never executed on other cpus we could use tlbiel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/mm/thp: Remove code duplicationAneesh Kumar K.V4-108/+65
Rename invalidate_old_hpte to flush_hash_hugepage and use that in other places. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/mm/hugetlb: Sanity check gigantic hugepage countJames Yang1-0/+7
Limit the number of gigantic hugepages specified by the hugepages= parameter to MAX_NUMBER_GPAGES. Signed-off-by: James Yang <James.Yang@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/oprofile: Disable pagefaults during user stack readJiang Lu1-2/+4
A page fault occurred during reading user stack in oprofile backtrace would lead following calltrace: WARNING: at linux/kernel/smp.c:210 Modules linked in: CPU: 5 PID: 736 Comm: sh Tainted: G W 3.14.23-WR7.0.0.0_standard #1 task: c0000000f6208bc0 ti: c00000007c72c000 task.ti: c00000007c72c000 NIP: c0000000000ed6e4 LR: c0000000000ed5b8 CTR: 0000000000000000 REGS: c00000007c72f050 TRAP: 0700 Tainted: G W (3.14.23-WR7.0.0 tandard) MSR: 0000000080021000 <CE,ME> CR: 48222482 XER: 00000000 SOFTE: 0 GPR00: c0000000000ed5b8 c00000007c72f2d0 c0000000010aa048 0000000000000005 GPR04: c000000000fdb820 c00000007c72f410 0000000000000001 0000000000000005 GPR08: c0000000010b5768 c000000000f8a048 0000000000000001 0000000000000000 GPR12: 0000000048222482 c00000000fffe580 0000000022222222 0000000010129664 GPR16: 0000000010143cc0 0000000000000000 0000000044444444 0000000000000000 GPR20: c00000007c7221d8 c0000000f638e3c8 000003f15a20120d 0000000000000001 GPR24: 000000005a20120d c00000007c722000 c00000007cdedda8 00003fffef23b160 GPR28: 0000000000000001 c00000007c72f410 c000000000fdb820 0000000000000006 NIP [c0000000000ed6e4] .smp_call_function_single+0x18c/0x248 LR [c0000000000ed5b8] .smp_call_function_single+0x60/0x248 Call Trace: [c00000007c72f2d0] [c0000000000ed5b8] .smp_call_function_single+0x60/0x248 (unreliable) [c00000007c72f3a0] [c000000000030810] .__flush_tlb_page+0x164/0x1b0 [c00000007c72f460] [c00000000002e054] .ptep_set_access_flags+0xb8/0x168 [c00000007c72f500] [c0000000001ad3d8] .handle_mm_fault+0x4a8/0xbac [c00000007c72f5e0] [c000000000bb3238] .do_page_fault+0x3b8/0x868 [c00000007c72f810] [c00000000001e1d0] storage_fault_common+0x20/0x44 Exception: 301 at .__copy_tofrom_user_base+0x54/0x5b0 LR = .op_powerpc_backtrace+0x190/0x20c [c00000007c72fb00] [c000000000a2ec34] .op_powerpc_backtrace+0x204/0x20c (unreliable) [c00000007c72fbc0] [c000000000a2b5fc] .oprofile_add_ext_sample+0xe8/0x118 [c00000007c72fc70] [c000000000a2eee0] .fsl_emb_handle_interrupt+0x20c/0x27c [c00000007c72fd30] [c000000000a2e440] .op_handle_interrupt+0x44/0x58 [c00000007c72fdb0] [c000000000016d68] .performance_monitor_exception+0x74/0x90 [c00000007c72fe30] [c00000000001d8b4] exc_0x260_common+0xfc/0x100 performance_monitor_exception() is executed in a context with interrupt disabled and preemption enabled. When there is a user space page fault happened, do_page_fault() invoke in_atomic() to decide whether kernel should handle such page fault. in_atomic() only check preempt_count. So need call pagefault_disable() to disable preemption before reading user stack. Signed-off-by: Jiang Lu <lu.jiang@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/mm: Check for matching hpte without taking hpte lockAneesh Kumar K.V1-9/+15
With smaller hash page table config, we would end up in situation where we would be replacing hash page table slot frequently. In such config, we will find the hpte to be not matching, and we can do that check without holding the hpte lock. We need to recheck the hpte again after holding lock. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc: Drop useless warning in eeh_init()Greg Kurz1-4/+1
This is what we get in dmesg when booting a pseries guest and the hypervisor doesn't provide EEH support. [ 0.166655] EEH functionality not supported [ 0.166778] eeh_init: Failed to call platform init function (-22) Since both powernv_eeh_init() and pseries_eeh_init() already complain when hitting an error, it is not needed to print more (especially such an uninformative message). Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/powernv: Cleanup unused MCE definitions/declarations.Mahesh Salgaonkar5-136/+0
Cleanup OpalMCE_* definitions/declarations and other related code which is not used anymore. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Benjamin Herrrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/eeh: Dump PHB diag-data earlyGavin Shan3-1/+15
On PowerNV platform, PHB diag-data is dumped after stopping device drivers. In case of recursive EEH errors, the kernel is usually crashed before dumping PHB diag-data for the second EEH error. It's hard to locate the root cause of the second EEH error without PHB diag-data. The patch adds one more EEH option "eeh=early_log", which helps dumping PHB diag-data immediately once frozen PE is detected, in order to get the PHB diag-data for the second EEH error. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/eeh: Recover EEH error on ownership change for BCM5719Gavin Shan1-0/+1
In PCI passthrou scenario, we need simulate EEH recovery for Emulex adapters when their ownership changes, as we did in commit 5cfb20b96 ("powerpc/eeh: Emulate EEH recovery for VFIO devices"). Broadcom BCM5719 adpaters are facing same problem and needs same cure. Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/eeh: Set EEH_PE_RESET on PE resetGavin Shan4-9/+8
The patch introduces additional flag EEH_PE_RESET to indicate the corresponding PE is under reset. In turn, the PE retrieval bakcend on PowerNV platform can return unfrozen state for the EEH core to moving forward. Flag EEH_PE_CFG_BLOCKED isn't the correct one for the purpose. In PCI passthrou case, the problem is more worse: Guest doesn't recover 6th EEH error. The PE is left in isolated (frozen) and config blocked state on Broadcom adapters. We can't retrieve the PE's state correctly any more, even from the host side via sysfs /sys/bus/pci/devices/xxx/eeh_pe_state. Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-12-02powerpc/eeh: Refactor eeh_reset_pe()Gavin Shan1-11/+18
The patch refactors eeh_reset_pe() in order for: * Varied return values for different failure cases. * Replace pr_err() with pr_warn() and print function name. * Coding style cleanup. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-11-19powerpc: Remove more traces of bootmemMichael Ellerman12-49/+20
Although we are now selecting NO_BOOTMEM, we still have some traces of bootmem lying around. That is because even with NO_BOOTMEM there is still a shim that converts bootmem calls into memblock calls, but ultimately we want to remove all traces of bootmem. Most of the patch is conversions from alloc_bootmem() to memblock_virt_alloc(). In general a call such as: p = (struct foo *)alloc_bootmem(x); Becomes: p = memblock_virt_alloc(x, 0); We don't need the cast because memblock_virt_alloc() returns a void *. The alignment value of zero tells memblock to use the default alignment, which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses. We remove a number of NULL checks on the result of memblock_virt_alloc(). That is because memblock_virt_alloc() will panic if it can't allocate, in exactly the same way as alloc_bootmem(), so the NULL checks are and always have been redundant. The memory returned by memblock_virt_alloc() is already zeroed, so we remove several memsets of the result of memblock_virt_alloc(). Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS) to just plain memblock_virt_alloc(). We don't use memblock_alloc_base() because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation to that is pointless, 16XB ought to be enough for anyone. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-19powerpc/pseries: Initialise nvram_pstore_info's buf_lockLi Zhong1-0/+2
nvram_pstore_info's buf_lock is not initialized before registering, which is clearly incorrect. It causes some strange behavior when trying to obtain the lock during kdump process. On a UP configuration, the console stopped for a couple of seconds, then "lockup suspected" warning printed out, but then it continued to run. So try lock fails, and lockup reported, but then arch_spin_lock() passes. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> [mpe: Edited changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-18Merge remote-tracking branch 'scottwood/next' into nextMichael Ellerman35-541/+553
Scott says: "Highlights include a bunch of 8xx optimizations, device tree bindings for Freescale BMan, QMan, and FMan datapath components, misc device tree updates, and inbound rio window support."
2014-11-18powerpc/config: Enable memory driverPrabhakar Kushwaha4-0/+4
As Freescale IFC controller has been moved to driver to driver/memory. So enable memory driver in powerpc config Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-11-17rtc/tpo: Driver to support rtc and wakeup on PowerNV platformNeelesh Gupta7-50/+36
The patch implements the OPAL rtc driver that binds with the rtc driver subsystem. The driver uses the platform device infrastructure to probe the rtc device and register it to rtc class framework. The 'wakeup' is supported depending upon the property 'has-tpo' present in the OF node. It provides a way to load the generic rtc driver in in the absence of an OPAL driver. The patch also moves the existing OPAL rtc get/set time interfaces to the new driver and exposes the necessary OPAL calls using EXPORT_SYMBOL_GPL. Test results: ------------- Host: [root@tul169p1 ~]# ls -l /sys/class/rtc/ total 0 lrwxrwxrwx 1 root root 0 Oct 14 03:07 rtc0 -> ../../devices/opal-rtc/rtc/rtc0 [root@tul169p1 ~]# cat /sys/devices/opal-rtc/rtc/rtc0/time 08:10:07 [root@tul169p1 ~]# echo `date '+%s' -d '+ 2 minutes'` > /sys/class/rtc/rtc0/wakealarm [root@tul169p1 ~]# cat /sys/class/rtc/rtc0/wakealarm 1413274345 [root@tul169p1 ~]# FSP: $ smgr mfgState standby $ rtim timeofday System time is valid: 2014/10/14 08:12:04.225115 $ smgr mfgState ipling $ CC: devicetree@vger.kernel.org CC: tglx@linutronix.de CC: rtc-linux@googlegroups.com CC: a.zummo@towertech.it Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-17powerpc: Use generic PIE randomizationVineeth Vijayan3-11/+2
Back in 2009 we merged 501cb16d3cfd "Randomise PIEs", which added support for randomizing PIE (Position Independent Executable) binaries. That commit added randomize_et_dyn(), which correctly randomized the addresses, but failed to honor PF_RANDOMIZE. That means it was not possible to disable PIE randomization via the personality flag, or /proc/sys/kernel/randomize_va_space. Since then there has been generic support for PIE randomization added to binfmt_elf.c, selectable via ARCH_BINFMT_ELF_RANDOMIZE_PIE. Enabling that allows us to drop randomize_et_dyn(), which means we start honoring PF_RANDOMIZE correctly. It also causes a fairly major change to how we layout PIE binaries. Currently we will place the binary at 512MB-520MB for 32 bit binaries, or 512MB-1.5GB for 64 bit binaries, eg: $ cat /proc/$$/maps 4e550000-4e580000 r-xp 00000000 08:02 129813 /bin/dash 4e580000-4e590000 rw-p 00020000 08:02 129813 /bin/dash 10014110000-10014140000 rw-p 00000000 00:00 0 [heap] 3fffaa3f0000-3fffaa5a0000 r-xp 00000000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fffaa5a0000-3fffaa5b0000 rw-p 001a0000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fffaa5c0000-3fffaa5d0000 rw-p 00000000 00:00 0 3fffaa5d0000-3fffaa5f0000 r-xp 00000000 00:00 0 [vdso] 3fffaa5f0000-3fffaa620000 r-xp 00000000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so 3fffaa620000-3fffaa630000 rw-p 00020000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so 3ffffc340000-3ffffc370000 rw-p 00000000 00:00 0 [stack] With this commit applied we don't do any special randomisation for the binary, and instead rely on mmap randomisation. This means the binary ends up at high addresses, eg: $ cat /proc/$$/maps 3fff99820000-3fff999d0000 r-xp 00000000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fff999d0000-3fff999e0000 rw-p 001a0000 08:02 921 /lib/powerpc64le-linux-gnu/libc-2.19.so 3fff999f0000-3fff99a00000 rw-p 00000000 00:00 0 3fff99a00000-3fff99a20000 r-xp 00000000 00:00 0 [vdso] 3fff99a20000-3fff99a50000 r-xp 00000000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so 3fff99a50000-3fff99a60000 rw-p 00020000 08:02 1246 /lib/powerpc64le-linux-gnu/ld-2.19.so 3fff99a60000-3fff99a90000 r-xp 00000000 08:02 129813 /bin/dash 3fff99a90000-3fff99aa0000 rw-p 00020000 08:02 129813 /bin/dash 3fffc3de0000-3fffc3e10000 rw-p 00000000 00:00 0 [stack] 3fffc55e0000-3fffc5610000 rw-p 00000000 00:00 0 [heap] Although this should be OK, it's possible it might break badly written binaries that make assumptions about the address space layout. Signed-off-by: Vineeth Vijayan <vvijayan@mvista.com> [mpe: Rewrite changelog] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/powernv: Fix potential zero devisorGavin Shan1-12/+16
If there're no PHBs under P5IOC2 HUB device tree node, we should bail early to avoid zero devisor and allocating TCE tables. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/powernv: Bail upon invalid master PEGavin Shan1-1/+3
When freezing compound PEs in pnv_ioda_freeze_pe(), we should bail upon illegal master PE. We needn't freeze slave PE because it should have been put into frozen state by hardware. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/powernv: Simplify pnv_ioda_configure_pe()Gavin Shan1-15/+17
Nested if statements are always bad and the patch avoids one by checking PHB type and bail in advance if necessary. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/powernv: Set PELTV for compound PEsGavin Shan1-16/+102
Commit 262af55 ("powerpc/powernv: Enable M64 aperatus for PHB3") introduced compound PEs in order to support M64 aperatus on PHB3. However, we never configured PELTV for compound PEs. The patch fixes that by: parent PE can freeze all child compound PEs. Any compound PE affects the group. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/powernv: Initialize M64 PE in timeGavin Shan1-6/+21
The patch initializes PE instance when reserving PE number to keep consistent things as we did before. Also, it replaces the iteration on bridge's windows with the prefered way. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/powernv: Rename alloc_m64_pe() to reserve_m64_pe()Gavin Shan2-5/+5
The patch renames alloc_m64_pe() to reserve_m64_pe() to reflect its real usage: We reserve PE numbers for M64 segments in advance and then pick up the reserved PE numbers when building the mapping between PE numbers and M64 segments. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/powernv: Fix condition to remove M64Gavin Shan1-2/+2
The M64 resource should be removed if we don't have hook to initialize it, or (not and) fail to do that. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/powernv: Check PHB type in advanceGavin Shan1-6/+6
The patch checks PHB type a bit early to save a bit cycles for P7 because we don't support M64 for P7IOC no matter what OPAL firmware we have. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/mm: Switch to generic RCU get_user_pages_fastAneesh Kumar K.V8-267/+22
This patch switch the ppc arch to use the generic RCU based gup implementation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14mm: Update generic gup implementation to handle hugepage directoryAneesh Kumar K.V1-0/+1
Update generic gup implementation with powerpc specific details. On powerpc at pmd level we can have hugepte, normal pmd pointer or a pointer to the hugepage directory. Tested-by: Steve Capper <steve.capper@linaro.org> Acked-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc/mm: Add missing pmd accessorsAneesh Kumar K.V5-17/+78
This patch add documentation and missing accessors. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-14powerpc: Disable CPU_FTR_TM if TM is disabled by firmwareAneesh Kumar K.V1-0/+6
Firmware is allowed to communicate to us via the "ibm,pa-features" property that TM (Transactional Memory) support is disabled. Currently this doesn't happen on any platform we're aware of, but we should honor it anyway. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-13powerpc/fsl-rio: add support for mapping inbound windowsMartijn de Gouw2-0/+117
Add support for mapping and unmapping of inbound rapidio windows. This allows for drivers to open up a part of local memory on the rapidio network. Also applications can use this and tranfer blocks of data over the network. Signed-off-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> [scottwood@freescale.com: updated commit message based on review] Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-11-12powerpc/xmon: Fix build when 4xx=y and 44x=nMichael Ellerman1-1/+1
dump_tlb_44x() is only defined when 44x=y, but the ifdef in xmon.c checks for 4xx, leading to a build failure: arch/powerpc/xmon/xmon.c:912:4: error: implicit declaration of function 'dump_tlb_44x' Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-12Merge branch 'topic/opal-ipmi' into nextMichael Ellerman3-0/+33
2014-11-12powerpc: Fix comment typos in arch/powerpc/include/asm/bitops.hBoqun Feng1-2/+2
In arch/powerpc/include/asm/bitops.h, the comments about bit numbers in large (> 1 word) bitmaps have two typos: - On ppc64 system, the LSB of the 4th word should be bit 192 rather than 196, because if it's bit 196, bit 192-195 will be missing in the bitmap. - On ppc32 system, the LSB of the second word should be bit 32 rather than 31, because bit 31 is already in the first word. This patch fixes these typos. Signed-off-by: Boqun Feng <boqun.feng@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-12powerpc/powernv: Add OPAL IPMI interfaceJeremy Kerr3-0/+33
Recent OPAL firmare adds a couple of functions to send and receive IPMI messages: https://github.com/open-power/skiboot/commit/b2a374da This change updates the token list and wrappers to suit, and adds the platform devices for any IPMI interfaces. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-11-12powerpc: Fix compilation of emulate_step()Paul Mackerras1-2/+4
Commit be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()") added some calls to do_fp_load() and do_fp_store(), which fail to compile on configs with CONFIG_PPC_FPU=n and CONFIG_PPC_EMULATE_SSTEP=y. This fixes the compile by adding #ifdef CONFIG_PPC_FPU around the code that calls these functions. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>