summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2016-07-21powerpc/mm: Move hash table ops to a separate structureBenjamin Herrenschmidt2-3/+8
Moving probe_machine() to after mmu init will cause the ppc_md fields relative to the hash table management to be overwritten. Since we have essentially disconnected the machine type from the hash backend ops, finish the job by moving them to a different structure. The only callback that didn't quite fix is update_partition_table since this is not specific to hash, so I moved it to a standalone variable for now. We can revisit later if needed. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fix ppc64e build failure in kexec] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc/64: Move MMU backend selection out of platform codeBenjamin Herrenschmidt1-0/+6
We move it into early_mmu_init() based on firmware features. For PS3, we have to move the setting of these into early_init_devtree(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc: Put exception configuration in a common placeBenjamin Herrenschmidt1-14/+42
The various calls to establish exception endianness and AIL are now done from a single point using already established CPU and FW feature bits to decide what to do. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc: Move FW feature probing out of pseries probe()Benjamin Herrenschmidt1-0/+4
We move the function itself to pseries/firmware.c and call it along with almost all other flat device-tree parsers from early_init_devtree() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Move #ifdefs into the header by providing pseries_probe_fw_features()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc: Move 64-bit memory reserves to setup_arch()Benjamin Herrenschmidt1-11/+11
There is really no need to do them that early, early_setup() runs before MMU is on, we should do the strict minimum there to get the MMU going. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc: Move 64-bit feature fixup earlierBenjamin Herrenschmidt1-2/+3
Make it part of early_setup() as we really want the feature fixups to be applied before we turn on the MMU since they can have an impact on the various assembly path related to MMU management and interrupts. This makes 64-bit match what 32-bit does. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc: Factor do_feature_fixup callsBenjamin Herrenschmidt2-27/+3
32 and 64-bit do a similar set of calls early on, we move it all to a single common function to make the boot code more readable. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19powerpc/32: Remove RELOCATABLE_PPC32Kevin Hao2-3/+2
It is seldom used in the kernel code and can be easily replaced by either RELOCATABLE or PPC32. So there is no reason to keep a separate kernel option for this. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/mm/radix: Add a kernel command line to disable radixAneesh Kumar K.V1-0/+13
This patch adds the kernel command line disable_radix which disable the radix MMU mode even if firmware indicates radix support via ibm,pa-features device tree node. This helps in testing different MMU mode easily. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/mm: Clear top 16 bits of va only on older cpusAneesh Kumar K.V1-2/+2
As per ISA, we need to do this only for architecture version 2.02 and earlier. This continued to work even for 2.07. But let's not do this for anything after 2.02. ISA 3.0 requires these top bits to be not cleared. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/pci: Don't try to allocate resources that will be reassignedBenjamin Herrenschmidt1-2/+4
When we know we will reassign all resources, trying (and failing) to allocate them initially is fairly pointless and leads to a lot of scary messages in the kernel log Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/opal: Add real mode call wrappersBenjamin Herrenschmidt1-11/+5
Replace the old generic opal_call_realmode() with proper per-call wrappers similar to the normal ones and convert callers. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/irq: Add mechanism to force a replay of interruptsBenjamin Herrenschmidt1-0/+15
Calling this function with interrupts soft-disabled will cause a replay of the external interrupt vector when they are re-enabled. This will be used by the OPAL XICS backend (and latter by the native XIVE code) to handle EOI signaling that there are more interrupts to fetch from the hardware since the hardware won't issue another HW interrupt in that case. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/irq: Add support for HV virtualization interruptsBenjamin Herrenschmidt2-0/+21
This will be delivering external interrupts from the XIVE to the Hypervisor. We treat it as a normal external interrupt for the lazy irq disable code (so it will be replayed as a 0x500) and route it to do_IRQ. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/book64s: Move a few exception common handlers to make roomBenjamin Herrenschmidt1-8/+9
This moves the CBE RAS and facility unavailable "common" handlers down to after the FWNMI page. This frees up some space in the very demanded spaces before the relocation-on vectors and before the FWNMI page. They are still within 64K of __start, so CONFIG_RELOCATABLE should still work. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Add platform support for stop instructionShreyas B. Prabhu1-34/+159
POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. This instruction replaces instructions like nap, sleep, rvwinkle. b) new per thread SPR named Processor Stop Status and Control Register (PSSCR) is added which controls the behavior of stop instruction. PSSCR layout: ---------------------------------------------------------- | PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL | ---------------------------------------------------------- 0 4 41 42 43 44 48 54 56 60 PSSCR key fields: Bits 0:3 - Power-Saving Level Status. This field indicates the lowest power-saving state the thread entered since stop instruction was last executed. Bit 42 - Enable State Loss 0 - No state is lost irrespective of other fields 1 - Allows state loss Bits 44:47 - Power-Saving Level Limit This limits the power-saving level that can be entered into. Bits 60:63 - Requested Level Used to specify which power-saving level must be entered on executing stop instruction This patch adds support for stop instruction and PSSCR handling. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: abstraction for saving SPRs before entering deep idle statesShreyas B. Prabhu1-22/+32
Create a function for saving SPRs before entering deep idle states. This function can be reused for POWER9 deep idle states. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Make pnv_powersave_common more genericShreyas B. Prabhu1-9/+14
pnv_powersave_common does common steps needed before entering idle state and eventually changes MSR to MSR_IDLE and does rfid to pnv_enter_arch207_idle_mode. Move the updation of HSTATE_HWTHREAD_STATE to pnv_powersave_common from pnv_enter_arch207_idle_mode and make it more generic by passing the rfid address as a function parameter. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Rename reusable idle functions to hardware agnostic namesShreyas B. Prabhu2-20/+21
Functions like power7_wakeup_loss, power7_wakeup_noloss, power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can also be used by POWER9. Hence rename these functions hardware agnostic names. Suggested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Rename idle_power7.S to idle_book3s.SShreyas B. Prabhu2-1/+1
idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next patch for POWER9. Rename the file to a non-hardware specific name. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/kvm: make hypervisor state restore a functionShreyas B. Prabhu2-54/+46
In the current code, when the thread wakes up in reset vector, some of the state restore code and check for whether a thread needs to branch to kvm is duplicated. Reorder the code such that this duplication is avoided. At a higher level this is what the change looks like- Before this patch - power7_wakeup_tb_loss: restore hypervisor state if (thread needed by kvm) goto kvm_start_guest restore nvgprs, cr, pc rfid to process context power7_wakeup_loss: restore nvgprs, cr, pc rfid to process context reset vector: if (waking from deep idle states) goto power7_wakeup_tb_loss else if (thread needed by kvm) goto kvm_start_guest goto power7_wakeup_loss After this patch - power7_wakeup_tb_loss: restore hypervisor state return power7_restore_hyp_resource(): if (waking from deep idle states) goto power7_wakeup_tb_loss return power7_wakeup_loss: restore nvgprs, cr, pc rfid to process context reset vector: power7_restore_hyp_resource() if (thread needed by kvm) goto kvm_start_guest goto power7_wakeup_loss Reviewed-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkleShreyas B. Prabhu1-1/+1
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/tm: Fix stack pointer corruption in __tm_recheckpoint()Michael Neuling1-2/+1
At the start of __tm_recheckpoint() we save the kernel stack pointer (r1) in SPRG SCRATCH0 (SPRG2) so that we can restore it after the trecheckpoint. Unfortunately, the same SPRG is used in the SLB miss handler. If an SLB miss is taken between the save and restore of r1 to the SPRG, the SPRG is changed and hence r1 is also corrupted. We can end up with the following crash when we start using r1 again after the restore from the SPRG: Oops: Bad kernel stack pointer, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries CPU: 658 PID: 143777 Comm: htm_demo Tainted: G EL X 4.4.13-0-default #1 task: c0000b56993a7810 ti: c00000000cfec000 task.ti: c0000b56993bc000 NIP: c00000000004f188 LR: 00000000100040b8 CTR: 0000000010002570 REGS: c00000000cfefd40 TRAP: 0300 Tainted: G EL X (4.4.13-0-default) MSR: 8000000300001033 <SF,ME,IR,DR,RI,LE> CR: 02000424 XER: 20000000 CFAR: c000000000008468 DAR: 00003ffd84e66880 DSISR: 40000000 SOFTE: 0 PACATMSCRATCH: 00003ffbc865e680 GPR00: fffffffcfabc4268 00003ffd84e667a0 00000000100d8c38 000000030544bb80 GPR04: 0000000000000002 00000000100cf200 0000000000000449 00000000100cf100 GPR08: 000000000000c350 0000000000002569 0000000000002569 00000000100d6c30 GPR12: 00000000100d6c28 c00000000e6a6b00 00003ffd84660000 0000000000000000 GPR16: 0000000000000003 0000000000000449 0000000010002570 0000010009684f20 GPR20: 0000000000800000 00003ffd84e5f110 00003ffd84e5f7a0 00000000100d0f40 GPR24: 0000000000000000 0000000000000000 0000000000000000 00003ffff0673f50 GPR28: 00003ffd84e5e960 00000000003d0f00 00003ffd84e667a0 00003ffd84e5e680 NIP [c00000000004f188] restore_gprs+0x110/0x17c LR [00000000100040b8] 0x100040b8 Call Trace: Instruction dump: f8a1fff0 e8e700a8 38a00000 7ca10164 e8a1fff8 e821fff0 7c0007dd 7c421378 7db142a6 7c3242a6 38800002 7c810164 <e9c100e0> e9e100e8 ea0100f0 ea2100f8 We hit this on large memory machines (> 2TB) but it can also be hit on smaller machines when 1TB segments are disabled. To hit this, you also need to be virtualised to ensure SLBs are periodically removed by the hypervisor. This patches moves the saving of r1 to the SPRG to the region where we are guaranteed not to take any further SLB misses. Fixes: 98ae22e15b43 ("powerpc: Add helper functions for transactional memory context switching") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15Merge tag 'powerpc-4.7-5' into nextMichael Ellerman5-25/+63
Pull in the fixes we sent during 4.7, we have code we want to merge into next that depends on some of them.
2016-07-14powerpc: Make ppc_md.{halt, restart} __noreturnDaniel Axtens1-2/+2
powernv marks it's halt and restart calls as __noreturn. However, ppc_md does not have this annotation. Add the annotation to ppc_md, and then to every halt/restart function that is missing it. Additionally, I have verified that all of these functions do not return. Occasionally I have added a spin loop to be sure. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/crash: Rearrange loop condition to avoid out of bounds array accessSuraj Jitindar Singh1-1/+1
The array crash_shutdown_handles[] has size CRASH_HANDLER_MAX, thus when we loop over the elements of the list we check crash_shutdown_handles[i] && i < CRASH_HANDLER_MAX. However this means that when we increment i to CRASH_HANDLER_MAX we will perform an out of bound array access checking the first condition before exiting on the second condition. To avoid the out of bounds access, simply reorder the loop conditions. Fixes: 1d1451655bad ("powerpc: Add array bounds checking to crash_shutdown_handlers") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-13powerpc: Don't test for machine type in smp_setup_cpu_maps()Benjamin Herrenschmidt1-1/+1
The subsequent test for RTAS along with the LPAR test are sufficient Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-13powerpc/rtas: Don't test for machine type in rtas_initialize()Benjamin Herrenschmidt1-1/+1
The test is unnecessary, the FW_FEATURE_LPAR is sufficient as there exist no other LPAR type that has RTAS. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-11powerpc: Move epapr_paravirt_early_init() to early_init_devtree()Benjamin Herrenschmidt3-6/+2
The function is called by both 32-bit and 64-bit early setup right after early_init_devtree(). All it does is run yet another early DT parser which is precisely what early_init_devtree() is about, so move it in there. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-11powerpc: Add comment explaining the purpose of setup_kdump_trampoline()Benjamin Herrenschmidt1-2/+4
Anything in early_setup() needs to be justified to be there, in this case, we need the trampolines before we can take exceptions and thus before we turn on the MMU. Also remove a pretty meaningless and misplaced debug message Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fix comment formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-11powerpc: Update obsolete comments in setup_32.c about entry conditionsBenjamin Herrenschmidt1-3/+5
early_init() is called in-place before kernel relocation and using whatever MMU setup exists at the point the kernel is entered. machine_init() is called after relocation and after some initial mapping of PAGE_OFFSET has been established (typically using BATs on 6xx/7xx/7xxx processors or some form of bolted TLB on others). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-09powerpc/8xx: Force VIRT_IMMR_BASE to be a positive numberScott Wood1-1/+1
The asm-offsets mechanism generates signed numbers, even if the input value is explicitly unsigned. This causes a problem with older binutils (e.g. 2.23), which sign-extend a negative number when @h is applied. Thus, this instruction: cmpli cr0, r11, VIRT_IMMR_BASE@h resulted in this: Error: operand out of range (0xfffffff0 is not between 0x00000000 and 0x0000ffff) By casting to a larger type, we can force the output to be expressed as a positive number. Signed-off-by: Scott Wood <oss@buserror.net> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
2016-07-09powerpc/8xx: add CONFIG_PIN_TLB_IMMRChristophe Leroy1-4/+6
CONFIG_PIN_TLB maps IMMR area and the first 24 Mbytes of memory. In some circunstances it might be more interesting to not map IMMR but map 32 Mbytes of memory instead. Therefore we add config option CONFIG_PIN_TLB_IMMR to select if IMMR shall be pinned or not, hence whether we pin 24 or 32 Mbytes of RAM Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09powerpc/8xx: Rework CONFIG_PIN_TLB handlingChristophe Leroy1-40/+4
On recent kernels, with some debug options like for instance CONFIG_LOCKDEP, the BSS requires more than 8M memory, allthough the kernel code fits in the first 8M. Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M at startup, allthough pinning TLB is not necessary for that. We could have inconditionaly mapped 16 or 24M bytes at startup but some old hardware only have 8M and mapping non-existing RAM would be an issue due to speculative accesses. With the preceding patch however, the TLB entries are populated on demand. By setting up the TLB miss handler to handle up to 24M until the handler is patched for the entire memory space, it is possible to allow access up to more memory without mapping non-existing RAM. It is therefore not needed anymore to map memory data at all at startup. It will be handled by the TLB miss handler. One might still want to PIN the IMMR and the first 24M of RAM. It is now possible to do it in the C memory initialisation functions. In addition, we now know how much memory we have when we do it, so we are able to adapt the pining to the real amount of memory available. So boards with less than 24M can now also benefit from PIN_TLB. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09powerpc/8xx: Don't use page table for linear memory spaceChristophe Leroy1-34/+37
Instead of using the first level page table to define mappings for the linear memory space, we can use direct mapping from the TLB handling routines. This has several advantages: * No need to read the tables at each TLB miss * No issue in 16k pages mode where the 1st level table maps 64 Mbytes The size of the available linear space is known at system startup. In order to avoid data access at each TLB miss to know the memory size, the TLB routine is patched at startup with the proper size This patch provides a 10%-15% improvment of TLB miss handling for kernel addresses Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09powerpc/8xx: unpin all TLBs before flushingChristophe Leroy1-8/+10
Bootloader may have pinned some TLB entries so the kernel must unpin them before flushing TLBs with tlbia otherwise pinned TLB entries won't get flushed Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09powerpc/8xx: Map IMMR area with 512k page at a fixed addressChristophe Leroy1-0/+29
Once the linear memory space has been mapped with 8Mb pages, as seen in the related commit, we get 11 millions DTLB missed during the reference 600s period. 77% of the misses are on user addresses and 23% are on kernel addresses (1 fourth for linear address space and 3 fourth for virtual address space) Traditionaly, each driver manages one computer board which has its own components with its own memory maps. But on embedded chips like the MPC8xx, the SOC has all registers located in the same IO area. When looking at ioremaps done during startup, we see that many drivers are re-mapping small parts of the IMMR for their own use and all those small pieces gets their own 4k page, amplifying the number of TLB misses: in our system we get 0xff000000 mapped 31 times and 0xff003000 mapped 9 times. Even if each part of IMMR was mapped only once with 4k pages, it would still be several small mappings towards linear area. This patch maps the IMMR with a single 512k page. With this patch applied, the number of DTLB misses during the 10 min period is reduced to 11.8 millions for a duration of 5.8s, which represents 2% of the non-idle time hence yet another 10% reduction. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09powerpc/8xx: Fix vaddr for IMMR early remapChristophe Leroy2-5/+14
Memory: 124428K/131072K available (3748K kernel code, 188K rwdata, 648K rodata, 508K init, 290K bss, 6644K reserved) Kernel virtual memory layout: * 0xfffdf000..0xfffff000 : fixmap * 0xfde00000..0xfe000000 : consistent mem * 0xfddf6000..0xfde00000 : early ioremap * 0xc9000000..0xfddf6000 : vmalloc & ioremap SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 Today, IMMR is mapped 1:1 at startup Mapping IMMR 1:1 is just wrong because it may overlap with another area. On most mpc8xx boards it is OK as IMMR is set to 0xff000000 but for instance on EP88xC board, IMMR is at 0xfa200000 which overlaps with VM ioremap area This patch fixes the virtual address for remapping IMMR with the fixmap regardless of the value of IMMR. The size of IMMR area is 256kbytes (CPM at offset 0, security engine at offset 128k) so a 512k page is enough Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09powerpc32: provide VIRT_CPU_ACCOUNTINGChristophe Leroy5-36/+95
This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture. PPC32 doesn't have the PACA structure, so we use the task_info structure to store the accounting data. In order to reuse on PPC32 the PPC64 functions, all u64 data has been replaced by 'unsigned long' so that it is u32 on PPC32 and u64 on PPC64 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-08powerpc: Fix typo in comment reference to CONFIG_TRACE_IRQFLAGSAndrew Donnellan1-1/+1
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08powerpc/eeh: Fix pr_debug()s in eeh_cache.cAndrew Donnellan1-4/+4
eeh_cache.c doesn't build cleanly with -DDEBUG when CONFIG_PHYS_ADDR_T_64BIT is set, as a couple of pr_debug()s use "%lx" for resource_size_t parameters. Use "%pap" instead, as it's the correct format specifier for types deriving from phys_addr_t. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08powerpc/pseries: start rtasd before PCI probingGreg Kurz1-5/+17
A strange behaviour is observed when comparing PCI hotplug in QEMU, between x86 and pseries. If you consider the following steps: - start a VM - add a PCI device via the QEMU monitor before the rtasd has started (for example starting the VM in paused state, or hotplug during FW or boot loader) - resume the VM execution The x86 kernel detects the PCI device, but the pseries one does not. This happens because the rtasd kernel worker is currently started under device_initcall, while PCI probing happens earlier under subsys_initcall. As a consequence, if we have a pending RTAS event at boot time, a message is printed and the event is dropped. This patch moves all the initialization of rtasd to arch_initcall, which is run before subsys_call: this way, logging_enabled is true when the RTAS event pops up and it is not lost anymore. The proc fs bits stay at device_initcall because they cannot be run before fs_initcall. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-07powerpc/pci: Assign fixed PHB number based on device-tree propertiesGuilherme G. Piccoli1-3/+51
The domain/PHB field of PCI addresses has its value obtained from a global variable, incremented each time a new domain (represented by struct pci_controller) is added on the system. The domain addition process happens during boot or due to PHB hotplug add. As recent kernels are using predictable naming for network interfaces, the network stack is more tied to PCI naming. This can be a problem in hotplug scenarios, because PCI addresses will change if devices are removed and then re-added. This situation seems unusual, but it can happen if a user wants to replace a NIC without rebooting the machine, for example. This patch changes the way PCI domain values are generated: now, we use device-tree properties to assign fixed PHB numbers to PCI addresses when available (meaning pSeries and PowerNV cases). We also use a bitmap to allow dynamic PHB numbering when device-tree properties are not used. This bitmap keeps track of used PHB numbers and if a PHB is released (by hotplug operations for example), it allows the reuse of this PHB number, avoiding PCI address to change in case of device remove and re-add soon after. No functional changes were introduced. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> [mpe: Drop unnecessary machine_is(pseries) test] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-07powerpc/pci: Fix build with PCI_IOV=y and EEH=nMichael Ellerman1-4/+5
Despite attempting to fix this in commit fb36e9073693 ("powerpc/pci: Fix SRIOV not building without EEH enabled"), the build is still broken when PCI_IOV=y and EEH=n (eg. g5_defconfig with PCI_IOV=y): arch/powerpc/kernel/pci_dn.c: In function ‘remove_dev_pci_data’: arch/powerpc/kernel/pci_dn.c:230:18: error: unused variable ‘edev’ Incorporate Ben's idea of using __maybe_unused to avoid so many #ifdefs. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05powerpc/timer: Large Decrementer supportOliver O'Halloran1-8/+59
Power ISAv3 adds a large decrementer (LD) mode which increases the size of the decrementer register. The size of the enlarged decrementer register is between 32 and 64 bits with the exact size being dependent on the implementation. When in LD mode, reads are sign extended to 64 bits and a decrementer exception is raised when the high bit is set (i.e the value goes below zero). Writes however are truncated to the physical register width so some care needs to be taken to ensure that the high bit is not set when reloading the decrementer. This patch adds support for using the LD inside the host kernel on processors that support it. When LD mode is supported firmware will supply the ibm,dec-bits property for CPU nodes to allow the kernel to determine the maximum decrementer value. Enabling LD mode is a hypervisor privileged operation so the kernel can only enable it manually when running in hypervisor mode. Guests that support LD mode can request it using the "ibm,client-architecture-support" firmware call (not implemented in this patch) or some other platform specific method. If this property is not supplied then the traditional decrementer width of 32 bit is assumed and LD mode will not be enabled. This patch was based on initial work by Jack Miller. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05powerpc/rtas: Fix array overrun in ppc_rtas() syscallAndrew Donnellan1-1/+1
If ppc_rtas() is called with args.nargs == 16 and args.nret == 0, args.rets is set to point to &args.args[16], which is beyond the end of the args.args array. This results in a minor read overrun of the array when we check the first return code (which, per PAPR, is a required output of all RTAS calls) to see if there's been a hardware error. Change the nargs/nret check to ensure nargs is <= 15, allowing room for the status code. Users shouldn't be calling with nret == 0, but there's no real harm if they do, so we don't stop them. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05powerpc: Send SIGBUS on unaligned copy and pasteChris Smart1-0/+14
Calling ISA 3.0 instructions copy, copy_first, paste and paste_last generates an alignment fault when copying or pasting unaligned data (128 byte). We catch this and send SIGBUS to the userspace process that caused it. We do not emulate these because paste may contain additional metadata when pasting to a co-processor and paste_last is the synchronisation point for preceding copy/paste sequences. Thanks to Michael Neuling <mikey@neuling.org> for his help. Signed-off-by: Chris Smart <chris@distroguy.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05powerpc/perf: factor out power8 __init_pmu codeMadhavan Srinivasan1-1/+17
Factor out the power8 pmu init functions to share with power9. Monitor Mode Control Register S(MMCRS) and Monitor Mode Control Register H(MMCRH) registers are dropped in Power9. These registers are added to new function which are included for power8 init. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05powerpc/fadump: Fix build error introduced by recent cleanupMichael Ellerman1-1/+1
We spent so much time bike-shedding the printk() we missed that the next line was missing a semi-colon. And it seems none of our defconfigs turn on CONFIG_FA_DUMP. Fixes: 4a03749f140c ("powerpc/fadump: Trivial fix of spelling mistake, clean up message") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-30powerpc: Initialise pci_io_base as early as possibleDarren Stevens1-1/+0
Commit d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for radix") turned kernel memory and IO addresses from #defined constants to variables initialised at runtime. On PA6T (pasemi) systems the setup_arch() machine call initialises the onboard PCI-e root-ports, and uses pci_io_base to do this, which is now before its value has been set, resulting in a panic early in boot before console IO is initialised. Move the pci_io_base initialisation to the same place as vmalloc ranges are set (hash__early_init_mmu()/radix__early_init_mmu()) - this is the earliest possible place we can initialise it. Fixes: d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for radix") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Darren Stevens <darren@stevens-zone.net> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Add #ifdef CONFIG_PCI, massage change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>