summaryrefslogtreecommitdiff
path: root/arch/powerpc/platforms/pseries/lpar.c
AgeCommit message (Collapse)AuthorFilesLines
2016-10-28powerpc/pseries: Fix stack corruption in htpe codeLaurent Dufour1-2/+2
commit 05af40e885955065aee8bb7425058eb3e1adca08 upstream. This commit fixes a stack corruption in the pseries specific code dealing with the huge pages. In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments to the hypervisor is not large enough. This leads to a stack corruption where a previously saved register could be corrupted leading to unexpected result in the caller, like the following panic: Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: virtio_balloon ip_tables x_tables autofs4 virtio_blk 8139too virtio_pci virtio_ring 8139cp virtio CPU: 11 PID: 1916 Comm: mmstress Not tainted 4.8.0 #76 task: c000000005394880 task.stack: c000000005570000 NIP: c00000000027bf6c LR: c00000000027bf64 CTR: 0000000000000000 REGS: c000000005573820 TRAP: 0300 Not tainted (4.8.0) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84822884 XER: 20000000 CFAR: c00000000010a924 DAR: 420000000014e5e0 DSISR: 40000000 SOFTE: 1 GPR00: c00000000027bf64 c000000005573aa0 c000000000e02800 c000000004447964 GPR04: c00000000404de18 c000000004d38810 00000000042100f5 00000000f5002104 GPR08: e0000000f5002104 0000000000000001 042100f5000000e0 00000000042100f5 GPR12: 0000000000002200 c00000000fe02c00 c00000000404de18 0000000000000000 GPR16: c1ffffffffffe7ff 00003fff62000000 420000000014e5e0 00003fff63000000 GPR20: 0008000000000000 c0000000f7014800 0405e600000000e0 0000000000010000 GPR24: c000000004d38810 c000000004447c10 c00000000404de18 c000000004447964 GPR28: c000000005573b10 c000000004d38810 00003fff62000000 420000000014e5e0 NIP [c00000000027bf6c] zap_huge_pmd+0x4c/0x470 LR [c00000000027bf64] zap_huge_pmd+0x44/0x470 Call Trace: [c000000005573aa0] [c00000000027bf64] zap_huge_pmd+0x44/0x470 (unreliable) [c000000005573af0] [c00000000022bbd8] unmap_page_range+0xcf8/0xed0 [c000000005573c30] [c00000000022c2d4] unmap_vmas+0x84/0x120 [c000000005573c80] [c000000000235448] unmap_region+0xd8/0x1b0 [c000000005573d80] [c0000000002378f0] do_munmap+0x2d0/0x4c0 [c000000005573df0] [c000000000237be4] SyS_munmap+0x64/0xb0 [c000000005573e30] [c000000000009560] system_call+0x38/0x108 Instruction dump: fbe1fff8 fb81ffe0 7c7f1b78 7ca32b78 7cbd2b78 f8010010 7c9a2378 f821ffb1 7cde3378 4bfffea9 7c7b1b79 41820298 <e87f0000> 48000130 7fa5eb78 7fc4f378 Most of the time, the bug is surfacing in a caller up in the stack from __pSeries_lpar_hugepage_invalidate() which is quite confusing. This bug is pending since v3.11 but was hidden if a caller of the caller of __pSeries_lpar_hugepage_invalidate() has pushed the corruped register (r18 in this case) in the stack and is not using it until restoring it. GCC 6.2.0 seems to raise it more frequently. This commit also change the definition of the parameter buffer in pSeries_lpar_flush_hash_range() to rely on the global define PLPAR_HCALL9_BUFSIZE (no functional change here). Fixes: 1a5272866f87 ("powerpc: Optimize hugepage invalidate") Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-26powerpc/mm: Rename hpte_init_lpar() and move the fallback to a headerMichael Ellerman1-1/+1
hpte_init_lpar() is part of the pseries platform, so name it as such. Move the fallback implementation for when PSERIES=n into the header, dropping the weak implementation. The panic() is now handled by the calling code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc/mm: Move hash table ops to a separate structureBenjamin Herrenschmidt1-9/+9
Moving probe_machine() to after mmu init will cause the ppc_md fields relative to the hash table management to be overwritten. Since we have essentially disconnected the machine type from the hash backend ops, finish the job by moving them to a different structure. The only callback that didn't quite fix is update_partition_table since this is not specific to hash, so I moved it to a standalone variable for now. We can revisit later if needed. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fix ppc64e build failure in kexec] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc: Put exception configuration in a common placeBenjamin Herrenschmidt1-18/+2
The various calls to establish exception endianness and AIL are now done from a single point using already established CPU and FW feature bits to decide what to do. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-16powerpc: Introduce asm-prototypes.hDaniel Axtens1-0/+1
Sparse picked up a number of functions that are implemented in C and then only referred to in asm code. This introduces asm-prototypes.h, which provides a place for prototypes of these functions. This silences some sparse warnings. Signed-off-by: Daniel Axtens <dja@axtens.net> [mpe: Add include guards, clean up copyright & GPL text] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11powerpc/mm/radix: Isolate hash table function from pseries guest codeAneesh Kumar K.V1-3/+11
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01powerpc/mm/hash: Add support for Power9 HashAneesh Kumar K.V1-1/+1
PowerISA 3.0 adds a parition table indexed by LPID. Parition table allows us to specify the MMU model that will be used for guest and host translation. This patch adds support with SLB based hash model (UPRT = 0). What is required with this model is to support the new hash page table entry format and also setup partition table such that we use hash table for address translation. We don't have segment table support yet. In order to make sure we don't load KVM module on Power9 (since we don't have kvm support yet) this patch also disables KVM on Power9. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01powerpc/mm: Drop WIMG in favour of new constantsAneesh Kumar K.V1-4/+0
PowerISA 3.0 introduces two pte bits with the below meaning for radix: 00 -> Normal Memory 01 -> Strong Access Order (SAO) 10 -> Non idempotent I/O (Cache inhibited and guarded) 11 -> Tolerant I/O (Cache inhibited) We drop the existing WIMG bits in the Linux page table in favour of the above constants. We loose _PAGE_WRITETHRU with this conversion. We only use writethru via pgprot_cached_wthru() which is used by fbdev/controlfb.c which is Apple control display and also PPC32. With respect to _PAGE_COHERENCE, we have been marking hpte always coherent for some time now. htab_convert_pte_flags() always added HPTE_R_M. NOTE: KVM changes need closer review. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01powerpc/mm: Handle removing maybe-present bolted HPTEsDavid Gibson1-3/+6
At the moment the hpte_removebolted callback in ppc_md returns void and will BUG_ON() if the hpte it's asked to remove doesn't exist in the first place. This is awkward for the case of cleaning up a mapping which was partially made before failing. So, we add a return value to hpte_removebolted, and have it return ENOENT in the case that the HPTE to remove didn't exist in the first place. In the (sole) caller, we propagate errors in hpte_removebolted to its caller to handle. However, we handle ENOENT specially, continuing to complete the unmapping over the specified range before returning the error to the caller. This means that htab_remove_mapping() will work sanely on a partially present mapping, removing any HPTEs which are present, while also returning ENOENT to its caller in case it's important there. There are two callers of htab_remove_mapping(): - In remove_section_mapping() we already WARN_ON() any error return, which is reasonable - in this case the mapping should be fully present - In vmemmap_remove_mapping() we BUG_ON() any error. We change that to just a WARN_ON() in the case of ENOENT, since failing to remove a mapping that wasn't there in the first place probably shouldn't be fatal. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14powerpc/mm: Use H_READ with H_READ_4Aneesh Kumar K.V1-27/+27
This will bulk read 4 hash pte slot entries and should reduce the loop Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14powerpc/mm: Move THP headers aroundAneesh Kumar K.V1-0/+10
We support THP only with book3s_64 and 64K page size. Move THP details to hash64-64k.h to clarify the same. Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-09powerpc, jump_label: Include linux/jump_label.h to get HAVE_JUMP_LABEL defineAnton Blanchard1-1/+1
Commit 1bc9e47aa8e4 ("powerpc/jump_label: Use HAVE_JUMP_LABEL") converted uses of CONFIG_JUMP_LABEL to HAVE_JUMP_LABEL in some assembly files. HAVE_JUMP_LABEL is defined in linux/jump_label.h, so we need to include this or we always get the non jump label fallback code. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: catalin.marinas@arm.com Cc: davem@davemloft.net Cc: heiko.carstens@de.ibm.com Cc: jbaron@akamai.com Cc: linux@arm.linux.org.uk Cc: linuxppc-dev@lists.ozlabs.org Cc: liuj97@gmail.com Cc: mgorman@suse.de Cc: mmarek@suse.cz Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: rostedt@goodmis.org Cc: schwidefsky@de.ibm.com Cc: will.deacon@arm.com Fixes: 1bc9e47aa8e4 ("powerpc/jump_label: Use HAVE_JUMP_LABEL") Link: http://lkml.kernel.org/r/1428551492-21977-3-git-send-email-anton@samba.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-29powerpc/kdump: Ignore failure in enabling big endian exception during crashHari Bathini1-1/+7
In LE kernel, we currently have a hack for kexec that resets the exception endian before starting a new kernel as the kernel that is loaded could be a big endian or a little endian kernel. In kdump case, resetting exception endian fails when one or more cpus is disabled. But we can ignore the failure and still go ahead, as in most cases crashkernel will be of same endianess as primary kernel and reseting endianess is not even needed in those cases. This patch adds a new inline function to say if this is kdump path. This function is used at places where such a check is needed. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> [mpe: Rename to kdump_in_progress(), use bool, and edit comment] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-05powerpc/mm: don't do tlbie for updatepp request with NO HPTE faultAneesh Kumar K.V1-1/+1
upatepp can get called for a nohpte fault when we find from the linux page table that the translation was hashed before. In that case we are sure that there is no existing translation, hence we could avoid doing tlbie. We could possibly race with a parallel fault filling the TLB. But that should be ok because updatepp is only ever relaxing permissions. We also look at linux pte permission bits when filling hash pte permission bits. We also hold the linux pte busy bits while inserting/updating a hashpte entry, hence a paralle update of linux pte is not possible. On the other hand mprotect involves ptep_modify_prot_start which cause a hpte invalidate and not updatepp. Performance number: We use randbox_access_bench written by Anton. Kernel with THP disabled and smaller hash page table size. 86.60% random_access_b [kernel.kallsyms] [k] .native_hpte_updatepp 2.10% random_access_b random_access_bench [.] doit 1.99% random_access_b [kernel.kallsyms] [k] .do_raw_spin_lock 1.85% random_access_b [kernel.kallsyms] [k] .native_hpte_insert 1.26% random_access_b [kernel.kallsyms] [k] .native_flush_hash_range 1.18% random_access_b [kernel.kallsyms] [k] .__delay 0.69% random_access_b [kernel.kallsyms] [k] .native_hpte_remove 0.37% random_access_b [kernel.kallsyms] [k] .clear_user_page 0.34% random_access_b [kernel.kallsyms] [k] .__hash_page_64K 0.32% random_access_b [kernel.kallsyms] [k] fast_exception_return 0.30% random_access_b [kernel.kallsyms] [k] .hash_page_mm With Fix: 27.54% random_access_b random_access_bench [.] doit 22.90% random_access_b [kernel.kallsyms] [k] .native_hpte_insert 5.76% random_access_b [kernel.kallsyms] [k] .native_hpte_remove 5.20% random_access_b [kernel.kallsyms] [k] fast_exception_return 5.12% random_access_b [kernel.kallsyms] [k] .__hash_page_64K 4.80% random_access_b [kernel.kallsyms] [k] .hash_page_mm 3.31% random_access_b [kernel.kallsyms] [k] data_access_common 1.84% random_access_b [kernel.kallsyms] [k] .trace_hardirqs_on_caller Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-02powerpc/mm/thp: Use tlbiel if possibleAneesh Kumar K.V1-1/+1
If we know that user address space has never executed on other cpus we could use tlbiel. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-11-05Merge branch 'topic/get-cpu-var' into nextMichael Ellerman1-3/+3
2014-11-03powerpc: Replace __get_cpu_var usesChristoph Lameter1-3/+3
This still has not been merged and now powerpc is the only arch that does not have this change. Sorry about missing linuxppc-dev before. V2->V2 - Fix up to work against 3.18-rc1 __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> Signed-off-by: Christoph Lameter <cl@linux.com> [mpe: Fix build errors caused by set/or_softirq_pending(), and rework assignment in __set_breakpoint() to use memcpy().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-30powerpc/fadump: Fix endianess issues in firmware assisted dump handlingHari Bathini1-2/+12
Firmware-assisted dump (fadump) kernel code is not endian safe. The below patch fixes this issue. Tested this patch with upstream kernel. Below output shows crash tool successfully opening LE fadump vmcore. # crash vmlinux vmcore GNU gdb (GDB) 7.6 This GDB was configured as "powerpc64le-unknown-linux-gnu"... KERNEL: vmlinux DUMPFILE: vmcore CPUS: 16 DATE: Wed Dec 31 19:00:00 1969 UPTIME: 00:03:28 LOAD AVERAGE: 0.46, 0.86, 0.41 TASKS: 268 NODENAME: linux-dhr2 RELEASE: 3.17.0-rc5-7-default VERSION: #6 SMP Tue Sep 30 01:06:34 EDT 2014 MACHINE: ppc64le (4116 Mhz) MEMORY: 40 GB PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details) PID: 6223 COMMAND: "bash" TASK: c0000009661b2500 [THREAD_INFO: c000000967ac0000] CPU: 2 STATE: TASK_RUNNING (PANIC) Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> [mpe: Make the comment in pSeries_lpar_hptab_clear() clearer] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25powerpc/jump_label: use HAVE_JUMP_LABEL?Zhouyi Zhou1-1/+1
CONFIG_JUMP_LABEL doesn't ensure HAVE_JUMP_LABEL, if it is not the case use maintainers's own mutex to guard the modification of global values. Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25powerpc: Remove stale function prototypesAnton Blanchard1-2/+0
There were a number of prototypes for functions that no longer exist. Remove them. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-08-13powerpc/thp: Don't recompute vsid and ssize in loop on invalidateAneesh Kumar K.V1-14/+6
The segment identifier and segment size will remain the same in the loop, So we can compute it outside. We also change the hugepage_invalidate interface so that we can use it the later patch CC: <stable@vger.kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-11powerpc/pseries: Use jump labels for hcall tracepointsAnton Blanchard1-7/+23
hcall tracepoints add quite a few instructions to our hcall path: plpar_hcall: mr r2,r2 mfcr r0 stw r0,8(r1) b 164 <---- start ld r12,0(r2) std r12,32(r1) cmpdi r12,0 beq 164 <---- end ... We have an unconditional branch that gets noped out during boot and a load/compare/branch. We also store the tracepoint value to the stack for the hcall_exit path to use. By using jump labels we can simplify this to just a single nop that gets replaced with a branch when the tracepoint is enabled: plpar_hcall: mr r2,r2 mfcr r0 stw r0,8(r1) nop <---- ... If jump labels are not enabled, we fall back to the old method. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-09powerpc/mm: Use HPTE constants when updating hpte bitsAneesh Kumar K.V1-1/+2
Even though we have same value for linux PTE bits and hash PTE pits use the hash pte bits wen updating hash pte Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-09powerpc: Make slb_shadow a localJeremy Kerr1-1/+1
The only external user of slb_shadow is the pseries lpar code, and it can access through the paca array instead. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-11-21pseries: Add H_SET_MODE to change exception endiannessAnton Blanchard1-0/+17
On little endian builds call H_SET_MODE so exceptions have the correct endianness. We need to reset the endian during kexec so do that in the MMU hashtable clear callback. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-27powerpc/pseries: Add a warning in the case of cross-cpu VPA registrationMichael Ellerman1-0/+6
The spec says it "may be problematic" if CPU x registers the VPA of CPU y. Add a warning in case we ever do that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-27pseries: Move plpar_wrapper.h to powerpc common include/asm location.Deepthi Dharwar1-1/+1
As a part of pseries_idle backend driver cleanup to make the code common to both pseries and powernv platforms, it is necessary to move the backend-driver code to drivers/cpuidle. As a pre-requisite for that, it is essential to move plpar_wrapper.h to include/asm. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14powerpc: Fix little endian lppaca, slb_shadow and dtl_entryAnton Blanchard1-1/+1
The lppaca, slb_shadow and dtl_entry hypervisor structures are big endian, so we have to byte swap them in little endian builds. LE KVM hosts will also need to be fixed but for now add an #error to remind us. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-08-14powerpc: Fix a number of sparse warningsAnton Blanchard1-1/+1
Address some of the trivial sparse warnings in arch/powerpc. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24powerpc/pseries: Fix a typo in pSeries_lpar_hpte_insert()Denis Kirjanov1-1/+1
Commit 801eb73f45371accc78ca9d6d22d647eeb722c11 introduced a bug while checking PTE flags. We have to drop the _PAGE_COHERENT flag when __PAGE_NO_CACHE is set and the cache update policy is not write-through (i.e. _PAGE_WRITETHRU is not set) Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> CC: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01powerpc/pseries: Inform the hypervisor we are using EBB regsMichael Ellerman1-0/+3
On LPAR systems we need to inform the hypervisor that we are using the EBB registers. We do this by setting a bit in the Virtual Processor Area (VPA) - formerly known as the lppaca. For now we do this always, ie. we do not dynamically enable/disable. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc: Optimize hugepage invalidateAneesh Kumar K.V1-7/+115
Hugepage invalidate involves invalidating multiple hpte entries. Optimize the operation using H_BULK_REMOVE on lpar platforms. On native, reduce the number of tlb flush. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21powerpc/mm: handle hugepage size correctly when invalidating hpte entriesAneesh Kumar K.V1-5/+12
If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change update the patch 0608d692463598c1d6e826d9dd7283381b4f246c "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Decode the pte-lp-encoding bits correctly.Aneesh Kumar K.V1-3/+3
We look at both the segment base page size and actual page size and store the pte-lp-encodings in an array per base page size. We also update all relevant functions to take actual page size argument so that we can use the correct PTE LP encoding in HPTE. This should also get the basic Multiple Page Size per Segment (MPSS) support. This is needed to enable THP on ppc64. [Fixed PR KVM build --BenH] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30powerpc: Use signed formatting when printing errorAneesh Kumar K.V1-1/+1
PAPR defines these errors as negative values. So print them accordingly for easy debugging. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-08powerpc: pSeries_lpar_hpte_remove fails from Adjunct partition being ↵Michael Wolf1-1/+7
performed before the ANDCOND test Some versions of pHyp will perform the adjunct partition test before the ANDCOND test. The result of this is that H_RESOURCE can be returned and cause the BUG_ON condition to occur. The HPTE is not removed. So add a check for H_RESOURCE, it is ok if this HPTE is not removed as pSeries_lpar_hpte_remove is looking for an HPTE to remove and not a specific HPTE to remove. So it is ok to just move on to the next slot and try again. Cc: stable@vger.kernel.org Signed-off-by: Michael Wolf <mjw@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2012-09-17powerpc/mm: Convert virtual address to vpnAneesh Kumar K.V1-45/+31
This patch convert different functions to take virtual page number instead of virtual address. Virtual page number is virtual address shifted right by VPN_SHIFT (12) bits. This enable us to have an address range of upto 76 bits. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-09-05powerpc: Remove all includes of <asm/abs_addr.h>Michael Ellerman1-1/+0
It's empty now, apart from other includes. Fixup a few files that were getting things via this header. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-21powerpc: Remove FW_FEATURE ISERIES from arch codeStephen Rothwell1-0/+1
This is no longer selectable, so just remove all the dependent code. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-11powerpc: Fix RCU idle and hcall tracingAnton Blanchard1-4/+10
Tracepoints should not be called inside an rcu_idle_enter/rcu_idle_exit region. Since pSeries calls H_CEDE in the idle loop, we were violating this rule. commit a7b152d5342c (powerpc: Tell RCU about idle after hcall tracing) tried to work around it by delaying the rcu_idle_enter until after we called the hcall tracepoint, but there are a number of issues with it. The hcall tracepoint trampoline code is called conditionally when the tracepoint is enabled. If the tracepoint is not enabled we never call rcu_idle_enter. The idle_uses_rcu check was also done at compile time which breaks multiplatform builds. The simple fix is to avoid tracing H_CEDE and rely on other tracepoints and the hypervisor dispatch trace log to work out if we called H_CEDE. This fixes a hang during boot on pSeries. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-01-07Merge branch 'next' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits) powerpc: fix compile error with 85xx/p1010rdb.c powerpc: fix compile error with 85xx/p1023_rds.c powerpc/fsl: add MSI support for the Freescale hypervisor arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree powerpc/fsl: Add support for Integrated Flash Controller powerpc/fsl: update compatiable on fsl 16550 uart nodes powerpc/85xx: fix PCI and localbus properties in p1022ds.dts powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig powerpc/fsl: Update defconfigs to enable some standard FSL HW features powerpc: Add TBI PHY node to first MDIO bus sbc834x: put full compat string in board match check powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit offb: Fix setting of the pseudo-palette for >8bpp offb: Add palette hack for qemu "standard vga" framebuffer offb: Fix bug in calculating requested vram size powerpc/boot: Change the WARN to INFO for boot wrapper overlap message powerpc/44x: Fix build error on currituck platform powerpc/boot: Change the load address for the wrapper to fit the kernel powerpc/44x: Enable CRASH_DUMP for 440x ... Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to the additional sparse-checking code for cputime_t.
2012-01-03powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exitLi Zhong1-0/+2
Unpaired calling of probe_hcall_entry and probe_hcall_exit might happen as following, which could cause incorrect preempt count. __trace_hcall_entry => trace_hcall_entry -> probe_hcall_entry => get_cpu_var => preempt_disable __trace_hcall_exit => trace_hcall_exit -> probe_hcall_exit => put_cpu_var => preempt_enable where: A => B and A -> B means A calls B, but => means A will call B through function name, and B will definitely be called. -> means A will call B through function pointer, so B might not be called if the function pointer is not set. So error happens when only one of probe_hcall_entry and probe_hcall_exit get called during a hcall. This patch tries to move the preempt count operations from probe_hcall_entry and probe_hcall_exit to its callers. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: stable@kernel.org [v2.6.32+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-12-11powerpc: Tell RCU about idle after hcall tracingPaul E. McKenney1-0/+4
The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels. One of the hypervisor calls that is traced is the H_CEDE call in the idle loop that tells the hypervisor that this OS instance no longer needs the current CPU. However, tracing uses RCU, so this combination of kernel configuration variables needs to avoid telling RCU about the current CPU's idleness until after the H_CEDE-entry tracing completes on the one hand, and must tell RCU that the the current CPU is no longer idle before the H_CEDE-exit tracing starts. In all other cases, it suffices to inform RCU of CPU idleness upon idle-loop entry and exit. This commit makes the required adjustments. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-11-01powerpc: add export.h to files making use of EXPORT_SYMBOLPaul Gortmaker1-0/+1
With module.h being implicitly everywhere via device.h, the absence of explicitly including something for EXPORT_SYMBOL went unnoticed. Since we are heading to fix things up and clean module.h from the device.h file, we need to explicitly include these files now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-08-05powerpc/pseries: Cleanup VPA registration and deregistration errorsAnton Blanchard1-9/+8
Make the VPA, SLB shadow and DTL registration and deregistration functions print consistent messages on error. I needed the firmware error code while chasing a kexec bug but we weren't printing it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-05powerpc: pseries: Fix kexec on machines with more than 4TB of RAMAnton Blanchard1-1/+1
On a box with 8TB of RAM the MMU hashtable is 64GB in size. That means we have 4G PTEs. pSeries_lpar_hptab_clear was using a signed int to store the index which will overflow at 2G. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-29powerpc/pseries: Re-implement HVSI as part of hvc_vioBenjamin Herrenschmidt1-189/+0
On pseries machines, consoles are provided by the hypervisor using a low level get_chars/put_chars type interface. However, this is really just a transport to the service processor which implements them either as "raw" console (networked consoles, HMC, ...) or as "hvsi" serial ports. The later is a simple packet protocol on top of the raw character interface that is supposed to convey additional "serial port" style semantics. In practice however, all it does is provide a way to read the CD line and set/clear our DTR line, that's it. We currently implement the "raw" protocol as an hvc console backend (/dev/hvcN) and the "hvsi" protocol using a separate tty driver (/dev/hvsi0). However this is quite impractical. The arbitrary difference between the two type of devices has been a major source of user (and distro) confusion. Additionally, there's an additional mini -hvsi implementation in the pseries platform code for our low level debug console and early boot kernel messages, which means code duplication, though that low level variant is impractical as it's incapable of doing the initial protocol negociation to establish the link to the FSP. This essentially replaces the dedicated hvsi driver and the platform udbg code completely by extending the existing hvc_vio backend used in "raw" mode so that: - It now supports HVSI as well - We add support for hvc backend providing tiocm{get,set} - It also provides a udbg interface for early debug and boot console This is overall less code, though this will only be obvious once we remove the old "hvsi" driver, which is still available for now. When the old driver is enabled, the new code still kicks in for the low level udbg console, replacing the old mini implementation in the platform code, it just doesn't provide the higher level "hvc" interface. In addition to producing generally simler code, this has several benefits over our current situation: - The user/distro only has to deal with /dev/hvcN for the hypervisor console, avoiding all sort of confusion that has plagued us in the past - The tty, kernel and low level debug console all use the same code base which supports the full protocol establishment process, thus the console is now available much earlier than it used to be with the old HVSI driver. The kernel console works much earlier and udbg is available much earlier too. Hackers can enable a hard coded very-early debug console as well that works with HVSI (previously that was only supported for the "raw" mode). I've tried to keep the same semantics as hvsi relative to how I react to things like CD changes, with some subtle differences though: - I clear DTR on close if HUPCL is set - Current hvsi triggers a hangup if it detects a up->down transition on CD (you can still open a console with CD down). My new implementation triggers a hangup if the link to the FSP is severed, and severs it upon detecting a up->down transition on CD. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-06-29powerpc/udbg: Register udbg console genericallyBenjamin Herrenschmidt1-2/+0
When CONFIG_PPC_EARLY_DEBUG is set, call register_early_udbg_console() early from generic code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04powerpc/pseries: Add page coalescing supportBrian King1-0/+46
Adds support for page coalescing, which is a feature on IBM Power servers which allows for coalescing identical pages between logical partitions. Hint text pages as coalesce candidates, since they are the most likely pages to be able to be coalesced between partitions. This patch also exports some page coalescing statistics available from firmware via lparcfg. [BenH: Moved a couple of things around to fix compile problems] Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-27powerpc: Free up some CPU feature bits by moving out MMU-related featuresMatt Evans1-1/+1
Some of the 64bit PPC CPU features are MMU-related, so this patch moves them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to mmu_has_feature(), and seven feature bits are freed as a result. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>