path: root/arch/powerpc/kernel
Age    Commit message    Author    Files    Lines
2018-12-21  powerpc: generate uapi header and system call table files  (Firoz Khan, 3 files, -110/+12)
The system call table generation script must be run to generate the unistd_32/64.h and syscall_table_32/64/c32/spu.h files. This patch contains the changes which invoke the script.

This patch generates the unistd_32/64.h and syscall_table_32/64/c32/spu.h files via the syscall table generation script invoked by the powerpc Makefile, and the generated files are identical to the removed ones.

The generated uapi header file is included in uapi/asm/unistd.h, and the generated system call table header file is included by the kernel/systbl.S file.

Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21  powerpc: add system call table generation support  (Firoz Khan, 4 files, -0/+563)
The system call tables are in a different format in every architecture, and it is difficult to manually add or modify the system calls in the respective files. Keeping a script which generates the uapi header and syscall table files makes this easy, and also helps to unify the implementation across all architectures.

The system call table generation script is added in the syscalls directory, which contains the script to generate both the uapi header file and the system call table files. The syscall.tbl file is the input for the script: it contains the list of available system calls along with the system call number and corresponding entry point. Adding a new system call in this architecture is then possible by adding a new entry in the syscall.tbl file. A new table entry consists of:

- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.

syscallhdr.sh and syscalltbl.sh generate the uapi header unistd_32/64.h and the syscall_table_32/64/c32/spu.h files respectively. The syscall_table_32/64/c32/spu.h files are included by syscall.S - the real system call table. Both *.sh files parse the content of syscall.tbl to generate the header and table files.

The ARM, s390 and x86 architectures have similar support. I leveraged their implementation to come up with a generic solution.

Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
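To illustrate the entry format described above, here is a sketch of a syscall.tbl fragment; the syscalls and numbers are hypothetical, shown only to demonstrate the field layout, not the real powerpc table:

    # number  ABI     name  entry point  compat entry point
    0         common  foo   sys_foo
    1         32      bar   sys_bar      compat_sys_bar
    1         64      bar   sys_bar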
2018-12-21  powerpc: split compat syscall table out from native table  (Firoz Khan, 3 files, -11/+38)
PowerPC uses a syscall table with native and compat calls interleaved, which is a slightly simpler way to define two matching tables.

As we move to having the tables generated, that advantage is no longer important, but the interleaved table gets in the way of using the same scripts as on the other architectures.

Split out a new compat_sys_call_table symbol that contains all the compat calls, and leave the main table for the native calls, to more closely match the method we use everywhere else.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21  powerpc: move macro definition from asm/systbl.h  (Firoz Khan, 1 file, -0/+1)
Move the macro definition for compat_sys_sigsuspend from asm/systbl.h to the file in which it is included.

One of the patches in this series generates the uapi header and syscall table files. In order to come up with a common implementation across all architectures, we need this change, which simplifies the system call table generation script.

Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21  powerpc/tm: Unset MSR[TS] if not recheckpointing  (Breno Leitao, 2 files, -9/+29)
There is a TM Bad Thing bug that can be triggered when you return from a signal context in a suspended transaction but with the ucontext MSR[TS] unset.

This forces regs->msr[TS] to be set at syscall entrance (since the CPU state is transactional). It also calls treclaim() to flush the transaction state, which is done based on the live (mfmsr) MSR state. Since the user context MSR[TS] is not set, restore_tm_sigcontexts() is not called, so no recheckpoint is executed and the CPU state stays non-transactional. When rfid is then called, SRR1 has MSR[TS] set, but the CPU state is non-transactional, causing the TM Bad Thing with the following stack:

    [   33.862316] Bad kernel stack pointer 3fffd9dce3e0 at c00000000000c47c
    cpu 0x8: Vector: 700 (Program Check) at [c00000003ff7fd40]
        pc: c00000000000c47c: fast_exception_return+0xac/0xb4
        lr: 00003fff865f442c
        sp: 3fffd9dce3e0
       msr: 8000000102a03031
      current = 0xc00000041f68b700
      paca    = 0xc00000000fb84800   softe: 0   irq_happened: 0x01
        pid   = 1721, comm = tm-signal-sigre
    Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
    WARNING: exception is not recoverable, can't continue

The same problem happens in the 32-bit signal handler, and the fix is very similar: if tm_recheckpoint() is not executed, then regs->msr[TS] should be zeroed.

This patch also fixes a sparse warning related to a lack of indentation when CONFIG_PPC_TRANSACTIONAL_MEM is set.

Fixes: 2b0a576d15e0e ("powerpc: Add new transactional memory state to the signal context")
CC: Stable <stable@vger.kernel.org> # 3.10+
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
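A minimal sketch of the core idea of the fix, assuming the signal-return path and the MSR_TS_MASK bit-field mask from <asm/reg.h>; the actual patch touches both the 64-bit and 32-bit paths:

    /* Sketch only: if the ucontext carries no transactional state,
     * make sure the task's saved MSR does not either, so we never
     * rfid with MSR[TS] set while the CPU is non-transactional. */
    if (!(msr & MSR_TS_MASK))
            regs->msr &= ~MSR_TS_MASK;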
2018-12-21  powerpc/tm: Print scratch value  (Breno Leitao, 1 file, -1/+2)
Usually a TM Bad Thing exception is raised due to one of three problems: a) touching SPRs in an active transaction; b) using a TM instruction with the facility disabled; or c) setting a wrong MSR/SRR1 at RFID.

The first two cases are easy to identify by looking at the instructions. The latter case is harder, because the MSR is masked after RFID, so it is very useful to look at the previous MSR (SRR1) before RFID as well as the current and masked MSR.

Since the MSR is saved in the paca just before RFID, this patch prints it if a TM Bad Thing happens, helping to understand which invalid TM transition is causing the exception.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21  powerpc/tm: Save MSR to PACA before RFID  (Breno Leitao, 1 file, -0/+4)
As at other exit points, move SRR1 (MSR) into paca->tm_scratch so that, if there is a TM Bad Thing at RFID, it is easy to understand what SRR1 value was being used.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21  powerpc/tm: Set MSR[TS] just prior to recheckpoint  (Breno Leitao, 2 files, -15/+49)
On a signal handler return, the user could set a context with the MSR[TS] bits set, and these bits would be copied to the task's regs->msr.

At restore_tm_sigcontexts(), after the current task's regs->msr[TS] bits are set, several __get_user() calls are made and then a recheckpoint is executed.

This is a problem since a page fault (in kernel space) could happen when calling __get_user(). If it happens, the process MSR[TS] bits are already set, but the recheckpoint was not executed and the SPRs are still invalid.

The page fault can cause the current process to be de-scheduled, with MSR[TS] active and without tm_recheckpoint() being called. More importantly, without the TEXASR[FS] bit set either.

Since TEXASR might not have the FS bit set, when the process is scheduled back it will try to reclaim, which will be aborted because the CPU is not in the suspended state, and then recheckpoint. This recheckpoint will restore thread->texasr into the TEXASR SPR, which might be zero, hitting a BUG_ON():

    kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434!
    cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0]
        pc: c000000000054550: restore_gprs+0xb0/0x180
        lr: 0000000000000000
        sp: c00000041f157950
       msr: 8000000100021033
      current = 0xc00000041f143000
      paca    = 0xc00000000fb86300   softe: 0   irq_happened: 0x01
        pid   = 1021, comm = kworker/11:1
    Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
    enter ? for help
    [c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0
    [c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0
    [c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990
    [c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0
    [c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610
    [c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140
    [c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c

This patch simply delays setting MSR[TS], so that if there is any page fault in the __get_user() section, regs->msr[TS] is not yet set while the TM structures are still invalid, thus avoiding TM operations for in-kernel exceptions and a possible process reschedule. With this patch, MSR[TS] is only set just before recheckpointing and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in an invalid state.

Other than that, if CONFIG_PREEMPT is set, there might be a preemption just after setting MSR[TS] and before tm_recheckpoint(), so this block must be atomic from a preemption perspective; hence preempt_disable/enable() are called around this code.

It is not possible to move tm_recheckpoint() earlier, because the checkpointed registers must first be fetched from userspace with __get_user(); thus the only way to avoid this undesired behavior is delaying the MSR[TS] set.

The 32-bit signal handler seems to be safe from this particular issue, but it might be exposed to the preemption issue, so preemption is disabled in that chunk of code too.

Changes from v2:
* Run the critical section with preempt_disable.

Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
Cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
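A rough sketch of the resulting critical section, assuming the 4.20-era signal-return code; names like tsk follow that code, and the exact sequence in the real patch differs per signal path:

    /* Sketch only: set MSR[TS] as late as possible, and keep the
     * window between setting it and recheckpointing atomic with
     * respect to preemption. */
    preempt_disable();
    regs->msr |= msr & MSR_TS_MASK;     /* TS bits from the ucontext */
    tm_recheckpoint(&tsk->thread);      /* restores SPRs, sets TEXASR[FS] */
    preempt_enable();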
2018-12-21  powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.  (Mahesh Salgaonkar, 1 file, -2/+8)
For fadump to work successfully there should not be any holes in the reserved memory ranges where the kernel has asked firmware to move the contents of old kernel memory in the event of a crash. Now that fadump uses CMA for the reserved area, this memory area is no longer protected from hot-remove operations unless it is CMA-allocated. Hence, the fadump service can fail to re-register after a hot-remove operation if the hot-removed memory belongs to the fadump reserved region.

To avoid this, make sure that memory from the fadump reserved area is not hot-removable while fadump is registered. However, if the user still wants to remove that memory, they can do so by manually stopping the fadump service before the hot-remove operation.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21  powerpc/fadump: Throw proper error message on fadump registration failure  (Mahesh Salgaonkar, 1 file, -2/+33)
fadump fails to register when there are holes in the reserved memory area. This can happen if the user has hot-removed memory that falls within the fadump reserved memory area. Throw a meaningful error message to the user in such a case.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: is_reserved_memory_area_contiguous() returns bool, unsplit string]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21  powerpc/fadump: Reservationless firmware assisted dump  (Mahesh Salgaonkar, 1 file, -10/+87)
One of the primary issues with Firmware Assisted Dump (fadump) on Power is that it needs a large amount of memory to be reserved. On large systems with terabytes of memory, this reservation can be quite significant.

In some cases, fadump fails if the reserved memory is insufficient, or if the reserved memory was DLPAR hot-removed.

In the normal case, post reboot, the preserved memory is filtered to extract only relevant areas of interest using the makedumpfile tool. While the tool provides flexibility to determine what needs to be part of the dump and what memory to filter out, all supported distributions default this to "Capture only kernel data and nothing else".

We take advantage of this default and the Linux kernel's Contiguous Memory Allocator (CMA) to fundamentally change the memory reservation model for fadump. Instead of setting aside a significant chunk of memory that nobody can use, this patch uses CMA to reserve a significant chunk of memory that the kernel is prevented from using (due to MIGRATE_CMA), but that applications are free to use. With this, fadump will still be able to capture all of the kernel memory and most of the user space memory, except the user pages that were present in the CMA region.

Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:

    [root@zzxx-yy10 ~]# free -m
                  total    used    free   shared  buff/cache  available
    Mem:           7557     193    6822       12         541       6725
    Swap:          4095       0    4095

With this patch:

    [root@zzxx-yy10 ~]# free -m
                  total    used    free   shared  buff/cache  available
    Mem:           8133     194    7464       12         475       7338
    Swap:          4095       0    4095

The changes made here are completely transparent to how fadump has traditionally worked.

Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand CMA and its usage.

TODO:
- Handle the case where the CMA reservation spans nodes.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  Merge tag 'kvm-ppc-next-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc  (Radim Krčmář, 1 file, -0/+9)
PPC KVM update for 4.21 from Paul Mackerras

The main new feature this time is support in HV nested KVM for passing a device that is emulated by a level 0 hypervisor and presented to level 1 as a PCI device through to a level 2 guest using VFIO. Apart from that there are improvements for migration of radix guests under HV KVM and some other fixes and cleanups.
2018-12-20  powerpc/eeh: Fix debugfs_simple_attr.cocci warnings  (YueHaibing, 1 file, -10/+10)
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE for debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file() imposes some significant overhead compared to DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
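The shape of the transformation looks roughly like this; the attribute and file names below are illustrative, not necessarily the ones touched in eeh.c:

    /* Before: */
    DEFINE_SIMPLE_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
                            eeh_enable_dbgfs_set, "0x%llx\n");
    debugfs_create_file("eeh_enable", 0600, powerpc_debugfs_root, NULL,
                        &eeh_enable_dbgfs_ops);

    /* After: the _unsafe variant relies on the attribute's built-in
     * protection against file-removal races instead of extra locking. */
    DEFINE_DEBUGFS_ATTRIBUTE(eeh_enable_dbgfs_ops, eeh_enable_dbgfs_get,
                             eeh_enable_dbgfs_set, "0x%llx\n");
    debugfs_create_file_unsafe("eeh_enable", 0600, powerpc_debugfs_root, NULL,
                               &eeh_enable_dbgfs_ops);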
2018-12-20  powerpc/fsl: Update Spectre v2 reporting  (Diana Craciun, 1 file, -1/+4)
Report branch predictor state flush as a mitigation for Spectre variant 2. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used  (Diana Craciun, 1 file, -0/+1)
If the user chooses not to use the mitigations, replace the code sequence with nops. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)  (Diana Craciun, 2 files, -0/+21)
In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect against the following situations:
- a userspace process attacking another userspace process
- a userspace process attacking the kernel

Basically, when the privilege level changes (i.e. the kernel is entered), the branch predictor state is flushed.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)  (Diana Craciun, 2 files, -1/+30)
In order to protect against speculation attacks on indirect branches, the branch predictor is flushed at kernel entry to protect against the following situations:
- a userspace process attacking another userspace process
- a userspace process attacking the kernel

Basically, when the privilege level changes (i.e. the kernel is entered), the branch predictor state is flushed.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/fsl: Add nospectre_v2 command line argument  (Diana Craciun, 1 file, -0/+21)
When the command line argument is present, the Spectre variant 2 mitigations are disabled. Signed-off-by: Diana Craciun <diana.craciun@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/fsl: Fix spectre_v2 mitigations reporting  (Diana Craciun, 1 file, -1/+1)
Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect:

    $ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
    Mitigation: Software count cache flush

which is wrong. Fix it to report vulnerable for now.

Fixes: ee13cb249fab ("powerpc/64s: Add support for software count cache flush")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/fsl: Add infrastructure to fixup branch predictor flush  (Diana Craciun, 1 file, -0/+8)
In order to protect against speculation attacks (Spectre variant 2) on NXP PowerPC platforms, the branch predictor should be flushed when the privilege level is changed. This patch adds the infrastructure to fix up, at runtime, the code sections that perform the branch predictor flush, depending on a boot argument which is added later in a separate patch.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/prom: move the device tree if not in declared memory.  (Christophe Leroy, 1 file, -2/+2)
If the device tree doesn't reside in the memory which is declared inside it, it has to be moved as well, since that memory will not be mapped by the kernel. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc: eeh_event: convert semaphore to completion  (Arnd Bergmann, 1 file, -6/+3)
For this use case, completions and semaphores are equivalent, but semaphores are an awkward interface that should generally be avoided, so use the completion instead. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
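A minimal sketch of the conversion pattern; the variable name is illustrative, not necessarily the one in eeh_event.c:

    #include <linux/completion.h>

    static DECLARE_COMPLETION(eeh_eventlist_event);

    /* Producer side: was up(&sem) */
    complete(&eeh_eventlist_event);

    /* Consumer side: was down_interruptible(&sem) */
    wait_for_completion_interruptible(&eeh_eventlist_event);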
2018-12-20  powerpc/ptrace: Combine SYSCALL_EMU & SYSCALL_TRACE handling  (Dmitry V. Levin, 1 file, -23/+31)
Combine the SYSCALL_EMU and SYSCALL_TRACE handling so that we only call tracehook_report_syscall_entry() in one place. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> [mpe: Flesh out change log, s/cached_flags/flags/, reflow comments] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
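Roughly, the combined path looks like this; a hedged sketch of do_syscall_trace_enter(), not the verbatim patch:

    unsigned long flags = READ_ONCE(current_thread_info()->flags) &
                          (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE);

    if (flags) {
            int rc = tracehook_report_syscall_entry(regs);

            if (flags & _TIF_SYSCALL_EMU) {
                    /* PTRACE_SYSEMU: never execute the syscall */
                    return -1;
            }
            if (rc) {
                    /* The tracer asked for the syscall to be skipped */
                    return -1;
            }
    }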
2018-12-20  powerpc: use mm zones more sensibly  (Christoph Hellwig, 2 files, -10/+4)
Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on common 64-bit configs, and ZONE_DMA32 is used for 31-bit schemes.

Move to a scheme closer to what other architectures use (and I dare say the intent of the system):

- ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
- ZONE_NORMAL: everything addressable by the kernel
- ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels

Also provide information on how ZONE_DMA is used by defining ARCH_ZONE_DMA_BITS.

Contains various fixes from Benjamin Herrenschmidt.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
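The advertised DMA zone limit boils down to a single define; a sketch of the idea, matching the "< 31-bit" limit stated above, without showing its exact location in the powerpc headers:

    /* Tell the zone setup code that powerpc's ZONE_DMA covers the
     * low 31 bits of the address space (64-bit embedded only). */
    #define ARCH_ZONE_DMA_BITS 31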
2018-12-20  powerpc/dma: split the two __dma_alloc_coherent implementations  (Christoph Hellwig, 1 file, -12/+2)
The implementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share any code with the one for systems with coherent caches. Split it off and merge it with the helpers in dma-noncoherent.c that have no other callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/dma: remove the unused dma_iommu_ops export  (Christoph Hellwig, 1 file, -2/+0)
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/dma: remove the unused ISA_DMA_THRESHOLD export  (Christoph Hellwig, 1 file, -1/+0)
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20  powerpc/dma: properly wire up the unmap_page and unmap_sg methods  (Christoph Hellwig, 1 file, -1/+8)
The unmap methods need to transfer memory ownership back from the device to the CPU, by the same means as dma_sync_*_to_cpu. I'm not sure powerpc needs to do any work in this transfer direction, but given that it already invalidates the caches in dma_sync_*_to_cpu, we should make sure we also do so on unmapping. Signed-off-by: Christoph Hellwig <hch@lst.de> [mpe: s/dir/direction in dma_nommu_unmap_page()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
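A sketch of the wiring, assuming the non-coherent powerpc DMA path and its __dma_sync() helper; the real patch also handles the scatter/gather case:

    static void dma_nommu_unmap_page(struct device *dev, dma_addr_t dma_address,
                                     size_t size,
                                     enum dma_data_direction direction,
                                     unsigned long attrs)
    {
            /* Give ownership back to the CPU, mirroring sync_*_for_cpu */
            __dma_sync(bus_to_virt(dma_address), size, direction);
    }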
2018-12-20  powerpc/prom: fix early DEBUG messages  (Christophe Leroy, 1 file, -3/+3)
This patch fixes the early DEBUG messages in prom.c:
- Use %px instead of %p so the addresses are actually printed
- Cast memblock_phys_mem_size() to (unsigned long long) to avoid a build failure when phys_addr_t is not 64 bits

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
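For example, the size print ends up along these lines; a sketch assuming prom.c's local DBG() debug macro:

    DBG("Phys. mem: %llx\n",
        (unsigned long long)memblock_phys_mem_size());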
2018-12-20  Merge branches 'iommu/fixes', 'arm/renesas', 'arm/mediatek', 'arm/tegra', 'arm/omap', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next  (Joerg Roedel, 2 files, -4/+4)
2018-12-19  powerpc/mm: add exec protection on powerpc 603  (Christophe Leroy, 1 file, -1/+1)
The 603 doesn't have a HASH table; TLB misses are handled by software. It is therefore possible to generate a page fault when _PAGE_EXEC is not set, like in nohash/32.

There is one "reserved" PTE bit available; this patch uses it for _PAGE_EXEC. In order to support it, set_pte_filter() and set_access_flags_filter() are made common, and the handling is made dependent on MMU_FTR_HPTE_TABLE.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19  powerpc: remove remaining bits from CONFIG_APUS  (Christophe Leroy, 1 file, -6/+0)
commit f21f49ea639a ("[POWERPC] Remove the dregs of APUS support from arch/powerpc") removed CONFIG_APUS, but forgot to remove the logic which adapts tophys() and tovirt() for it. This patch removes the last stale pieces. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19  powerpc/8xx: add exception frame marker  (Christophe Leroy, 1 file, -0/+3)
This patch adds STACK_FRAME_REGS_MARKER to the stack at exception entry in order to see interrupts in call traces, as below:

    [    0.013964] Call Trace:
    [    0.014014] [c0745db0] [c007a9d4] tick_periodic.constprop.5+0xd8/0x104 (unreliable)
    [    0.014086] [c0745dc0] [c007aa20] tick_handle_periodic+0x20/0x9c
    [    0.014181] [c0745de0] [c0009cd0] timer_interrupt+0xa0/0x264
    [    0.014258] [c0745e10] [c000e484] ret_from_except+0x0/0x14
    [    0.014390] --- interrupt: 901 at console_unlock.part.7+0x3f4/0x528
    [    0.014390]     LR = console_unlock.part.7+0x3f0/0x528
    [    0.014455] [c0745ee0] [c0050334] console_unlock.part.7+0x114/0x528 (unreliable)
    [    0.014542] [c0745f30] [c00524e0] register_console+0x3d8/0x44c
    [    0.014625] [c0745f60] [c0675aac] cpm_uart_console_init+0x18/0x2c
    [    0.014709] [c0745f70] [c06614f4] console_init+0x114/0x1cc
    [    0.014795] [c0745fb0] [c0658b68] start_kernel+0x300/0x3d8
    [    0.014864] [c0745ff0] [c00022cc] start_here+0x44/0x98

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19  powerpc/44x: use patch_sites for TLB handlers patching  (Christophe Leroy, 1 file, -6/+5)
Use patch sites and associated helpers to manage TLB handlers patching instead of hardcoding. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19  powerpc/signal: Use code patching instead of hardcoding  (Christophe Leroy, 2 files, -9/+10)
Instead of hardcoding code modifications, use code patching functions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19  powerpc/book3s/32: Use MMU_FTR_HPTE_TABLE in head_32.S  (Christophe Leroy, 1 file, -0/+4)
Instead of manually patching a blr at the hash_page() entry in MMU_init_hw(), this patch adds a feature section in head_32.S. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19  powerpc/32: use patch_site_addr() in machine_init()  (Christophe Leroy, 1 file, -2/+1)
Use patch_site_addr() instead of hardcoding the address calculation in machine_init() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
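Roughly, the pattern becomes the following; a hedged sketch in which the patch__memset_nocache site name is an assumption, not necessarily the site touched by this commit:

    /* Resolve the patch site's address, then patch a nop over it */
    unsigned int *addr =
            (unsigned int *)patch_site_addr(&patch__memset_nocache);

    patch_instruction(addr, PPC_INST_NOP);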
2018-12-17  Merge branch 'fixes' into next  (Michael Ellerman, 3 files, -3/+17)
Merge our fixes branch again, this has a couple of build fixes and also a change to do_syscall_trace_enter() that will conflict with a patch we want to apply in next.
2018-12-17  powerpc/iommu: Use device_iommu_mapped()  (Joerg Roedel, 2 files, -4/+4)
Use the new function to replace the open-coded iommu check. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell Currey <ruscur@russell.cc> Cc: Sam Bobroff <sbobroff@linux.ibm.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
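The helper simply gives the open-coded group check an intent-revealing name; a one-line illustration, hedged since the surrounding code is not reproduced here:

    /* Before: open-coded */
    if (!dev->iommu_group)
            return;

    /* After: intent-revealing helper */
    if (!device_iommu_mapped(dev))
            return;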
2018-12-17  KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2  (Suraj Jitindar Singh, 1 file, -0/+9)
The POWER9 radix MMU has the concept of quadrants. The quadrant number is the two high bits of the effective address and determines the fully qualified address to be used for the translation. The fully qualified address consists of the effective lpid, the effective pid and the effective address. This gives 4 possible quadrants: 0, 1, 2, and 3.

When accessing these quadrants the fully qualified address is obtained as follows:

    Quadrant | Hypervisor       | Guest
    ---------+------------------+------------------
             | EA[0:1] = 0b00   | EA[0:1] = 0b00
       0     | effLPID = 0      | effLPID = LPIDR
             | effPID  = PIDR   | effPID  = PIDR
    ---------+------------------+------------------
             | EA[0:1] = 0b01   |
       1     | effLPID = LPIDR  | Invalid Access
             | effPID  = PIDR   |
    ---------+------------------+------------------
             | EA[0:1] = 0b10   |
       2     | effLPID = LPIDR  | Invalid Access
             | effPID  = 0      |
    ---------+------------------+------------------
             | EA[0:1] = 0b11   | EA[0:1] = 0b11
       3     | effLPID = 0      | effLPID = LPIDR
             | effPID  = 0      | effPID  = 0

In the guest, quadrant 3 is normally used to address the operating system, since this uses effPID = 0 and effLPID = LPIDR, meaning the PID register doesn't need to be switched. Quadrant 0 is normally used to address user space, since the effLPID and effPID are taken from the corresponding registers.

In the host, quadrants 0 and 3 are used as above; however, the effLPID is always 0 to address the host.

Quadrants 1 and 2 can be used by the host to address guest memory using a guest effective address. Since the effLPID comes from the LPID register, the host loads the LPID of the guest it would like to access (and the PID of the process) and can perform accesses to a guest effective address. This means quadrant 1 can be used to address the guest user space and quadrant 2 can be used to address the guest operating system from the hypervisor, using a guest effective address.

Access to the quadrants can cause a Hypervisor Data Storage Interrupt (HDSI) due to being unable to perform partition scoped translation. Previously this could only be generated from a guest, and so the code path expects us to take the KVM trampoline in the interrupt handler. This is no longer the case, so we modify the handler to call bad_page_fault() to check whether we were expecting this fault, so we can handle it gracefully and just return with an error code. In the hash MMU case we still raise an unknown exception, since quadrants aren't defined for the hash MMU.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-14  Merge tag 'powerpc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds, 3 files, -3/+17)
Pull powerpc fixes from Michael Ellerman:

 "One notable fix for our change to split pt_regs between user/kernel: we forgot to update BPF to use the user-visible type, which was an ABI break for BPF programs.

  A slightly ugly but minimal fix to do_syscall_trace_enter() so that we use tracehook_report_syscall_entry() properly. We'll rework the code in next to avoid the empty if body.

  Seven commits fixing bugs in the new papr_scm (Storage Class Memory) driver. The driver was finally able to be tested on the other hypervisor, which exposed several bugs. The fixes are all fairly minimal at least.

  Fix a crash in our MSI code if an MSI-capable device is plugged into a non-MSI-capable PHB, only seen on older hardware (MPC8378).

  Fix our legacy serial code to look for "stdout-path" since the device trees were updated to use that instead of "linux,stdout-path".

  A change to the COFF zImage code to fix booting old powermacs.

  A couple of minor build fixes.

  Thanks to: Benjamin Herrenschmidt, Daniel Axtens, Dmitry V. Levin, Elvira Khabirova, Oliver O'Halloran, Paul Mackerras, Radu Rendec, Rob Herring, Sandipan Das"

* tag 'powerpc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call
  powerpc/mm: Fallback to RAM if the altmap is unusable
  powerpc/papr_scm: Use ibm,unit-guid as the iset cookie
  powerpc/papr_scm: Fix DIMM device registration race
  powerpc/papr_scm: Remove endian conversions
  powerpc/papr_scm: Update DT properties
  powerpc/papr_scm: Fix resource end address
  powerpc/papr_scm: Use depend instead of select
  powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT
  powerpc/boot: Fix build failures with -j 1
  powerpc: Look for "stdout-path" when setting up legacy consoles
  powerpc/msi: Fix NULL pointer access in teardown code
  powerpc/mm: Fix linux page tables build with some configs
  powerpc: Fix COFF zImage booting on old powermacs
2018-12-13  dma-direct: merge swiotlb_dma_ops into the dma_direct code  (Christoph Hellwig, 1 file, -8/+8)
While the dma-direct code is (relatively) clean and simple, we actually have to use the swiotlb ops for the mapping on many architectures due to devices with addressing limits. Instead of keeping two implementations around, this commit allows the dma-direct implementation to call the swiotlb bounce buffering functions and thus share the guts of the mapping implementation. This also simplifies the dma-mapping setup on a few architectures, where we no longer have to differentiate which implementation to use. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-10  powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call  (Elvira Khabirova, 1 file, -1/+6)
Arch code should use tracehook_*() helpers, as documented in include/linux/tracehook.h; ptrace_report_syscall() is not expected to be used outside that file. The patch does not look very nice, but at least it is correct and opens the way for the PTRACE_GET_SYSCALL_INFO API. Co-authored-by: Dmitry V. Levin <ldv@altlinux.org> Fixes: 5521eb4bca2d ("powerpc/ptrace: Add support for PTRACE_SYSEMU") Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> [mpe: Take this as a minimal fix for 4.20, we'll rework it later] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-06  powerpc/iommu: remove the mapping_error dma_map_ops method  (Christoph Hellwig, 2 files, -20/+14)
The powerpc iommu code already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
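Conceptually, the change means failures can be detected against a shared sentinel; a hedged sketch of generic dma-mapping usage, not powerpc-specific code:

    #include <linux/dma-mapping.h>

    /* DMA_MAPPING_ERROR is (~(dma_addr_t)0), the same value the powerpc
     * iommu code already returned on failure. Core code can compare
     * against the sentinel directly; drivers keep using
     * dma_mapping_error(), which does the same comparison. */
    dma_addr_t handle = dma_map_page(dev, page, 0, size, DMA_TO_DEVICE);
    if (handle == DMA_MAPPING_ERROR)
            return -EIO;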
2018-12-06  dma-direct: remove the mapping_error dma_map_ops method  (Christoph Hellwig, 1 file, -1/+0)
The dma-direct code already returns (~(dma_addr_t)0x0) on mapping failures, so we can switch over to returning DMA_MAPPING_ERROR and let the core dma-mapping code handle the rest. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-06  powerpc, kexec_file: factor out memblock-based arch_kexec_walk_mem()  (AKASHI Takahiro, 1 file, -54/+0)
The memblock list is another source for the usable system memory layout. Move powerpc's arch_kexec_walk_mem() to common code so that other memblock-based architectures, particularly arm64, can also utilise it. The moved function is renamed to kexec_walk_memblock() and integrated into kexec_locate_mem_hole(), which is now usable for all architectures with no need to override arch_kexec_walk_mem(). With this change, arch_kexec_walk_mem() no longer needs to be a weak function, and is renamed to kexec_walk_resources().

Since powerpc doesn't support kdump in its kexec_file_load(), the current kexec_walk_memblock() won't work for kdump either in this form; this will be fixed in the next patch.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-04  powerpc/8xx: regroup TLB handler routines  (Christophe Leroy, 1 file, -58/+54)
As this is running with the MMU off, the CPU only does speculative fetches for code in the same page. Following the significant size reduction of the TLB handler routines, the side handlers can be brought back close to the main part, i.e. into the same page. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04  powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers  (Christophe Leroy, 1 file, -61/+49)
This patch reworks the TLB Miss handler in order to not use r12 register, hence avoiding having to save it into SPRN_SPRG_SCRATCH2. In the DAR Fixup code we can now use SPRN_M_TW, freeing SPRN_SPRG_SCRATCH2. Then SPRN_SPRG_SCRATCH2 may be used for something else in the future. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04  powerpc/8xx: Use hardware assistance in TLB handlers  (Christophe Leroy, 1 file, -34/+24)
Today, on the 8xx, the TLB handlers do a SW tablewalk by doing all the calculation in ASM, in order to match the Linux page table structure.

The 8xx offers hardware assistance which allows a significant size reduction of the TLB handlers, hence also reducing the time spent in the handlers. However, using this HW assistance implies some constraints on the page table structure:

- Regardless of the main page size used (4k or 16k), the level 1 table (PGD) contains 1024 entries, and each PGD entry covers a 4Mbyte area which is managed by a level 2 table (PTE) also containing 1024 entries, each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table.
- 512k page PTEs have to be spread every 128 bytes in the L2 table.
- 8M page PTEs are at the address pointed to by the L1 entry, and each 8M page requires 2 identical entries in the PGD.

This patch modifies the TLB handlers to use HW assistance for 4K pages.

Before this patch, the mean time spent in the TLB miss handlers was:
- ITLB miss: 80 ticks
- DTLB miss: 62 ticks

After this patch, the mean time spent in the TLB miss handlers is:
- ITLB miss: 72 ticks
- DTLB miss: 54 ticks

So the improvement is 10% for ITLB and 13% for DTLB misses.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04  powerpc/8xx: Temporarily disable 16k pages and hugepages  (Christophe Leroy, 1 file, -69/+5)
In preparation for making use of hardware assistance in the TLB handlers, this patch temporarily disables 16K pages and hugepages. The reason is that when using HW assistance in 4K pages mode, the Linux model fits the HW model for 4K pages and 8M pages. However, for 16K pages and 512K mode, some additional work is needed to make the Linux model fit the HW model.

The 8M pages will naturally come back when we switch to HW assistance, without any additional handling. In order to keep the following patch smaller, the current special handling for 8M pages is removed here as well.

Therefore the 4K pages mode will be implemented first, without support for 512k hugepages. Then the 512k hugepages will be brought back. And the 16K pages will be implemented in the following step.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>