summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2018-08-03mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabledRam Pai1-0/+2
[ Upstream commit 5212213aa5a2359dd0474c9dab22b6220b591fe1 ] VM_PKEY_BITx are defined only if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is enabled. Powerpc also needs these bits. Hence lets define the VM_PKEY_BITx bits for any architecture that enables CONFIG_ARCH_HAS_PKEYS. Reviewed-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by StarletJonathan Neuschäfer1-0/+5
[ Upstream commit 9dcb3df4281876731e4e8bff7940514d72375154 ] The interrupt controller inside the Wii's Hollywood chip is connected to two masters, the "Broadway" PowerPC and the "Starlet" ARM926, each with their own interrupt status and mask registers. When booting the Wii with mini[1], interrupts from the SD card controller (IRQ 7) are handled by the ARM, because mini provides SD access over IPC. Linux however can't currently use or disable this IPC service, so both sides try to handle IRQ 7 without coordination. Let's instead make sure that all interrupts that are unmasked on the PPC side are masked on the ARM side; this will also make sure that Linux can properly talk to the SD card controller (and potentially other devices). If access to a device through IPC is desired in the future, interrupts from that device should not be handled by Linux directly. [1]: https://github.com/lewurm/mini Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03bpf: powerpc64: pad function address loads with NOPsSandipan Das1-11/+23
[ Upstream commit 4ea69b2fd623dee2bbc77d3b6b7d8c0924e2026a ] For multi-function programs, loading the address of a callee function to a register requires emitting instructions whose count varies from one to five depending on the nature of the address. Since we come to know of the callee's address only before the extra pass, the number of instructions required to load this address may vary from what was previously generated. This can make the JITed image grow or shrink. To avoid this, we should generate a constant five-instruction when loading function addresses by padding the optimized load sequence with NOPs. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/8xx: fix invalid register expression in head_8xx.SChristophe Leroy1-1/+1
[ Upstream commit e4ccb1dae6bdef228d729c076c38161ef6e7ca34 ] New binutils generate the following warning AS arch/powerpc/kernel/head_8xx.o arch/powerpc/kernel/head_8xx.S: Assembler messages: arch/powerpc/kernel/head_8xx.S:916: Warning: invalid register expression This patch fixes it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc: Add __printf verification to prom_printfMathieu Malaterre1-56/+58
[ Upstream commit eae5f709a4d738c52b6ab636981755d76349ea9e ] __printf is useful to verify format and arguments. Fix arg mismatch reported by gcc, remove the following warnings (with W=1): arch/powerpc/kernel/prom_init.c:1467:31: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:1471:31: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:1504:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:1505:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:1506:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:1507:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:1508:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:1509:33: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:1975:39: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘unsigned int’ arch/powerpc/kernel/prom_init.c:1986:27: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:2567:38: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:2567:46: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:2569:38: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ arch/powerpc/kernel/prom_init.c:2569:46: error: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘long unsigned int’ The patch also include arg mismatch fix for case with #define DEBUG_PROM (warning not listed here). This patch fix also the following warnings revealed by checkpatch: WARNING: Prefer using '"%s...", __func__' to using 'alloc_up', this function's name, in a string #101: FILE: arch/powerpc/kernel/prom_init.c:1235: + prom_debug("alloc_up(%lx, %lx)\n", size, align); and WARNING: Prefer using '"%s...", __func__' to using 'alloc_down', this function's name, in a string #138: FILE: arch/powerpc/kernel/prom_init.c:1278: + prom_debug("alloc_down(%lx, %lx, %s)\n", size, align, Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/powermac: Mark variable x as unusedMathieu Malaterre1-1/+3
[ Upstream commit 5a4b475cf8511da721f20ba432c244061db7139f ] Since the value of x is never intended to be read, declare it with gcc attribute as unused. Fix warning treated as error with W=1: arch/powerpc/platforms/powermac/bootx_init.c:471:21: error: variable ‘x’ set but not used [-Werror=unused-but-set-variable] Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/powermac: Add missing prototype for note_bootable_part()Mathieu Malaterre1-0/+1
[ Upstream commit f72cf3f1d49f2c35d6cb682af2e8c93550f264e4 ] Add a missing prototype for function `note_bootable_part` to silence a warning treated as error with W=1: arch/powerpc/platforms/powermac/setup.c:361:12: error: no previous prototype for ‘note_bootable_part’ [-Werror=missing-prototypes] Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/chrp/time: Make some functions static, add missing header includeMathieu Malaterre1-2/+4
[ Upstream commit b87a358b4a1421abd544c0b554b1b7159b2b36c0 ] Add a missing include <platforms/chrp/chrp.h>. These functions can all be static, make it so. Fix warnings treated as errors with W=1: arch/powerpc/platforms/chrp/time.c:41:13: error: no previous prototype for ‘chrp_time_init’ [-Werror=missing-prototypes] arch/powerpc/platforms/chrp/time.c:66:5: error: no previous prototype for ‘chrp_cmos_clock_read’ [-Werror=missing-prototypes] arch/powerpc/platforms/chrp/time.c:74:6: error: no previous prototype for ‘chrp_cmos_clock_write’ [-Werror=missing-prototypes] arch/powerpc/platforms/chrp/time.c:86:5: error: no previous prototype for ‘chrp_set_rtc_time’ [-Werror=missing-prototypes] arch/powerpc/platforms/chrp/time.c:130:6: error: no previous prototype for ‘chrp_get_rtc_time’ [-Werror=missing-prototypes] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/32: Add a missing include headerMathieu Malaterre1-0/+1
[ Upstream commit c89ca593220931c150cffda24b4d4ccf82f13fc8 ] The header file <linux/syscalls.h> was missing from the includes. Fix the following warning, treated as error with W=1: arch/powerpc/kernel/pci_32.c:286:6: error: no previous prototype for ‘sys_pciconfig_iobase’ [-Werror=missing-prototypes] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/64s: Fix compiler store ordering to SLB shadow areaNicholas Piggin1-4/+4
[ Upstream commit 926bc2f100c24d4842b3064b5af44ae964c1d81c ] The stores to update the SLB shadow area must be made as they appear in the C code, so that the hypervisor does not see an entry with mismatched vsid and esid. Use WRITE_ONCE for this. GCC has been observed to elide the first store to esid in the update, which means that if the hypervisor interrupts the guest after storing to vsid, it could see an entry with old esid and new vsid, which may possibly result in memory corruption. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/eeh: Fix use-after-release of EEH driverSam Bobroff1-12/+16
[ Upstream commit 46d4be41b987a6b2d25a2ebdd94cafb44e21d6c5 ] Correct two cases where eeh_pcid_get() is used to reference the driver's module but the reference is dropped before the driver pointer is used. In eeh_rmv_device() also refactor a little so that only two calls to eeh_pcid_put() are needed, rather than three and the reference isn't taken at all if it wasn't needed. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/64s: Add barrier_nospecMichal Suchanek1-0/+15
[ Upstream commit a6b3964ad71a61bb7c61d80a60bea7d42187b2eb ] A no-op form of ori (or immediate of 0 into r31 and the result stored in r31) has been re-tasked as a speculation barrier. The instruction only acts as a barrier on newer machines with appropriate firmware support. On older CPUs it remains a harmless no-op. Implement barrier_nospec using this instruction. mpe: The semantics of the instruction are believed to be that it prevents execution of subsequent instructions until preceding branches have been fully resolved and are no longer executing speculatively. There is no further documentation available at this time. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03powerpc/lib: Adjust .balign inside string functions for PPC32Christophe Leroy2-3/+7
[ Upstream commit 1128bb7813a896bd608fb622eee3c26aaf33b473 ] commit 87a156fb18fe1 ("Align hot loops of some string functions") degraded the performance of string functions by adding useless nops A simple benchmark on an 8xx calling 100000x a memchr() that matches the first byte runs in 41668 TB ticks before this patch and in 35986 TB ticks after this patch. So this gives an improvement of approx 10% Another benchmark doing the same with a memchr() matching the 128th byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks after this patch, so regardless on the number of loops, removing those useless nops improves the test by 5683 TB ticks. Fixes: 87a156fb18fe1 ("Align hot loops of some string functions") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-28KVM: PPC: Check if IOMMU page is contained in the pinned physical pageAlexey Kardashevskiy4-7/+42
commit 76fa4975f3ed12d15762bc979ca44078598ed8ee upstream. A VM which has: - a DMA capable device passed through to it (eg. network card); - running a malicious kernel that ignores H_PUT_TCE failure; - capability of using IOMMU pages bigger that physical pages can create an IOMMU mapping that exposes (for example) 16MB of the host physical memory to the device when only 64K was allocated to the VM. The remaining 16MB - 64K will be some other content of host memory, possibly including pages of the VM, but also pages of host kernel memory, host programs or other VMs. The attacking VM does not control the location of the page it can map, and is only allowed to map as many pages as it has pages of RAM. We already have a check in drivers/vfio/vfio_iommu_spapr_tce.c that an IOMMU page is contained in the physical page so the PCI hardware won't get access to unassigned host memory; however this check is missing in the KVM fastpath (H_PUT_TCE accelerated code). We were lucky so far and did not hit this yet as the very first time when the mapping happens we do not have tbl::it_userspace allocated yet and fall back to the userspace which in turn calls VFIO IOMMU driver, this fails and the guest does not retry, This stores the smallest preregistered page size in the preregistered region descriptor and changes the mm_iommu_xxx API to check this against the IOMMU page size. This calculates maximum page size as a minimum of the natural region alignment and compound page size. For the page shift this uses the shift returned by find_linux_pte() which indicates how the page is mapped to the current userspace - if the page is huge and this is not a zero, then it is a leaf pte and the page is mapped within the range. Fixes: 121f80ba68f1 ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)Gautham R. Shenoy1-0/+2
commit b03897cf318dfc47de33a7ecbc7655584266f034 upstream. On 64-bit servers, SPRN_SPRG3 and its userspace read-only mirror SPRN_USPRG3 are used as userspace VDSO write and read registers respectively. SPRN_SPRG3 is lost when we enter stop4 and above, and is currently not restored. As a result, any read from SPRN_USPRG3 returns zero on an exit from stop4 (Power9 only) and above. Thus in this situation, on POWER9, any call from sched_getcpu() always returns zero, as on powerpc, we call __kernel_getcpu() which relies upon SPRN_USPRG3 to report the CPU and NUMA node information. Fix this by restoring SPRN_SPRG3 on wake up from a deep stop state with the sprg_vdso value that is cached in PACA. Fixes: e1c1cfed5432 ("powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/64s: Fix DT CPU features Power9 DD2.1 logicMichael Ellerman1-1/+2
commit 749a0278c2177b2d16da5d8b135ba7f940bb4199 upstream. In the device tree CPU features quirk code we want to set CPU_FTR_POWER9_DD2_1 on all Power9s that aren't DD2.0 or earlier. But we got the logic wrong and instead set it on all CPUs that aren't Power9 DD2.0 or earlier, ie. including Power8. Fix it by making sure we're on a Power9. This isn't a bug in practice because the only code that checks the feature is Power9 only to begin with. But we'll backport it anyway to avoid confusion. Fixes: 9e9626ed3a4a ("powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU features") Cc: stable@vger.kernel.org # v4.17+ Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/e500mc: Set assembler machine type to e500mcMichael Jeanson1-0/+1
commit 69a8405999aa1c489de4b8d349468f0c2b83f093 upstream. In binutils 2.26 a new opcode for the "wait" instruction was added for the POWER9 and has precedence over the one specific to the e500mc. Commit ebf714ff3756 ("powerpc/e500mc: Add support for the wait instruction in e500_idle") uses this instruction specifically on the e500mc to work around an erratum. This results in an invalid instruction in idle_e500 when we build for the e500mc on bintutils >= 2.26 with the default assembler machine type. Since multiplatform between e500 and non-e500 is not supported, set the assembler machine type globaly when CONFIG_PPC_E500MC=y. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Michael Ellerman <mpe@ellerman.id.au> CC: Kumar Gala <galak@kernel.crashing.org> CC: Vakul Garg <vakul.garg@nxp.com> CC: Scott Wood <swood@redhat.com> CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: linuxppc-dev@lists.ozlabs.org CC: linux-kernel@vger.kernel.org CC: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/64s/radix: Fix radix_kvm_prefetch_workaround paca access of not ↵Nicholas Piggin1-0/+2
possible CPU commit 758380b8155f69b4e2f77f27562f8a7a466749d6 upstream. If possible CPUs are limited (e.g., by kexec), then the kvm prefetch workaround function can access the paca pointer for a !possible CPU. Fixes: d2e60075a3d44 ("powerpc/64: Use array of paca pointers and allocate pacas individually") Cc: stable@kernel.org Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/fadump: Unregister fadump on kexec down path.Mahesh Salgaonkar1-0/+3
commit 722cde76d68e8cc4f3de42e71c82fd40dea4f7b9 upstream. Unregister fadump on kexec down path otherwise the fadump registration in new kexec-ed kernel complains that fadump is already registered. This makes new kernel to continue using fadump registered by previous kernel which may lead to invalid vmcore generation. Hence this patch fixes this issue by un-registering fadump in fadump_cleanup() which is called during kexec path so that new kernel can register fadump with new valid values. Fixes: b500afff11f6 ("fadump: Invalidate registration and release reserved memory for general use.") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/powernv/cpuidle: Init all present cpus for deep statesAkshay Adiga1-2/+2
commit ac9816dcbab53c57bcf1d7b15370b08f1e284318 upstream. Init all present cpus for deep states instead of "all possible" cpus. Init fails if a possible cpu is guarded. Resulting in making only non-deep states available for cpuidle/hotplug. Stewart says, this means that for single threaded workloads, if you guard out a CPU core you'll not get WoF (Workload Optimised Frequency), which means that performance goes down when you wouldn't expect it to. Fixes: 77b54e9f213f ("powernv/powerpc: Add winkle support for offline cpus") Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/powernv: copy/paste - Mask SO bit in CRHaren Myneni1-1/+2
commit 75743649064ec0cf5ddd69f240ef23af66dde16e upstream. NX can set the 3rd bit in CR register for XER[SO] (Summary overflow) which is not related to paste request. The current paste function returns failure for a successful request when this bit is set. So mask this bit and check the proper return status. Fixes: 2392c8c8c045 ("powerpc/powernv/vas: Define copy/paste interfaces") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/powernv/ioda2: Remove redundant free of TCE pagesAlexey Kardashevskiy1-1/+0
commit 98fd72fe82527fd26618062b60cfd329451f2329 upstream. When IODA2 creates a PE, it creates an IOMMU table with it_ops::free set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages(). Since iommu_tce_table_put() calls it_ops::free when the last reference to the table is released, explicit call to pnv_pci_ioda2_table_free_pages() is not needed so let's remove it. This should fix double free in the case of PCI hotuplug as pnv_pci_ioda2_table_free_pages() does not reset neither iommu_table::it_base nor ::it_size. This was not exposed by SRIOV as it uses different code path via pnv_pcibios_sriov_disable(). IODA1 does not inialize it_ops::free so it does not have this issue. Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/ptrace: Fix enforcement of DAWR constraintsMichael Neuling1-2/+2
commit cd6ef7eebf171bfcba7dc2df719c2a4958775040 upstream. Back when we first introduced the DAWR, in commit 4ae7ebe9522a ("powerpc: Change hardware breakpoint to allow longer ranges"), we screwed up the constraint making it a 1024 byte boundary rather than a 512. This makes the check overly permissive. Fortunately GDB is the only real user and it always did they right thing, so we never noticed. This fixes the constraint to 512 bytes. Fixes: 4ae7ebe9522a ("powerpc: Change hardware breakpoint to allow longer ranges") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/perf: Fix memory allocation for core-imc based on num_possible_cpus()Anju T Sudhakar1-2/+2
commit d2032678e57fc508d7878307badde8f89b632ba3 upstream. Currently memory is allocated for core-imc based on cpu_present_mask, which has bit 'cpu' set iff cpu is populated. We use (cpu number / threads per core) as the array index to access the memory. Under some circumstances firmware marks a CPU as GUARDed CPU and boot the system, until cleared of errors, these CPU's are unavailable for all subsequent boots. GUARDed CPUs are possible but not present from linux view, so it blows a hole when we assume the max length of our allocation is driven by our max present cpus, where as one of the cpus might be online and be beyond the max present cpus, due to the hole. So (cpu number / threads per core) value bounds the array index and leads to memory overflow. Call trace observed during a guard test: Faulting instruction address: 0xc000000000149f1c cpu 0x69: Vector: 380 (Data Access Out of Range) at [c000003fea303420] pc:c000000000149f1c: prefetch_freepointer+0x14/0x30 lr:c00000000014e0f8: __kmalloc+0x1a8/0x1ac sp:c000003fea3036a0 msr:9000000000009033 dar:c9c54b2c91dbf6b7 current = 0xc000003fea2c0000 paca = 0xc00000000fddd880 softe: 3 irq_happened: 0x01 pid = 1, comm = swapper/104 Linux version 4.16.7-openpower1 (smc@smc-desktop) (gcc version 6.4.0 (Buildroot 2018.02.1-00006-ga8d1126)) #2 SMP Fri May 4 16:44:54 PDT 2018 enter ? for help call trace: __kmalloc+0x1a8/0x1ac (unreliable) init_imc_pmu+0x7f4/0xbf0 opal_imc_counters_probe+0x3fc/0x43c platform_drv_probe+0x48/0x80 driver_probe_device+0x22c/0x308 __driver_attach+0xa0/0xd8 bus_for_each_dev+0x88/0xb4 driver_attach+0x2c/0x40 bus_add_driver+0x1e8/0x228 driver_register+0xd0/0x114 __platform_driver_register+0x50/0x64 opal_imc_driver_init+0x24/0x38 do_one_initcall+0x150/0x15c kernel_init_freeable+0x250/0x254 kernel_init+0x1c/0x150 ret_from_kernel_thread+0x5c/0xc8 Allocating memory for core-imc based on cpu_possible_mask, which has bit 'cpu' set iff cpu is populatable, will fix this issue. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Fixes: 39a846db1d57 ("powerpc/perf: Add core IMC PMU support") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREGMichael Neuling1-0/+1
commit 4f7c06e26ec9cf7fe9f0c54dc90079b6a4f4b2c3 upstream. In commit e2a800beaca1 ("powerpc/hw_brk: Fix off by one error when validating DAWR region end") we fixed setting the DAWR end point to its max value via PPC_PTRACE_SETHWDEBUG. Unfortunately we broke PTRACE_SET_DEBUGREG when setting a 512 byte aligned breakpoint. PTRACE_SET_DEBUGREG currently sets the length of the breakpoint to zero (memset() in hw_breakpoint_init()). This worked with arch_validate_hwbkpt_settings() before the above patch was applied but is now broken if the breakpoint is 512byte aligned. This sets the length of the breakpoint to 8 bytes when using PTRACE_SET_DEBUGREG. Fixes: e2a800beaca1 ("powerpc/hw_brk: Fix off by one error when validating DAWR region end") Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/pkeys: Detach execute_only key on !PROT_EXECRam Pai1-2/+2
commit eabdb8ca8690eedd461e61ea7780595fbbae8132 upstream. Disassociate the exec_key from a VMA if the VMA permission is not PROT_EXEC anymore. Otherwise the exec_only key continues to be associated with the vma, causing unexpected behavior. The problem was reported on x86 by Shakeel Butt, which is also applicable on powerpc. Fixes: 5586cf61e108 ("powerpc: introduce execute-only pkey") Cc: stable@vger.kernel.org # v4.16+ Reported-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03powerpc/mm/hash: Add missing isync prior to kernel stack SLB switchAneesh Kumar K.V1-0/+1
commit 91d06971881f71d945910de128658038513d1b24 upstream. Currently we do not have an isync, or any other context synchronizing instruction prior to the slbie/slbmte in _switch() that updates the SLB entry for the kernel stack. However that is not correct as outlined in the ISA. From Power ISA Version 3.0B, Book III, Chapter 11, page 1133: "Changing the contents of ... the contents of SLB entries ... can have the side effect of altering the context in which data addresses and instruction addresses are interpreted, and in which instructions are executed and data accesses are performed. ... These side effects need not occur in program order, and therefore may require explicit synchronization by software. ... The synchronizing instruction before the context-altering instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alteration." And page 1136: "For data accesses, the context synchronizing instruction before the slbie, slbieg, slbia, slbmte, tlbie, or tlbiel instruction ensures that all preceding instructions that access data storage have completed to a point at which they have reported all exceptions they will cause." We're not aware of any bugs caused by this, but it should be fixed regardless. Add the missing isync when updating kernel stack SLB entry. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Flesh out change log with more ISA text & explanation] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-55/+159
Pull KVM fixes from Radim Krčmář: "PPC: - Close a hole which could possibly lead to the host timebase getting out of sync. - Three fixes relating to PTEs and TLB entries for radix guests. - Fix a bug which could lead to an interrupt never getting delivered to the guest, if it is pending for a guest vCPU when the vCPU gets offlined. s390: - Fix false negatives in VSIE validity check (Cc stable) x86: - Fix time drift of VMX preemption timer when a guest uses LAPIC timer in periodic mode (Cc stable) - Unconditionally expose CPUID.IA32_ARCH_CAPABILITIES to allow migration from hosts that don't need retpoline mitigation (Cc stable) - Fix guest crashes on reboot by properly coupling CR4.OSXSAVE and CPUID.OSXSAVE (Cc stable) - Report correct RIP after Hyper-V hypercall #UD (introduced in -rc6)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: fix #UD address of failed Hyper-V hypercalls kvm: x86: IA32_ARCH_CAPABILITIES is always supported KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed x86/kvm: fix LAPIC timer drift when guest uses periodic mode KVM: s390: vsie: fix < 8k check for the itdba KVM: PPC: Book 3S HV: Do ptesync in radix guest exit path KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority change KVM: PPC: Book3S HV: Make radix clear pte when unmapping KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in kvmppc_radix_tlbie_page KVM: PPC: Book3S HV: Snapshot timebase offset on guest entry
2018-05-25Merge tag 'powerpc-4.17-7' of ↵Linus Torvalds2-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Just one fix, to make sure the PCR (Processor Compatibility Register) is reset on boot. Otherwise if we're running in compat mode in a guest (eg. pretending a Power9 is a Power8) and the host kernel oopses and kdumps then the kdump kernel's userspace will be running in Power8 mode, and will SIGILL if it uses Power9-only instructions. Thanks to Michael Neuling" * tag 'powerpc-4.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Clear PCR on boot
2018-05-22powerpc/64s: Add support for a store forwarding barrier at kernel entry/exitNicholas Piggin9-2/+356
On some CPUs we can prevent a vulnerability related to store-to-load forwarding by preventing store forwarding between privilege domains, by inserting a barrier in kernel entry and exit paths. This is known to be the case on at least Power7, Power8 and Power9 powerpc CPUs. Barriers must be inserted generally before the first load after moving to a higher privilege, and after the last store before moving to a lower privilege, HV and PR privilege transitions must be protected. Barriers are added as patch sections, with all kernel/hypervisor entry points patched, and the exit points to lower privilge levels patched similarly to the RFI flush patching. Firmware advertisement is not implemented yet, so CPU flush types are hard coded. Thanks to Michal Suchánek for bug fixes and review. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-18powerpc/64s: Clear PCR on bootMichael Neuling2-0/+7
Clear the PCR (Processor Compatibility Register) on boot to ensure we are not running in a compatibility mode. We've seen this cause problems when a crash (and kdump) occurs while running compat mode guests. The kdump kernel then runs with the PCR set and causes problems. The symptom in the kdump kernel (also seen in petitboot after fast-reboot) is early userspace programs taking sigills on newer instructions (seen in libc). Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-17powerpc/powernv: Fix NVRAM sleep in invalid context when crashingNicholas Piggin1-2/+12
Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special case the delay to avoid sleeping in invalid context. Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops") Cc: stable@vger.kernel.org # v3.2 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-17KVM: PPC: Book 3S HV: Do ptesync in radix guest exit pathPaul Mackerras1-0/+8
A radix guest can execute tlbie instructions to invalidate TLB entries. After a tlbie or a group of tlbies, it must then do the architected sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation has been processed by all CPUs in the system before it can rely on no CPU using any translation that it just invalidated. In fact it is the ptesync which does the actual synchronization in this sequence, and hardware has a requirement that the ptesync must be executed on the same CPU thread as the tlbies which it is expected to order. Thus, if a vCPU gets moved from one physical CPU to another after it has done some tlbies but before it can get to do the ptesync, the ptesync will not have the desired effect when it is executed on the second physical CPU. To fix this, we do a ptesync in the exit path for radix guests. If there are any pending tlbies, this will wait for them to complete. If there aren't, then ptesync will just do the same as sync. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: XIVE: Resend re-routed interrupts on CPU priority changeBenjamin Herrenschmidt1-7/+101
When a vcpu priority (CPPR) is set to a lower value (masking more interrupts), we stop processing interrupts already in the queue for the priorities that have now been masked. If those interrupts were previously re-routed to a different CPU, they might still be stuck until the older one that has them in its queue processes them. In the case of guest CPU unplug, that can be never. To address that without creating additional overhead for the normal interrupt processing path, this changes H_CPPR handling so that when such a priority change occurs, we scan the interrupt queue for that vCPU, and for any interrupt in there that has been re-routed, we replace it with a dummy and force a re-trigger. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: Make radix clear pte when unmappingNicholas Piggin1-1/+1
The current partition table unmap code clears the _PAGE_PRESENT bit out of the pte, which leaves pud_huge/pmd_huge true and does not clear pud_present/pmd_present. This can confuse subsequent page faults and possibly lead to the guest looping doing continual hypervisor page faults. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: Make radix use correct tlbie sequence in ↵Nicholas Piggin1-2/+2
kvmppc_radix_tlbie_page The standard eieio ; tlbsync ; ptesync must follow tlbie to ensure it is ordered with respect to subsequent operations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-17KVM: PPC: Book3S HV: Snapshot timebase offset on guest entryPaul Mackerras4-45/+47
Currently, the HV KVM guest entry/exit code adds the timebase offset from the vcore struct to the timebase on guest entry, and subtracts it on guest exit. Which is fine, except that it is possible for userspace to change the offset using the SET_ONE_REG interface while the vcore is running, as there is only one timebase offset per vcore but potentially multiple VCPUs in the vcore. If that were to happen, KVM would subtract a different offset on guest exit from that which it had added on guest entry, leading to the timebase being out of sync between cores in the host, which then leads to bad things happening such as hangs and spurious watchdog timeouts. To fix this, we add a new field 'tb_offset_applied' to the vcore struct which stores the offset that is currently applied to the timebase. This value is set from the vcore tb_offset field on guest entry, and is what is subtracted from the timebase on guest exit. Since it is zero when the timebase offset is not applied, we can simplify the logic in kvmhv_start_timing and kvmhv_accumulate_time. In addition, we had secondary threads reading the timebase while running concurrently with code on the primary thread which would eventually add or subtract the timebase offset from the timebase. This occurred while saving or restoring the DEC register value on the secondary threads. Although no specific incorrect behaviour has been observed, this is a race which should be fixed. To fix it, we move the DEC saving code to just before we call kvmhv_commence_exit, and the DEC restoring code to after the point where we have waited for the primary thread to switch the MMU context and add the timebase offset. That way we are sure that the timebase contains the guest timebase value in both cases. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-05-08powerpc/pseries: Fix CONFIG_NUMA=n buildMichael Ellerman1-8/+5
The build is failing with CONFIG_NUMA=n and some compiler versions: arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu': hotplug-cpu.c:(.text+0x12c): undefined reference to `timed_topology_update' arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_cpu_remove': hotplug-cpu.c:(.text+0x400): undefined reference to `timed_topology_update' Fix it by moving the empty version of timed_topology_update() into the existing #ifdef block, which has the right guard of SPLPAR && NUMA. Fixes: cee5405da402 ("powerpc/hotplug: Improve responsiveness of hotplug change") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07powerpc/trace/syscalls: Update syscall name matching logic to account for ↵Naveen N. Rao1-2/+19
ppc_ prefix Some syscall entry functions on powerpc are prefixed with ppc_/ppc32_/ppc64_ rather than the usual sys_/__se_sys prefix. fork(), clone(), swapcontext() are some examples of syscalls with such entry points. We need to match against these names when initializing ftrace syscall tracing. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07powerpc/trace/syscalls: Update syscall name matching logicNaveen N. Rao1-7/+3
On powerpc64 ABIv1, we are enabling syscall tracing for only ~20 syscalls. This is due to commit e145242ea0df6 ("syscalls/core, syscalls/x86: Clean up syscall stub naming convention") which has changed the syscall entry wrapper prefix from "SyS" to "__se_sys". Update the logic for ABIv1 to not just skip the initial dot, but also the "__se_sys" prefix. Fixes: commit e145242ea0df6 ("syscalls/core, syscalls/x86: Clean up syscall stub naming convention") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-07powerpc/64: Remove unused paca->soft_enabledMichael Ellerman1-1/+0
In commit 4e26bc4a4ed6 ("powerpc/64: Rename soft_enabled to irq_soft_mask") we renamed paca->soft_enabled. But then in commit 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") we added it back. Oops. This happened because the two patches were in flight at the same time and rebased vs each other multiple times, and we missed it in review. Fixes: 8e0b634b1327 ("powerpc/64s: Do not allocate lppaca if we are not virtualized") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-27powerpc/kvm/booke: Fix altivec related build breakLaurentiu Tudor1-0/+7
Add missing "altivec unavailable" interrupt injection helper thus fixing the linker error below: arch/powerpc/kvm/emulate_loadstore.o: In function `kvmppc_check_altivec_disabled': arch/powerpc/kvm/emulate_loadstore.c: undefined reference to `.kvmppc_core_queue_vec_unavail' Fixes: 09f984961c137c4b ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions") Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-27powerpc: Fix deadlock with multiple calls to smp_send_stopNicholas Piggin1-16/+39
smp_send_stop can lock up the IPI path for any subsequent calls, because the receiving CPUs spin in their handler function. This started becoming a problem with the addition of an smp_send_stop call in the reboot path, because panics can reboot after doing their own smp_send_stop. The NMI IPI variant was fixed with ac61c11566 ("powerpc: Fix smp_send_stop NMI IPI handling"), which leaves the smp_call_function variant. This is fixed by having smp_send_stop only ever do the smp_call_function once. This is a bit less robust than the NMI IPI fix, because any other call to smp_call_function after smp_send_stop could deadlock, but that has always been the case, and it was not been a problem before. Fixes: f2748bdfe1573 ("powerpc/powernv: Always stop secondaries before reboot/shutdown") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-25powerpc: Fix smp_send_stop NMI IPI handlingNicholas Piggin1-5/+17
The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count over the handler function call, which causes later smp_send_nmi_ipi() callers to spin until the call is finished. The stop_this_cpu() function never returns, so the busy count is never decremeted, which can cause the system to hang in some cases. For example panic() will call smp_send_stop() early on which calls stop_this_cpu() on other CPUs, then later in the reboot path, pnv_restart() will call smp_send_stop() again, which hangs. Fix this by adding a special case to the stop_this_cpu() handler to decrement the busy count, because it will never return. Now that the NMI/non-NMI versions of stop_this_cpu() are different, split them out into separate functions rather than doing #ifdef tricks to share the body between the two functions. Fixes: 6bed3237624e3 ("powerpc: use NMI IPI for smp_send_stop") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out the functions, tweak change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-25rtc: opal: Fix OPAL RTC driver OPAL_BUSY loopsNicholas Piggin1-3/+5
The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies, up to 50 seconds have been observed here when RTC stops responding (BMC reboot can do it). Fix this by converting it to the standard form OPAL_BUSY loop that sleeps. Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/mce: Fix a bug where mce loops on memory UE.Mahesh Salgaonkar1-5/+2
The current code extracts the physical address for UE errors and then hooks it up into memory failure infrastructure. On successful extraction of physical address it wrongly sets "handled = 1" which means this UE error has been recovered. Since MCE handler gets return value as handled = 1, it assumes that error has been recovered and goes back to same NIP. This causes MCE interrupt again and again in a loop leading to hard lockup. Also, initialize phys_addr to ULONG_MAX so that we don't end up queuing undesired page to hwpoison. Without this patch we see: Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000 ... Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000 Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000 Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned Memory failure: 0x20181a08: already hardware poisoned ... Watchdog CPU:38 Hard LOCKUP After this patch we see: Severe Machine check interrupt [Not recovered] NIP: [00007fffaae585f4] PID: 7168 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffaafe28ac Physical address: 00002017c0bd0000 find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4 Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered Fixes: 01eaac2b0591 ("powerpc/mce: Hookup ierror (instruction) UE errors") Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large ↵Alistair Popple1-4/+19
address range The NPU has a limited number of address translation shootdown (ATSD) registers and the GPU has limited bandwidth to process ATSDs. This can result in contention of ATSD registers leading to soft lockups on some threads, particularly when invalidating a large address range in pnv_npu2_mn_invalidate_range(). At some threshold it becomes more efficient to flush the entire GPU TLB for the given MM context (PID) than individually flushing each address in the range. This patch will result in ranges greater than 2MB being converted from 32+ ATSDs into a single ATSD which will flush the TLB for the given PID on each GPU. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback ↵Alistair Popple2-4/+14
parameters There is a single npu context per set of callback parameters. Callers should be prevented from overwriting existing callback values so instead return an error if different parameters are passed. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com> Tested-by: Mark Hairgrove <mhairgrove@nvidia.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroyAlistair Popple1-9/+42
The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions are used to allocate/free contexts to allow address translation and shootdown by the NPU on a particular GPU. Context initialisation is implicitly safe as it is protected by the requirement mmap_sem be held in write mode, however pnv_npu2_destroy_context() does not require mmap_sem to be held and it is not safe to call with a concurrent initialisation for a different GPU. It was assumed the driver would ensure destruction was not called concurrently with initialisation. However the driver may be simplified by allowing concurrent initialisation and destruction for different GPUs. As npu context creation/destruction is not a performance critical path and the critical section is not large a single spinlock is used for simplicity. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com> Tested-by: Mark Hairgrove <mhairgrove@nvidia.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/memtrace: Let the arch hotunplug code flush cacheBalbir Singh1-17/+0
Don't do this via custom code, instead now that we have support in the arch hotplug/hotunplug code, rely on those routines to do the right thing. The existing flush doesn't work because it uses ppc64_caches.l1d.size instead of ppc64_caches.l1d.line_size. Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>