summaryrefslogtreecommitdiff
path: root/arch/powerpc/platforms
AgeCommit message (Collapse)AuthorFilesLines
2018-05-22powerpc/64s: Add support for a store forwarding barrier at kernel entry/exitNicholas Piggin2-0/+2
On some CPUs we can prevent a vulnerability related to store-to-load forwarding by preventing store forwarding between privilege domains, by inserting a barrier in kernel entry and exit paths. This is known to be the case on at least Power7, Power8 and Power9 powerpc CPUs. Barriers must be inserted generally before the first load after moving to a higher privilege, and after the last store before moving to a lower privilege, HV and PR privilege transitions must be protected. Barriers are added as patch sections, with all kernel/hypervisor entry points patched, and the exit points to lower privilge levels patched similarly to the RFI flush patching. Firmware advertisement is not implemented yet, so CPU flush types are hard coded. Thanks to Michal Suchánek for bug fixes and review. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-17powerpc/powernv: Fix NVRAM sleep in invalid context when crashingNicholas Piggin1-2/+12
Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special case the delay to avoid sleeping in invalid context. Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops") Cc: stable@vger.kernel.org # v3.2 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-25rtc: opal: Fix OPAL RTC driver OPAL_BUSY loopsNicholas Piggin1-3/+5
The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies, up to 50 seconds have been observed here when RTC stops responding (BMC reboot can do it). Fix this by converting it to the standard form OPAL_BUSY loop that sleeps. Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/npu: Do a PID GPU TLB flush when invalidating a large ↵Alistair Popple1-4/+19
address range The NPU has a limited number of address translation shootdown (ATSD) registers and the GPU has limited bandwidth to process ATSDs. This can result in contention of ATSD registers leading to soft lockups on some threads, particularly when invalidating a large address range in pnv_npu2_mn_invalidate_range(). At some threshold it becomes more efficient to flush the entire GPU TLB for the given MM context (PID) than individually flushing each address in the range. This patch will result in ranges greater than 2MB being converted from 32+ ATSDs into a single ATSD which will flush the TLB for the given PID on each GPU. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/npu: Prevent overwriting of pnv_npu2_init_contex() callback ↵Alistair Popple1-3/+13
parameters There is a single npu context per set of callback parameters. Callers should be prevented from overwriting existing callback values so instead return an error if different parameters are passed. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com> Tested-by: Mark Hairgrove <mhairgrove@nvidia.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/npu: Add lock to prevent race in concurrent context init/destroyAlistair Popple1-9/+42
The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions are used to allocate/free contexts to allow address translation and shootdown by the NPU on a particular GPU. Context initialisation is implicitly safe as it is protected by the requirement mmap_sem be held in write mode, however pnv_npu2_destroy_context() does not require mmap_sem to be held and it is not safe to call with a concurrent initialisation for a different GPU. It was assumed the driver would ensure destruction was not called concurrently with initialisation. However the driver may be simplified by allowing concurrent initialisation and destruction for different GPUs. As npu context creation/destruction is not a performance critical path and the critical section is not large a single spinlock is used for simplicity. Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com> Tested-by: Mark Hairgrove <mhairgrove@nvidia.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-24powerpc/powernv/memtrace: Let the arch hotunplug code flush cacheBalbir Singh1-17/+0
Don't do this via custom code, instead now that we have support in the arch hotplug/hotunplug code, rely on those routines to do the right thing. The existing flush doesn't work because it uses ppc64_caches.l1d.size instead of ppc64_caches.l1d.line_size. Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing") Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-21proc: fix /proc/loadavg regressionAlexey Dobriyan1-1/+1
Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") changed last field of /proc/loadavg (last pid allocated) to be off by one: # unshare -p -f --mount-proc cat /proc/loadavg 0.00 0.00 0.00 1/60 2 <=== It should be 1 after first fork into pid namespace. This is formally a regression but given how useless this field is I don't think anyone is affected. Bug was found by /proc testsuite! Link: http://lkml.kernel.org/r/20180413175408.GA27246@avx2 Fixes: 95846ecf9dac508 ("pid: replace pid bitmap implementation with IDR API") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gargi Sharma <gs051095@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-15Merge tag 'powerpc-4.17-2' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix crashes when loading modules built with a different CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic. - Fix busy loops in the OPAL NVRAM driver if we get certain error conditions from firmware. - Remove tlbie trace points from KVM code that's called in real mode, because it causes crashes. - Fix checkstops caused by invalid tlbiel on Power9 Radix. - Ensure the set of CPU features we "know" are always enabled is actually the minimal set when we build with support for firmware supplied CPU features. Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin. * tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features powerpc/mm/radix: Fix checkstops caused by invalid tlbiel KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode powerpc/8xx: Fix build with hugetlbfs enabled powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops powerpc/fscr: Enable interrupts earlier before calling get_user() powerpc/64s: Fix section mismatch warnings from setup_rfi_flush() powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
2018-04-11powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loopsNicholas Piggin1-1/+6
The OPAL NVRAM driver does not sleep in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies, and various lockup errors to trigger (again, BMC reboot can cause it). Fix this by converting it to the standard form OPAL_BUSY loop that sleeps. Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Depends-on: 34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-10Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
2018-04-07powerpc/powernv: Create platform devs for nvdimm busesOliver O'Halloran1-0/+3
Scan the devicetree for an nvdimm-bus compatible and create a platform device for them. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-03powerpc/powernv: Always stop secondaries before reboot/shutdownNicholas Piggin2-37/+6
Currently powernv reboot and shutdown requests just leave secondaries to do their own things. This is undesirable because they can trigger any number of watchdogs while waiting for reboot, but also we don't know what else they might be doing -- they might be causing trouble, trampling memory, etc. The opal scheduled flash update code already ran into watchdog problems due to flashing taking a long time, and it was fixed with 2196c6f1ed ("powerpc/powernv: Return secondary CPUs to firmware before FW update"), which returns secondaries to opal. It's been found that regular reboots can take over 10 seconds, which can result in the hard lockup watchdog firing, reboot: Restarting system [ 360.038896709,5] OPAL: Reboot request... Watchdog CPU:0 Hard LOCKUP Watchdog CPU:44 detected Hard LOCKUP other CPUS:16 Watchdog CPU:16 Hard LOCKUP watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0] This patch removes the special case for flash update, and calls smp_send_stop in all cases before calling reboot/shutdown. smp_send_stop could return CPUs to OPAL, the main reason not to is that the request could come from a NMI that interrupts OPAL code, so re-entry to OPAL can cause a number of problems. Putting secondaries into simple spin loops improves the chances of a successful reboot. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc/powernv: Fix SMT4 forcing idle codeNicholas Piggin1-4/+0
The PSSCR value is not stored to PACA_REQ_PSSCR if the CPU does not have the XER[SO] bug. Fix this by storing up-front, outside the workaround code. The initial test is not required because it is a slow path. The workaround is made to depend on CONFIG_KVM_BOOK3S_HV_POSSIBLE, to match pnv_power9_force_smt4_catch() where it is used. Drop the comment on pnv_power9_force_smt4_catch() as it's no longer true. Fixes: 7672691a08c8 ("powerpc/powernv: Provide a way to force a core into SMT4 mode") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-03powerpc/pseries: Restore default security feature flags on setupMauricio Faria de Oliveira1-0/+11
After migration the security feature flags might have changed (e.g., destination system with unpatched firmware), but some flags are not set/clear again in init_cpu_char_feature_flags() because it assumes the security flags to be the defaults. Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails then init_cpu_char_feature_flags() does not run again, which potentially might leave the system in an insecure or sub-optimal configuration. So, just restore the security feature flags to the defaults assumed by init_cpu_char_feature_flags() so it can set/clear them correctly, and to ensure safe settings are in place in case the hypercall fail. Fixes: f636c14790ea ("powerpc/pseries: Set or clear security feature flags") Depends-on: 19887d6a28e2 ("powerpc: Move default security feature flags") Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64s: Add POWER9 CPU type selectionNicholas Piggin1-0/+5
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/64: Add GENERIC_CPU support for little endianNicholas Piggin1-2/+6
Add GENERIC_CPU support for little-endian rather than using POWER8 specific selection for POWER9 and above. Restrict GENERIC_CPU to POWER8 and above on little endian. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Duplicate GENERIC_CPU to avoid a kbuild warning about the prompt being redefined. Spell out that GENERIC means >= POWER4 for BE.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31powerpc/64s: Remove POWER4 supportNicholas Piggin1-5/+1
POWER4 has been broken since at least the change 49d09bf2a6 ("powerpc/64s: Optimise MSR handling in exception handling"), which requires mtmsrd L=1 support. This was introduced in ISA v2.01, and POWER4 supports ISA v2.00. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplugNicholas Piggin1-1/+1
Implement a new function to invoke stop, power9_offline_stop, which is like power9_idle_stop but used by the cpu hotplug code. Move KVM secondary state manipulation code to the offline case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31powerpc/wii: Don't rely on the reserved memory hackJonathan Neuschäfer1-13/+1
Because the two memory blocks (usually called MEM1 and MEM2) are not merged anymore, __request_region in kernel/resource.c will correctly allow reserving regions in the physical address space between MEM1 and MEM2, where many important peripherals are (GPIO, MMC, USB, ...). A previous change to __ioremap_caller in arch/powerpc/mm/pgtable_32.c ensures that multiple memblocks are properly considered in ioremap; this makes it unnecessary to set __allow_ioremap_reserved. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31powerpc/wii: Explicitly configure GPIO owner for poweroff pinJonathan Neuschäfer1-0/+7
The Hollywood chipset's GPIO controller has two sets of registers: One for access by the PowerPC CPU, and one for access by the ARM coprocessor (but both are accessible from the PPC because the memory firewall (AHBPROT) is usually disabled when booting Linux, today). The wii_power_off function currently assumes that the poweroff GPIO pin is configured for use via the ARM side, but the upcoming GPIO driver configures all pins for use via the PPC side, breaking poweroff. Configure the owner register explicitly in wii_power_off to make wii_power_off work with and without the new GPIO driver. I think the Wii can be switched to the generic gpio-poweroff driver, after the GPIO driver is merged. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31powerpc/wii: Probe the whole devicetreeJonathan Neuschäfer1-1/+1
Previously, wii_device_probe would only initialize devices under the /hollywood node. After this patch, platform devices placed outside of /hollywood will also be initialized. The intended usecase for this are devices located outside of the Hollywood chip, such as GPIO LEDs and GPIO buttons. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-31Merge branch 'topic/paca' into nextMichael Ellerman11-32/+34
Bring in yet another series that touches KVM code, and might need to be merged into the kvm-ppc branch to resolve conflicts. This required some changes in pnv_power9_force_smt4_catch/release() due to the paca array becomming an array of pointers.
2018-03-30powerpc/4xx: Fix error return code in ppc4xx_msi_probe()Wei Yongjun1-1/+2
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> [mpe: Add missing ';' to make it compile] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()Nicholas Piggin1-0/+4
opal_nvram_write currently just assumes success if it encounters an error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO on other errors instead. Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30powerpc/pseries: Fix clearing of security feature flagsMauricio Faria de Oliveira1-3/+3
The H_CPU_BEHAV_* flags should be checked for in the 'behaviour' field of 'struct h_cpu_char_result' -- 'character' is for H_CPU_CHAR_* flags. Found by playing around with QEMU's implementation of the hypercall: H_CPU_CHAR=0xf000000000000000 H_CPU_BEHAV=0x0000000000000000 This clears H_CPU_BEHAV_FAVOUR_SECURITY and H_CPU_BEHAV_L1D_FLUSH_PR so pseries_setup_rfi_flush() disables 'rfi_flush'; and it also clears H_CPU_CHAR_L1D_THREAD_PRIV flag. So there is no RFI flush mitigation at all for cpu_show_meltdown() to report; but currently it does: Original kernel: # cat /sys/devices/system/cpu/vulnerabilities/meltdown Mitigation: RFI Flush Patched kernel: # cat /sys/devices/system/cpu/vulnerabilities/meltdown Not affected H_CPU_CHAR=0x0000000000000000 H_CPU_BEHAV=0xf000000000000000 This sets H_CPU_BEHAV_BNDS_CHK_SPEC_BAR so cpu_show_spectre_v1() should report vulnerable; but currently it doesn't: Original kernel: # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1 Not affected Patched kernel: # cat /sys/devices/system/cpu/vulnerabilities/spectre_v1 Vulnerable Brown-paper-bag-by: Michael Ellerman <mpe@ellerman.id.au> Fixes: f636c14790ea ("powerpc/pseries: Set or clear security feature flags") Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30powerpc/64s: Allocate LPPACAs individuallyNicholas Piggin1-1/+6
We no longer allocate lppacas in an array, so this patch removes the 1kB static alignment for the structure, and enforces the PAPR alignment requirements at allocation time. We can not reduce the 1kB allocation size however, due to existing KVM hypervisors. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-30powerpc/64: Use array of paca pointers and allocate pacas individuallyNicholas Piggin10-22/+23
Change the paca array into an array of pointers to pacas. Allocate pacas individually. This allows flexibility in where the PACAs are allocated. Future work will allocate them node-local. Platforms that don't have address limits on PACAs would be able to defer PACA allocations until later in boot rather than allocate all possible ones up-front then freeing unused. This is slightly more overhead (one additional indirection) for cross CPU paca references, but those aren't too common. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/eeh: Add eeh_state_active() helperSam Bobroff1-7/+2
Checking for a "fully active" device state requires testing two flag bits, which is open coded in several places, so add a function to do it. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/powernv/npu: Do not try invalidating 32bit table when 64bit table is ↵Alexey Kardashevskiy1-3/+24
enabled GPUs and the corresponding NVLink bridges get different PEs as they have separate translation validation entries (TVEs). We put these PEs to the same IOMMU group so they cannot be passed through separately. So the iommu_table_group_ops::set_window/unset_window for GPUs do set tables to the NPU PEs as well which means that iommu_table's list of attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked. This list is used for TCE cache invalidation. The problem is that NPU PE has just a single TVE and can be programmed to point to 32bit or 64bit windows while GPU PE has two (as any other PCI device). So we end up having an 32bit iommu_table struct linked to both PEs even though only the 64bit TCE table cache can be invalidated on NPU. And a relatively recent skiboot detects this and prints errors. This changes GPU's iommu_table_group_ops::set_window/unset_window to make sure that NPU PE is only linked to the table actually used by the hardware. If there are two tables used by an IOMMU group, the NPU PE will use the last programmed one which with the current use scenarios is expected to be a 64bit one. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/lpar/debug: Initialize flags before printing debug messageAlexey Kardashevskiy1-3/+3
With enabled DEBUG, there is a compile error: "error: ‘flags’ is used uninitialized in this function". This moves pr_devel() little further where @flags are initialized. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/pseries: Use the security flags in pseries_setup_rfi_flush()Michael Ellerman1-15/+12
Now that we have the security flags we can simplify the code in pseries_setup_rfi_flush() because the security flags have pessimistic defaults. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/powernv: Use the security flags in pnv_setup_rfi_flush()Michael Ellerman1-31/+10
Now that we have the security flags we can significantly simplify the code in pnv_setup_rfi_flush(), because we can use the flags instead of checking device tree properties and because the security flags have pessimistic defaults. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/powernv: Set or clear security feature flagsMichael Ellerman1-0/+56
Now that we have feature flags for security related things, set or clear them based on what we see in the device tree provided by firmware. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/pseries: Set or clear security feature flagsMichael Ellerman1-0/+43
Now that we have feature flags for security related things, set or clear them based on what we receive from the hypercall. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/rfi-flush: Call setup_rfi_flush() after LPM migrationMichael Ellerman3-1/+6
We might have migrated to a machine that uses a different flush type, or doesn't need flushing at all. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/rfi-flush: Always enable fallback flush on pseriesMichael Ellerman1-9/+1
This ensures the fallback flush area is always allocated on pseries, so in case a LPAR is migrated from a patched to an unpatched system, it is possible to enable the fallback flush in the target system. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-27powerpc/64: Call H_REGISTER_PROC_TBL when running as a HPT guest on POWER9Paul Mackerras1-2/+6
On POWER9, since commit cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9", 2017-01-30), we set both the radix and HPT bits in the client-architecture-support (CAS) vector, which tells the hypervisor that we can do either radix or HPT. According to PAPR, if we use this combination we are promising to do a H_REGISTER_PROC_TBL hcall later on to let the hypervisor know whether we are doing radix or HPT. We currently do this call if we are doing radix but not if we are doing HPT. If the hypervisor is able to support both radix and HPT guests, it would be entitled to defer allocation of the HPT until the H_REGISTER_PROC_TBL call, and to fail any attempts to create HPTEs until the H_REGISTER_PROC_TBL call. Thus we need to do a H_REGISTER_PROC_TBL call when we are doing HPT; otherwise we may crash at boot time. This adds the code to call H_REGISTER_PROC_TBL in this case, before we attempt to create any HPT entries using H_ENTER. Fixes: cc3d2940133d ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-24Merge branch 'topic/ppc-kvm' into nextMichael Ellerman1-0/+81
This brings in two series from Paul, one of which touches KVM code and may need to be merged into the kvm-ppc tree to resolve conflicts.
2018-03-23powerpc/powernv: Provide a way to force a core into SMT4 modePaul Mackerras1-0/+81
POWER9 processors up to and including "Nimbus" v2.2 have hardware bugs relating to transactional memory and thread reconfiguration. One of these bugs has a workaround which is to get the core into SMT4 state temporarily. This workaround is only needed when running bare-metal. This patch provides a function which gets the core into SMT4 mode by preventing threads from going to a stop state, and waking up those which are already in a stop state. Once at least 3 threads are not in a stop state, the core will be in SMT4 and we can continue. To do this, we add a "dont_stop" flag to the paca to tell the thread not to go into a stop state. If this flag is set, power9_idle_stop() just returns immediately with a return value of 0. The pnv_power9_force_smt4_catch() function does the following: 1. Set the dont_stop flag for each thread in the core, except ourselves (in fact we use an atomic_inc() in case more than one thread is calling this function concurrently). 2. See how many threads are awake, indicated by their requested_psscr field in the paca being 0. If this is at least 3, skip to step 5. 3. Send a doorbell interrupt to each thread that was seen as being in a stop state in step 2. 4. Until at least 3 threads are awake, scan the threads to which we sent a doorbell interrupt and check if they are awake now. This relies on the following properties: - Once dont_stop is non-zero, requested_psccr can't go from zero to non-zero, except transiently (and without the thread doing stop). - requested_psscr being zero guarantees that the thread isn't in a state-losing stop state where thread reconfiguration could occur. - Doing stop with a PSSCR value of 0 won't be a state-losing stop and thus won't allow thread reconfiguration. - Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core must be in SMT4 mode, since SMT modes are powers of 2. This does add a sync to power9_idle_stop(), which is necessary to provide the correct ordering between setting requested_psscr and checking dont_stop. The overhead of the sync should be unnoticeable compared to the latency of going into and out of a stop state. Because some objected to incurring this extra latency on systems where the XER[SO] bug is not relevant, I have put the test in power9_idle_stop inside a feature section. This means that pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will probably hang the system. In order to cater for uses where the caller has an operation that has to be done while the core is in SMT4, the core continues to be kept in SMT4 after pnv_power9_force_smt4_catch() function returns, until the pnv_power9_force_smt4_release() function is called. It undoes the effect of step 1 above and allows the other threads to go into a stop state. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-20powerpc: Use sizeof(*foo) rather than sizeof(struct foo)Markus Elfring16-26/+24
It's slightly less error prone to use sizeof(*foo) rather than specifying the type. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [mpe: Consolidate into one patch, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-14powerpc/vas: Add a couple of trace pointsSukadev Bhattiprolu2-0/+122
Add a couple of trace points in the VAS driver Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> [mpe: Add SPDX tag to new header] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-14powerpc/vas: Fix cleanup when VAS is not configuredSukadev Bhattiprolu2-3/+14
When VAS is not configured, unregister the platform driver. Also simplify cleanup by delaying vas debugfs init until we know VAS is configured. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-14powerpc/npu-dma.c: Fix crash after __mmu_notifier_register failureMark Hairgrove1-11/+21
pnv_npu2_init_context wasn't checking the return code from __mmu_notifier_register. If __mmu_notifier_register failed, the npu_context was still assigned to the mm and the caller wasn't given any indication that things went wrong. Later on pnv_npu2_destroy_context would be called, which in turn called mmu_notifier_unregister and dropped mm->mm_count without having incremented it in the first place. This led to various forms of corruption like mm use-after-free and mm double-free. __mmu_notifier_register can fail with EINTR if a signal is pending, so this case can be frequent. This patch calls opal_npu_destroy_context on the failure paths, and makes sure not to assign mm->context.npu_context until past the failure points. Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com> Acked-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13powerpc: Rename plapr routines to plparMichael Ellerman2-2/+2
Back in 2013 we added some hypercall wrappers which misspelled "plpar" (P-series Logical PARtition) as "plapr". Visually they're hard to distinguish and it almost doesn't matter, but it is confusing when grepping to miss some calls because of the typo. They've also started spreading, so before they take over let's fix them all to be "plpar". Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13powerpc/pseries: Move smp_query_cpu_stopped() etc. out of plpar_wrappers.hMichael Ellerman1-0/+8
smp_query_cpu_stopped() and related #defines are currently in plpar_wrappers.h. The function actually does an RTAS call, not an hcall, and basically has nothing to do with plpar_wrappers.h Move it into pseries.h, where it can easily be used by the only two callers in pseries/smp.c and pseries/hotplug-cpu.c. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13powerpc/embedded6xx: Make functions flipper_pic_init() & ug_udbg_putc() staticMathieu Malaterre2-2/+2
Change signature of two functions, adding static keyword to prevent the following two warnings (treated as errors on W=1): arch/powerpc/platforms/embedded6xx/flipper-pic.c:135:28: error: no previous prototype for ‘flipper_pic_init’ arch/powerpc/platforms/embedded6xx/usbgecko_udbg.c:172:6: error: no previous prototype for ‘ug_udbg_putc’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13powerpc/powernv/mce: Don't silently restart the machineBalbir Singh1-1/+4
On MCE the current code will restart the machine with ppc_md.restart(). This case was extremely unlikely since prior to that a skiboot call is made and that resulted in a checkstop for analysis. With newer skiboots, on P9 we don't checkstop the box by default, instead we return back to the kernel to extract useful information at the time of the MCE. While we still get this information, this patch converts the restart to a panic(), so that if configured a dump can be taken and we can track and probably debug the potential issue causing the MCE. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13powerpc/powernv: Enable tunneled operationsPhilippe Bergheaud3-8/+137
P9 supports PCI tunneled operations (atomics and as_notify). This patch adds support for tunneled operations on powernv, with a new API, to be called by device drivers: pnv_pci_enable_tunnel() Enable tunnel operations, tell driver the 16-bit ASN indication used by kernel. pnv_pci_disable_tunnel() Disable tunnel operations. pnv_pci_set_tunnel_bar() Tell kernel the Tunnel BAR Response address used by driver. This function uses two new OPAL calls, as the PBCQ Tunnel BAR register is configured by skiboot. pnv_pci_get_as_notify_info() Return the ASN info of the thread to be woken up. Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13powerpc/powernv/npu: Fix deadlock in mmio_invalidate()Alistair Popple1-88/+141
When sending TLB invalidates to the NPU we need to send extra flushes due to a hardware issue. The original implementation would lock the all the ATSD MMIO registers sequentially before unlocking and relocking each of them sequentially to do the extra flush. This introduced a deadlock as it is possible for one thread to hold one ATSD register whilst waiting for another register to be freed while the other thread is holding that register waiting for the one in the first thread to be freed. For example if there are two threads and two ATSD registers: Thread A Thread B ---------------------- Acquire 1 Acquire 2 Release 1 Acquire 1 Wait 1 Wait 2 Both threads will be stuck waiting to acquire a register resulting in an RCU stall warning or soft lockup. This patch solves the deadlock by refactoring the code to ensure registers are not released between flushes and to ensure all registers are either acquired or released together and in order. Fixes: bbd5ff50afff ("powerpc/powernv/npu-dma: Add explicit flush when sending an ATSD") Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>