summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2021-10-15KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guestMichael Ellerman1-2/+7
We call idle_kvm_start_guest() from power7_offline() if the thread has been requested to enter KVM. We pass it the SRR1 value that was returned from power7_idle_insn() which tells us what sort of wakeup we're processing. Depending on the SRR1 value we pass in, the KVM code might enter the guest, or it might return to us to do some host action if the wakeup requires it. If idle_kvm_start_guest() is able to handle the wakeup, and enter the guest it is supposed to indicate that by returning a zero SRR1 value to us. That was the behaviour prior to commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C"), however in that commit the handling of SRR1 was reworked, and the zeroing behaviour was lost. Returning from idle_kvm_start_guest() without zeroing the SRR1 value can confuse the host offline code, causing the guest to crash and other weirdness. Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015133929.832061-2-mpe@ellerman.id.au
2021-10-15KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()Michael Ellerman1-9/+10
In commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") kvm_start_guest() became idle_kvm_start_guest(). The old code allocated a stack frame on the emergency stack, but didn't use the frame to store anything, and also didn't store anything in its caller's frame. idle_kvm_start_guest() on the other hand is written more like a normal C function, it creates a frame on entry, and also stores CR/LR into its callers frame (per the ABI). The problem is that there is no caller frame on the emergency stack. The emergency stack for a given CPU is allocated with: paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE; So emergency_sp actually points to the first address above the emergency stack allocation for a given CPU, we must not store above it without first decrementing it to create a frame. This is different to the regular kernel stack, paca->kstack, which is initialised to point at an initial frame that is ready to use. idle_kvm_start_guest() stores the backchain, CR and LR all of which write outside the allocation for the emergency stack. It then creates a stack frame and saves the non-volatile registers. Unfortunately the frame it creates is not large enough to fit the non-volatiles, and so the saving of the non-volatile registers also writes outside the emergency stack allocation. The end result is that we corrupt whatever is at 0-24 bytes, and 112-248 bytes above the emergency stack allocation. In practice this has gone unnoticed because the memory immediately above the emergency stack happens to be used for other stack allocations, either another CPUs mc_emergency_sp or an IRQ stack. See the order of calls to irqstack_early_init() and emergency_stack_init(). The low addresses of another stack are the top of that stack, and so are only used if that stack is under extreme pressue, which essentially never happens in practice - and if it did there's a high likelyhood we'd crash due to that stack overflowing. Still, we shouldn't be corrupting someone else's stack, and it is purely luck that we aren't corrupting something else. To fix it we save CR/LR into the caller's frame using the existing r1 on entry, we then create a SWITCH_FRAME_SIZE frame (which has space for pt_regs) on the emergency stack with the backchain pointing to the existing stack, and then finally we switch to the new frame on the emergency stack. Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015133929.832061-1-mpe@ellerman.id.au
2021-10-15sched: Add wrapper for get_wchan() to keep task blockedKees Cook2-7/+4
Having a stable wchan means the process must be blocked and for it to stay that way while performing stack unwinding. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm] Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64] Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
2021-10-14powerpc: Mark .opd section read-onlyChristophe Leroy1-6/+6
.opd section contains function descriptors used to locate functions in the kernel. If someone is able to modify a function descriptor he will be able to run arbitrary kernel function instead of another. To avoid that, move .opd section inside read-only memory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3cd40b682fb6f75bb40947b55ca0bac20cb3f995.1634136222.git.christophe.leroy@csgroup.eu
2021-10-14powerpc/perf: Fix cycles/instructions as PM_CYC/PM_INST_CMPL in power10Athira Rajeev2-17/+35
On power9 and earlier platforms, the default event used for cyles and instructions is PM_CYC (0x0001e) and PM_INST_CMPL (0x00002) respectively. These events use two programmable PMCs and by default will count irrespective of the run latch state (idle state). But since they use programmable PMCs, these events can lead to multiplexing with other events, because there are only 4 programmable PMCs. Hence in power10, performance monitoring unit (PMU) driver uses performance monitor counter 5 (PMC5) and performance monitor counter6 (PMC6) for counting instructions and cycles. Currently on power10, the event used for cycles is PM_RUN_CYC (0x600F4) and instructions uses PM_RUN_INST_CMPL (0x500fa). But counting of these events in idle state is controlled by the CC56RUN bit setting in Monitor Mode Control Register0 (MMCR0). If the CC56RUN bit is zero, PMC5/6 will not count when CTRL[RUN] (run latch) is zero. This could lead to missing some counts if a thread is in idle state during system wide profiling. To fix it, set the CC56RUN bit in MMCR0 for power10, which makes PMC5 and PMC6 count instructions and cycles regardless of the run latch state. Since this change make PMC5/6 count as PM_INST_CMPL/PM_CYC, rename the event code 0x600f4 as PM_CYC instead of PM_RUN_CYC and event code 0x500fa as PM_INST_CMPL instead of PM_RUN_INST_CMPL. The changes are only for PMC5/6 event codes and will not affect the behaviour of PM_RUN_CYC/PM_RUN_INST_CMPL if progammed in other PMC's. Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.cm> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Tweak change log wording for style and consistency] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211007075121.28497-1-atrajeev@linux.vnet.ibm.com
2021-10-13powerpc/eeh: Fix docstrings in eeh.cKai Song1-4/+8
We fix the following warnings when building kernel with W=1: arch/powerpc/kernel/eeh.c:598: warning: Function parameter or member 'function' not described in 'eeh_pci_enable' arch/powerpc/kernel/eeh.c:774: warning: Function parameter or member 'edev' not described in 'eeh_set_dev_freset' arch/powerpc/kernel/eeh.c:774: warning: expecting prototype for eeh_set_pe_freset(). Prototype was for eeh_set_dev_freset() instead arch/powerpc/kernel/eeh.c:814: warning: Function parameter or member 'include_passed' not described in 'eeh_pe_reset_full' arch/powerpc/kernel/eeh.c:944: warning: Function parameter or member 'ops' not described in 'eeh_init' arch/powerpc/kernel/eeh.c:1451: warning: Function parameter or member 'include_passed' not described in 'eeh_pe_reset' arch/powerpc/kernel/eeh.c:1526: warning: Function parameter or member 'func' not described in 'eeh_pe_inject_err' arch/powerpc/kernel/eeh.c:1526: warning: Excess function parameter 'function' described in 'eeh_pe_inject_err' Signed-off-by: Kai Song <songkai01@inspur.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211009041630.4135-1-songkai01@inspur.com
2021-10-13powerpc/boot: Use CONFIG_PPC_POWERNV to compile OPAL supportCédric Le Goater2-2/+2
CONFIG_PPC64_BOOT_WRAPPER is selected by CPU_LITTLE_ENDIAN which is used to compile support for other platforms such as Microwatt. There is no need for OPAL calls on these. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211011070356.99952-1-clg@kaod.org
2021-10-13powerpc/xive: Discard disabled interrupts in get_irqchip_state()Cédric Le Goater1-1/+2
When an interrupt is passed through, the KVM XIVE device calls the set_vcpu_affinity() handler which raises the P bit to mask the interrupt and to catch any in-flight interrupts while routing the interrupt to the guest. On the guest side, drivers (like some Intels) can request at probe time some MSIs and call synchronize_irq() to check that there are no in flight interrupts. This will call the XIVE get_irqchip_state() handler which will always return true as the interrupt P bit has been set on the host side and lock the CPU in an infinite loop. Fix that by discarding disabled interrupts in get_irqchip_state(). Fixes: da15c03b047d ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race") Cc: stable@vger.kernel.org #v5.4+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: seeteena <s1seetee@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211011070203.99726-1-clg@kaod.org
2021-10-13powerpc: Set max_mapnr correctlyChristophe Leroy1-1/+1
max_mapnr is used by virt_addr_valid() to check if a linear address is valid. It must only include lowmem PFNs, like other architectures. Problem detected on a system with 1G mem (Only 768M are mapped), with CONFIG_DEBUG_VIRTUAL and CONFIG_TEST_DEBUG_VIRTUAL, it didn't report virt_to_phys(VMALLOC_START), VMALLOC_START being 0xf1000000. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/77d99037782ac4b3c3b0124fc4ae80ce7b760b05.1634035228.git.christophe.leroy@csgroup.eu
2021-10-13KVM: PPC: Book3S HV: H_ENTER filter out reserved HPTE[B] valueNicholas Piggin2-0/+13
The HPTE B field is a 2-bit field with values 0b10 and 0b11 reserved. This field is also taken from the HPTE and used when KVM executes TLBIEs to set the B field of those instructions. Disallow the guest setting B to a reserved value with H_ENTER by rejecting it. This is the same approach already taken for rejecting reserved (unsupported) LLP values. This prevents the guest from being able to induce the host to execute TLBIE with reserved values, which is not known to be a problem with current processors but in theory it could prevent the TLBIE from working correctly in a future processor. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211004145749.1331331-1-npiggin@gmail.com
2021-10-13powerpc/eeh: Use dev_driver_string() instead of struct pci_dev->driver->nameUwe Kleine-König2-5/+8
Replace pdev->driver->name by dev_driver_string() for the corresponding struct device. This is a step toward removing pci_dev->driver. Move the function nearer its only user and instead of the ?: operator use a normal "if" which is more readable. Link: https://lore.kernel.org/r/20211004125935.2300113-6-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-10-12powerpc/perf: Expose instruction and data address registers as part of ↵Athira Rajeev2-4/+11
extended regs Patch adds support to include Sampled Instruction Address Register (SIAR) and Sampled Data Address Register (SDAR) SPRs as part of extended registers. Update the definition of PERF_REG_PMU_MASK_300/31 and PERF_REG_EXTENDED_MAX to include these SPR's. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Kajol Jain<kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211007065505.27809-4-atrajeev@linux.vnet.ibm.com
2021-10-12powerpc/perf: Refactor the code definition of perf reg extended maskAthira Rajeev1-8/+13
PERF_REG_PMU_MASK_300 and PERF_REG_PMU_MASK_31 defines the mask value for extended registers. Current definition of these mask values uses hex constant and does not use registers by name, making it less readable. Patch refactor the macro values by or'ing together the actual register value constants. Also include PERF_REG_EXTENDED_MAX as part of enum definition. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Kajol Jain<kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211007065505.27809-2-atrajeev@linux.vnet.ibm.com
2021-10-10Merge tag 'powerpc-5.15-3' of ↵Linus Torvalds17-75/+234
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A bit of a big batch, partly because I didn't send any last week, and also just because the BPF fixes happened to land this week. Summary: - Fix a regression hit by the IPR SCSI driver, introduced by the recent addition of MSI domains on pseries. - A big series including 8 BPF fixes, some with potential security impact and the rest various code generation issues. - Fix our program check assembler entry path, which was accidentally jumping into a gas macro and generating strange stack frames, which could confuse find_bug(). - A couple of fixes, and related changes, to fix corner cases in our machine check handling. - Fix our DMA IOMMU ops, which were not always returning the optimal DMA mask, leading to at least one device falling back to 32-bit DMA when it shouldn't. - A fix for KUAP handling on 32-bit Book3S. - Fix crashes seen when kdumping on some pseries systems. Thanks to Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric Le Goater, Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem, Christoph Hellwig, Johan Almbladh, Stan Johnson" * tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init powerpc/32s: Fix kuap_kernel_restore() powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler powerpc/64s: Fix unrecoverable MCE calling async handler from NMI powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG powerpc/64: warn if local irqs are enabled in NMI or hardirq context powerpc/traps: do not enable irqs in _exception powerpc/64s: fix program check interrupt emergency stack path powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000 powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END powerpc/bpf ppc32: Fix JMP32_JSET_K powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC powerpc/security: Add a helper to query stf_barrier type powerpc/bpf: Fix BPF_SUB when imm == 0x80000000 powerpc/bpf: Fix BPF_MOD when imm == 1 powerpc/bpf: Validate branch ranges powerpc/lib: Add helper to check if offset is within conditional branch range powerpc/iommu: Report the correct most efficient DMA mask for PCI devices
2021-10-08powerpc/pseries/cpuhp: remove obsolete comment from pseries_cpu_dieNathan Lynch1-5/+0
This comment likely refers to the obsolete DLPAR workflow where some resource state transitions were driven more directly from user space utilities, but it also seems to contradict itself: "Change isolate state to Isolate [...]" is at odds with the preceding sentences, and it does not relate at all to the code that follows. Remove it to prevent confusion. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210927201933.76786-5-nathanl@linux.ibm.com
2021-10-08powerpc/pseries/cpuhp: delete add/remove_by_count codeNathan Lynch1-216/+2
The core DLPAR code supports two actions (add and remove) and three subtypes of action: * By DRC index: the action is attempted on a single specified resource. This is the usual case for processors. * By indexed count: the action is attempted on a range of resources beginning at the specified index. This is implemented only by the memory DLPAR code. * By count: the lower layer (CPU or memory) is responsible for locating the specified number of resources to which the action can be applied. I cannot find any evidence of the "by count" subtype being used by drmgr or qemu for processors. And when I try to exercise this code, the add case does not work: $ ppc64_cpu --smt ; nproc SMT=8 24 $ printf "cpu remove count 2" > /sys/kernel/dlpar $ nproc 8 $ printf "cpu add count 2" > /sys/kernel/dlpar -bash: printf: write error: Invalid argument $ dmesg | tail -2 pseries-hotplug-cpu: Failed to find enough CPUs (1 of 2) to add dlpar: Could not handle DLPAR request "cpu add count 2" $ nproc 8 $ drmgr -c cpu -a -q 2 # this uses the by-index method Validating CPU DLPAR capability...yes. CPU 1 CPU 17 $ nproc 24 This is because find_drc_info_cpus_to_add() does not increment drc_index appropriately during its search. This is not hard to fix. But the _by_count() functions also have the property that they attempt to roll back all prior operations if the entire request cannot be satisfied, even though the rollback itself can encounter errors. It's not possible to provide transaction-like behavior at this level, and it's undesirable to have code that can only pretend to do that. Any users of these functions cannot know what the state of the system is in the error case. And the error paths are, to my knowledge, impossible to test without adding custom error injection code. Summary: * This code has not worked reliably since its introduction. * There is no evidence that it is used. * It contains questionable rollback behaviors in error paths which are difficult to test. So let's remove it. Fixes: ac71380071d1 ("powerpc/pseries: Add CPU dlpar remove functionality") Fixes: 90edf184b9b7 ("powerpc/pseries: Add CPU dlpar add functionality") Fixes: b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR support for drc-info property") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210927201933.76786-4-nathanl@linux.ibm.com
2021-10-08powerpc/cpuhp: BUG -> WARN conversion in offline pathNathan Lynch1-1/+2
If, due to bugs elsewhere, we get into unregister_cpu_online() with a CPU that isn't marked hotpluggable, we can emit a warning and return an appropriate error instead of crashing. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210927201933.76786-3-nathanl@linux.ibm.com
2021-10-08powerpc/pseries/cpuhp: cache node correctionsNathan Lynch1-4/+71
On pseries, cache nodes in the device tree can be added and removed by the CPU DLPAR code as well as the partition migration (mobility) code. PowerVM partitions in dedicated processor mode typically have L2 and L3 cache nodes. The CPU DLPAR code has the following shortcomings: * Cache nodes returned as siblings of a new CPU node by ibm,configure-connector are silently discarded; only the CPU node is added to the device tree. * Cache nodes which become unreferenced in the processor removal path are not removed from the device tree. This can lead to duplicate nodes when the post-migration device tree update code replaces cache nodes. This is long-standing behavior. Presumably it has gone mostly unnoticed because the two bugs have the property of obscuring each other in common simple scenarios (e.g. remove a CPU and add it back). Likely you'd notice only if you cared to inspect the device tree or the sysfs cacheinfo information. Booted with two processors: $ pwd /sys/firmware/devicetree/base/cpus $ ls -1d */ l2-cache@2010/ l2-cache@2011/ l3-cache@3110/ l3-cache@3111/ PowerPC,POWER9@0/ PowerPC,POWER9@8/ $ lsprop */l2-cache l2-cache@2010/l2-cache 00003110 (12560) l2-cache@2011/l2-cache 00003111 (12561) PowerPC,POWER9@0/l2-cache 00002010 (8208) PowerPC,POWER9@8/l2-cache 00002011 (8209) $ ls /sys/devices/system/cpu/cpu0/cache/ index0 index1 index2 index3 After DLPAR-adding PowerPC,POWER9@10, we see that its associated cache nodes are absent, its threads' L2+L3 cacheinfo is unpopulated, and it is missing a cache level in its sched domain hierarchy: $ ls -1d */ l2-cache@2010/ l2-cache@2011/ l3-cache@3110/ l3-cache@3111/ PowerPC,POWER9@0/ PowerPC,POWER9@10/ PowerPC,POWER9@8/ $ lsprop PowerPC\,POWER9@10/l2-cache PowerPC,POWER9@10/l2-cache 00002012 (8210) $ ls /sys/devices/system/cpu/cpu16/cache/ index0 index1 $ grep . /sys/kernel/debug/sched/domains/cpu{0,8,16}/domain*/name /sys/kernel/debug/sched/domains/cpu0/domain0/name:SMT /sys/kernel/debug/sched/domains/cpu0/domain1/name:CACHE /sys/kernel/debug/sched/domains/cpu0/domain2/name:DIE /sys/kernel/debug/sched/domains/cpu8/domain0/name:SMT /sys/kernel/debug/sched/domains/cpu8/domain1/name:CACHE /sys/kernel/debug/sched/domains/cpu8/domain2/name:DIE /sys/kernel/debug/sched/domains/cpu16/domain0/name:SMT /sys/kernel/debug/sched/domains/cpu16/domain1/name:DIE When removing PowerPC,POWER9@8, we see that its cache nodes are left behind: $ ls -1d */ l2-cache@2010/ l2-cache@2011/ l3-cache@3110/ l3-cache@3111/ PowerPC,POWER9@0/ When DLPAR is combined with VM migration, we can get duplicate nodes. E.g. removing one processor, then migrating, adding a processor, and then migrating again can result in warnings from the OF core during post-migration device tree updates: Duplicate name in cpus, renamed to "l2-cache@2011#1" Duplicate name in cpus, renamed to "l3-cache@3111#1" and nodes with duplicated phandles in the tree, making lookup behavior unpredictable: $ lsprop l[23]-cache@*/ibm,phandle l2-cache@2010/ibm,phandle 00002010 (8208) l2-cache@2011#1/ibm,phandle 00002011 (8209) l2-cache@2011/ibm,phandle 00002011 (8209) l3-cache@3110/ibm,phandle 00003110 (12560) l3-cache@3111#1/ibm,phandle 00003111 (12561) l3-cache@3111/ibm,phandle 00003111 (12561) Address these issues by: * Correctly processing siblings of the node returned from dlpar_configure_connector(). * Removing cache nodes in the CPU remove path when it can be determined that they are not associated with other CPUs or caches. Use the of_changeset API in both cases, which allows us to keep the error handling in this code from becoming more complex while ensuring that the device tree cannot become inconsistent. Fixes: ac71380071d1 ("powerpc/pseries: Add CPU dlpar remove functionality") Fixes: 90edf184b9b7 ("powerpc/pseries: Add CPU dlpar add functionality") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210927201933.76786-2-nathanl@linux.ibm.com
2021-10-08powerpc/paravirt: correct preempt debug splat in vcpu_is_preempted()Nathan Lynch1-1/+17
vcpu_is_preempted() can be used outside of preempt-disabled critical sections, yielding warnings such as: BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/185 caller is rwsem_spin_on_owner+0x1cc/0x2d0 CPU: 1 PID: 185 Comm: systemd-udevd Not tainted 5.15.0-rc2+ #33 Call Trace: [c000000012907ac0] [c000000000aa30a8] dump_stack_lvl+0xac/0x108 (unreliable) [c000000012907b00] [c000000001371f70] check_preemption_disabled+0x150/0x160 [c000000012907b90] [c0000000001e0e8c] rwsem_spin_on_owner+0x1cc/0x2d0 [c000000012907be0] [c0000000001e1408] rwsem_down_write_slowpath+0x478/0x9a0 [c000000012907ca0] [c000000000576cf4] filename_create+0x94/0x1e0 [c000000012907d10] [c00000000057ac08] do_symlinkat+0x68/0x1a0 [c000000012907d70] [c00000000057ae18] sys_symlink+0x58/0x70 [c000000012907da0] [c00000000002e448] system_call_exception+0x198/0x3c0 [c000000012907e10] [c00000000000c54c] system_call_common+0xec/0x250 The result of vcpu_is_preempted() is always used speculatively, and the function does not access per-cpu resources in a (Linux) preempt-unsafe way. Use raw_smp_processor_id() to avoid such warnings, adding explanatory comments. Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210928214147.312412-3-nathanl@linux.ibm.com
2021-10-08powerpc/paravirt: vcpu_is_preempted() commentaryNathan Lynch1-4/+18
Add comments more clearly documenting that this function determines whether hypervisor-level preemption of the VM has occurred. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210928214147.312412-2-nathanl@linux.ibm.com
2021-10-08powerpc: fix unbalanced node refcount in check_kvm_guest()Nathan Lynch1-4/+3
When check_kvm_guest() succeeds in looking up a /hypervisor OF node, it returns without performing a matching put for the lookup, leaving the node's reference count elevated. Add the necessary call to of_node_put(), rearranging the code slightly to avoid repetition or goto. Fixes: 107c55005fbd ("powerpc/pseries: Add KVM guest doorbell restrictions") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210928124550.132020-1-nathanl@linux.ibm.com
2021-10-08powerpc/powernv/dump: Fix typo in commentVasant Hegde1-1/+1
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210914143802.54325-1-hegdevasant@linux.vnet.ibm.com
2021-10-08powerpc/asm: Remove UPD_CONSTR after GCC 4.9 removalNick Desaulniers5-13/+11
UPD_CONSTR was previously a preprocessor define for an old GCC 4.9 inline asm bug with m<> constraints. Fixes: 6563139d90ad ("powerpc: remove GCC version check for UPD_CONSTR") Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210914161712.2463458-1-ndesaulniers@google.com
2021-10-08powerpc/mem: Fix arch/powerpc/mm/mem.c:53:12: error: no previous prototype ↵Christophe Leroy1-1/+1
for 'create_section_mapping' Commit 8e11d62e2e87 ("powerpc/mem: Add back missing header to fix 'no previous prototype' error") was supposed to fix the problem, but in the meantime commit a927bd6ba952 ("mm: fix phys_to_target_node() and* memory_add_physaddr_to_nid() exports") moved create_section_mapping() prototype from asm/sparsemem.h to asm/mmzone.h Fixes: 8e11d62e2e87 ("powerpc/mem: Add back missing header to fix 'no previous prototype' error") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/025754fde3d027904ae9d0191f395890bec93369.1631541649.git.christophe.leroy@csgroup.eu
2021-10-08powerpc: Drop superfluous pci_dev_is_added() callsNiklas Schnelle2-8/+1
On powerpc, pci_dev_is_added() is called as part of SR-IOV fixups that are done under pcibios_add_device() which in turn is only called in pci_device_add() whih is called when a PCI device is scanned. pci_dev_assign_added() is called in pci_bus_add_device() which is only called after scanning the device. Thus pci_dev_is_added() is always false and can be dropped. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Tweak change log slightly to reflect Oliver's comments] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210910141940.2598035-2-schnelle@linux.ibm.com
2021-10-08powerpc/powermac: Remove stale declaration of pmac_mdChristophe Leroy1-2/+0
pmac_md doesn't exist anymore, remove stall declaration. Fixes: e8222502ee61 ("[PATCH] powerpc: Kill _machine and hard-coded platform numbers") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b2e52934e5a500f149e6d94db3cfa0569bc35081.1630657402.git.christophe.leroy@csgroup.eu
2021-10-08powerpc: Remove unused prototype for of_show_percpuinfoDaniel Axtens1-1/+0
commit 6d7f58b04d82 ("[PATCH] powerpc: Some minor cleanups to setup_32.c") removed of_show_percpuinfo but didn't remove the prototype. Remove it. Fixes: 6d7f58b04d82 ("[PATCH] powerpc: Some minor cleanups to setup_32.c") Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210903063246.70691-1-dja@axtens.net
2021-10-08powerpc/476: Fix sparse reportChristophe Leroy1-2/+2
Fix sparse errors: arch/powerpc/platforms/44x/ppc476.c:236:17: warning: cast removes address space '__iomem' of expression arch/powerpc/platforms/44x/ppc476.c:241:34: warning: incorrect type in argument 1 (different address spaces) arch/powerpc/platforms/44x/ppc476.c:241:34: expected void const volatile [noderef] __iomem *addr arch/powerpc/platforms/44x/ppc476.c:241:34: got unsigned char [usertype] * arch/powerpc/platforms/44x/ppc476.c:243:17: warning: incorrect type in argument 1 (different address spaces) arch/powerpc/platforms/44x/ppc476.c:243:17: expected void volatile [noderef] __iomem *addr arch/powerpc/platforms/44x/ppc476.c:243:17: got unsigned char [usertype] *[assigned] fpga Mark 'fpga' pointer as __iomem. Fixes: ab9a4183fddf ("powerpc: Update currituck pci/usb fixup for new board revision") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aa6055769b92a5d8685b8d0adab99c48a0b0ef4b.1631956926.git.christophe.leroy@csgroup.eu
2021-10-08powerpc/85xx: fix timebase sync issue when CONFIG_HOTPLUG_CPU=nXiaoming Ni3-7/+13
When CONFIG_SMP=y, timebase synchronization is required when the second kernel is started. arch/powerpc/kernel/smp.c: int __cpu_up(unsigned int cpu, struct task_struct *tidle) { ... if (smp_ops->give_timebase) smp_ops->give_timebase(); ... } void start_secondary(void *unused) { ... if (smp_ops->take_timebase) smp_ops->take_timebase(); ... } When CONFIG_HOTPLUG_CPU=n and CONFIG_KEXEC_CORE=n, smp_85xx_ops.give_timebase is NULL, smp_85xx_ops.take_timebase is NULL, As a result, the timebase is not synchronized. Timebase synchronization does not depend on CONFIG_HOTPLUG_CPU. Fixes: 56f1ba280719 ("powerpc/mpc85xx: refactor the PM operations") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210929033646.39630-3-nixiaoming@huawei.com
2021-10-08powerpc/85xx: Fix oops when mpc85xx_smp_guts_ids node cannot be foundXiaoming Ni1-2/+1
When the field described in mpc85xx_smp_guts_ids[] is not configured in dtb, the mpc85xx_setup_pmc() does not assign a value to the "guts" variable. As a result, the oops is triggered when mpc85xx_freeze_time_base() is executed. Fixes: 56f1ba280719 ("powerpc/mpc85xx: refactor the PM operations") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210929033646.39630-2-nixiaoming@huawei.com
2021-10-08powerps/pseries/dma: Add support for 2M IOMMU page sizeAlexey Kardashevskiy1-5/+5
The upcoming PAPR spec adds a 2M page size, bit 23 right after 16G page size in the "ibm,query-pe-dma-window" call. This adds support for the new page size. Since the new page size is out of sorted order, this changes the loop to not assume that shift[] is sorted. This has now been tested and is known to work on a pre-release version of phyp. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211006044735.1114669-1-aik@ozlabs.ru
2021-10-07pseries/eeh: Fix the kdump kernel crash during eeh_pseries_initMahesh Salgaonkar1-0/+4
On pseries LPAR when an empty slot is assigned to partition OR in single LPAR mode, kdump kernel crashes during issuing PHB reset. In the kdump scenario, we traverse all PHBs and issue reset using the pe_config_addr of the first child device present under each PHB. However the code assumes that none of the PHB slots can be empty and uses list_first_entry() to get the first child device under the PHB. Since list_first_entry() expects the list to be non-empty, it returns an invalid pci_dn entry and ends up accessing NULL phb pointer under pci_dn->phb causing kdump kernel crash. This patch fixes the below kdump kernel crash by skipping empty slots: audit: initializing netlink subsys (disabled) thermal_sys: Registered thermal governor 'fair_share' thermal_sys: Registered thermal governor 'step_wise' cpuidle: using governor menu pstore: Registered nvram as persistent store backend Issue PHB reset ... audit: type=2000 audit(1631267818.000:1): state=initialized audit_enabled=0 res=1 BUG: Kernel NULL pointer dereference on read at 0x00000268 Faulting instruction address: 0xc000000008101fb0 Oops: Kernel access of bad area, sig: 7 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 7 PID: 1 Comm: swapper/7 Not tainted 5.14.0 #1 NIP: c000000008101fb0 LR: c000000009284ccc CTR: c000000008029d70 REGS: c00000001161b840 TRAP: 0300 Not tainted (5.14.0) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000224 XER: 20040002 CFAR: c000000008101f0c DAR: 0000000000000268 DSISR: 00080000 IRQMASK: 0 ... NIP pseries_eeh_get_pe_config_addr+0x100/0x1b0 LR __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350 Call Trace: 0xc00000001161bb80 (unreliable) __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350 do_one_initcall+0x60/0x2d0 kernel_init_freeable+0x350/0x3f8 kernel_init+0x3c/0x17c ret_from_kernel_thread+0x5c/0x64 Fixes: 5a090f7c363fd ("powerpc/pseries: PCIE PHB reset") Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> [mpe: Tweak wording and trim oops] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/163215558252.413351.8600189949820258982.stgit@jupiter
2021-10-07powerpc/32s: Fix kuap_kernel_restore()Christophe Leroy1-0/+8
At interrupt exit, kuap_kernel_restore() calls kuap_unlock() with the value contained in regs->kuap. However, when regs->kuap contains 0xffffffff it means that KUAP was not unlocked so calling kuap_unlock() is unrelevant and results in jeopardising the contents of kernel space segment registers. So check that regs->kuap doesn't contain KUAP_NONE before calling kuap_unlock(). In the meantime it also means that if KUAP has not been correcly locked back at interrupt exit, it must be locked before continuing. This is done by checking the content of current->thread.kuap which was returned by kuap_get_and_assert_locked() Fixes: 16132529cee5 ("powerpc/32s: Rework Kernel Userspace Access Protection") Reported-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0d0c4d0f050a637052287c09ba521bad960a2790.1631715131.git.christophe.leroy@csgroup.eu
2021-10-07powerpc/pseries/msi: Add an empty irq_write_msi_msg() handlerCédric Le Goater1-0/+15
The IPR drivers tests for MSI support at probe time with MSI vector 0 and when done, frees the IRQ with free_irq(). This test was introduced by 95fecd90397e ("ipr: add test for MSI interrupt support") as an improvement of commit 5a9ef25b14d3 ("[SCSI] ipr: add MSI support") because a boot failure was reported on a Bimini PowerPC system: https://lore.kernel.org/r/1242926159.3007.5.camel@localhost.localdomain It was finally decided to remove MSI support on Bimini systems in 6eb0ac03899a ("powerpc/maple: Add a quirk to disable MSI for IPR on Bimini"). Linux 5.15-rc1 added MSI domain support to the pseries machine and when free_irq is called() in the driver, msi_domain_deactivate() also is. This resets the MSI table entry of the associate vector by calling __pci_write_msi_msg() with an empty message and breaks any further activation of the same vector. In the case of the IPR driver, it breaks the initialization sequence of the IOA. Introduce an empty irq_write_msi_msg() handler in the MSI domain of the pseries machine to avoid clearing the MSI vector entry. Updating the entry is not strictly necessary since it is initialized by the underlying hypervisor, PowerVM or QEMU/KVM. Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains") Signed-off-by: Cédric Le Goater <clg@kaod.org> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Tested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> [mpe: Tweak comment wording and formatting slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210930102535.1047230-1-clg@kaod.org
2021-10-07powerpc/64s: Fix unrecoverable MCE calling async handler from NMINicholas Piggin3-18/+26
The machine check handler is not considered NMI on 64s. The early handler is the true NMI handler, and then it schedules the machine_check_exception handler to run when interrupts are enabled. This works fine except the case of an unrecoverable MCE, where the true NMI is taken when MSR[RI] is clear, it can not recover, so it calls machine_check_exception directly so something might be done about it. Calling an async handler from NMI context can result in irq state and other things getting corrupted. This can also trigger the BUG at arch/powerpc/include/asm/interrupt.h:168 BUG_ON(!arch_irq_disabled_regs(regs) && !(regs->msr & MSR_EE)); Fix this by making an _async version of the handler which is called in the normal case, and a NMI version that is called for unrecoverable interrupts. Fixes: 2b43dd7653cc ("powerpc/64: enable MSR[EE] in irq replay pt_regs") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211004145642.1331214-6-npiggin@gmail.com
2021-10-07powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUGNicholas Piggin1-5/+8
If a NMI hits early in an interrupt handler before the irq soft-mask state is reconciled, that can cause a false-positive BUG with a CONFIG_PPC_IRQ_SOFT_MASK_DEBUG assertion. Remove that assertion and instead check the case that if regs->msr has EE clear, then regs->softe should be marked as disabled so the irq state looks correct to NMI handlers, the same as how it's fixed up in the case it was implicit soft-masked. This doesn't fix a known problem -- the change that was fixed by commit 4ec5feec1ad02 ("powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled") was the addition of a warning in the soft-nmi watchdog interrupt which can never actually fire when MSR[EE]=0. However it may be important if NMI handlers grow more code, and it's less surprising to anything using 'regs' - (I tripped over this when working in the area). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211004145642.1331214-5-npiggin@gmail.com
2021-10-07powerpc/64: warn if local irqs are enabled in NMI or hardirq contextNicholas Piggin1-0/+6
This can help catch bugs such as the one fixed by the previous change to prevent _exception() from enabling irqs. ppc32 could have a similar warning but it has no good config option to debug this stuff (the test may be overkill to add for production kernels). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211004145642.1331214-4-npiggin@gmail.com
2021-10-07powerpc/traps: do not enable irqs in _exceptionNicholas Piggin1-3/+9
_exception can be called by machine check handlers when the MCE hits user code (e.g., pseries and powernv). This will enable local irqs because, which is a dicey thing to do in NMI or hard irq context. This seemed to worked out okay because a userspace MCE can basically be treated like a synchronous interrupt (after async / imprecise MCEs are filtered out). Since NMI and hard irq handlers have started growing nmi_enter / irq_enter, and more irq state sanity checks, this has started to cause problems (or at least trigger warnings). The Fixes tag to the commit which introduced this rather than try to work out exactly which commit was the first that could possibly cause a problem because that may be difficult to prove. Fixes: 9f2f79e3a3c1 ("powerpc: Disable interrupts in 64-bit kernel FP and vector faults") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211004145642.1331214-3-npiggin@gmail.com
2021-10-07powerpc/64s: fix program check interrupt emergency stack pathNicholas Piggin1-7/+10
Emergency stack path was jumping into a 3: label inside the __GEN_COMMON_BODY macro for the normal path after it had finished, rather than jumping over it. By a small miracle this is the correct place to build up a new interrupt frame with the existing stack pointer, so things basically worked okay with an added weird looking 700 trap frame on top (which had the wrong ->nip so it didn't decode bug messages either). Fix this by avoiding using numeric labels when jumping over non-trivial macros. Before: LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 88 Comm: sh Not tainted 5.15.0-rc2-00034-ge057cdade6e5 #2637 NIP: 7265677368657265 LR: c00000000006c0c8 CTR: c0000000000097f0 REGS: c0000000fffb3a50 TRAP: 0700 Not tainted MSR: 9000000000021031 <SF,HV,ME,IR,DR,LE> CR: 00000700 XER: 20040000 CFAR: c0000000000098b0 IRQMASK: 0 GPR00: c00000000006c964 c0000000fffb3cf0 c000000001513800 0000000000000000 GPR04: 0000000048ab0778 0000000042000000 0000000000000000 0000000000001299 GPR08: 000001e447c718ec 0000000022424282 0000000000002710 c00000000006bee8 GPR12: 9000000000009033 c0000000016b0000 00000000000000b0 0000000000000001 GPR16: 0000000000000000 0000000000000002 0000000000000000 0000000000000ff8 GPR20: 0000000000001fff 0000000000000007 0000000000000080 00007fff89d90158 GPR24: 0000000002000000 0000000002000000 0000000000000255 0000000000000300 GPR28: c000000001270000 0000000042000000 0000000048ab0778 c000000080647e80 NIP [7265677368657265] 0x7265677368657265 LR [c00000000006c0c8] ___do_page_fault+0x3f8/0xb10 Call Trace: [c0000000fffb3cf0] [c00000000000bdac] soft_nmi_common+0x13c/0x1d0 (unreliable) --- interrupt: 700 at decrementer_common_virt+0xb8/0x230 NIP: c0000000000098b8 LR: c00000000006c0c8 CTR: c0000000000097f0 REGS: c0000000fffb3d60 TRAP: 0700 Not tainted MSR: 9000000000021031 <SF,HV,ME,IR,DR,LE> CR: 22424282 XER: 20040000 CFAR: c0000000000098b0 IRQMASK: 0 GPR00: c00000000006c964 0000000000002400 c000000001513800 0000000000000000 GPR04: 0000000048ab0778 0000000042000000 0000000000000000 0000000000001299 GPR08: 000001e447c718ec 0000000022424282 0000000000002710 c00000000006bee8 GPR12: 9000000000009033 c0000000016b0000 00000000000000b0 0000000000000001 GPR16: 0000000000000000 0000000000000002 0000000000000000 0000000000000ff8 GPR20: 0000000000001fff 0000000000000007 0000000000000080 00007fff89d90158 GPR24: 0000000002000000 0000000002000000 0000000000000255 0000000000000300 GPR28: c000000001270000 0000000042000000 0000000048ab0778 c000000080647e80 NIP [c0000000000098b8] decrementer_common_virt+0xb8/0x230 LR [c00000000006c0c8] ___do_page_fault+0x3f8/0xb10 --- interrupt: 700 Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace 6d28218e0cc3c949 ]--- After: ------------[ cut here ]------------ kernel BUG at arch/powerpc/kernel/exceptions-64s.S:491! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 88 Comm: login Not tainted 5.15.0-rc2-00034-ge057cdade6e5-dirty #2638 NIP: c0000000000098b8 LR: c00000000006bf04 CTR: c0000000000097f0 REGS: c0000000fffb3d60 TRAP: 0700 Not tainted MSR: 9000000000021031 <SF,HV,ME,IR,DR,LE> CR: 24482227 XER: 00040000 CFAR: c0000000000098b0 IRQMASK: 0 GPR00: c00000000006bf04 0000000000002400 c000000001513800 c000000001271868 GPR04: 00000000100f0d29 0000000042000000 0000000000000007 0000000000000009 GPR08: 00000000100f0d29 0000000024482227 0000000000002710 c000000000181b3c GPR12: 9000000000009033 c0000000016b0000 00000000100f0d29 c000000005b22f00 GPR16: 00000000ffff0000 0000000000000001 0000000000000009 00000000100eed90 GPR20: 00000000100eed90 0000000010000000 000000001000a49c 00000000100f1430 GPR24: c000000001271868 0000000002000000 0000000000000215 0000000000000300 GPR28: c000000001271800 0000000042000000 00000000100f0d29 c000000080647860 NIP [c0000000000098b8] decrementer_common_virt+0xb8/0x230 LR [c00000000006bf04] ___do_page_fault+0x234/0xb10 Call Trace: Instruction dump: 4182000c 39400001 48000008 894d0932 714a0001 39400008 408225fc 718a4000 7c2a0b78 3821fcf0 41c20008 e82d0910 <0981fcf0> f92101a0 f9610170 f9810178 ---[ end trace a5dbd1f5ea4ccc51 ]--- Fixes: 0a882e28468f4 ("powerpc/64s/exception: remove bad stack branch") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211004145642.1331214-2-npiggin@gmail.com
2021-10-07powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000Naveen N. Rao1-1/+1
Special case handling of the smallest 32-bit negative number for BPF_SUB. Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7135360a0cdf70adedbccf9863128b8daef18764.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_ENDNaveen N. Rao1-1/+1
Suppress emitting zero extend instruction for 64-bit BPF_END_FROM_[L|B]E operation. Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b4e3c3546121315a8e2059b19a1bda84971816e4.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07powerpc/bpf ppc32: Fix JMP32_JSET_KNaveen N. Rao1-1/+1
'andi' only takes an unsigned 16-bit value. Correct the imm range used when emitting andi. Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b94489f52831305ec15aca4dd04a3527236be7e8.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operationNaveen N. Rao1-1/+1
Correct the destination register used for ALU32 BPF_ARSH operation. Fixes: 51c66ad849a703 ("powerpc/bpf: Implement extended BPF on PPC32") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6d24c1f9e79b6f61f5135eaf2ea1e8bcd4dac87b.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPECNaveen N. Rao2-8/+55
Emit similar instruction sequences to commit a048a07d7f4535 ("powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit") when encountering BPF_NOSPEC. Mitigations are enabled depending on what the firmware advertises. In particular, we do not gate these mitigations based on current settings, just like in x86. Due to this, we don't need to take any action if mitigations are enabled or disabled at runtime. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/956570cbc191cd41f8274bed48ee757a86dac62a.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07powerpc/security: Add a helper to query stf_barrier typeNaveen N. Rao2-0/+10
Add a helper to return the stf_barrier type for the current processor. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3bd5d7f96ea1547991ac2ce3137dc2b220bae285.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07powerpc/bpf: Fix BPF_SUB when imm == 0x80000000Naveen N. Rao1-10/+17
We aren't handling subtraction involving an immediate value of 0x80000000 properly. Fix the same. Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Fold in fix from Naveen to use imm <= 32768] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fc4b1276eb10761fd7ce0814c8dd089da2815251.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07powerpc/bpf: Fix BPF_MOD when imm == 1Naveen N. Rao1-2/+8
Only ignore the operation if dividing by 1. Fixes: 156d0e290e969c ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c674ca18c3046885602caebb326213731c675d06.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07powerpc/bpf: Validate branch rangesNaveen N. Rao4-11/+37
Add checks to ensure that we never emit branch instructions with truncated branch offsets. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/71d33a6b7603ec1013c9734dd8bdd4ff5e929142.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-07powerpc/lib: Add helper to check if offset is within conditional branch rangeNaveen N. Rao3-7/+8
Add a helper to check if a given offset is within the branch range for a powerpc conditional branch instruction, and update some sites to use the new helper. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/442b69a34ced32ca346a0d9a855f3f6cfdbbbd41.1633464148.git.naveen.n.rao@linux.vnet.ibm.com
2021-10-04treewide: Replace the use of mem_encrypt_active() with cc_platform_has()Tom Lendacky2-7/+3
Replace uses of mem_encrypt_active() with calls to cc_platform_has() with the CC_ATTR_MEM_ENCRYPT attribute. Remove the implementation of mem_encrypt_active() across all arches. For s390, since the default implementation of the cc_platform_has() matches the s390 implementation of mem_encrypt_active(), cc_platform_has() does not need to be implemented in s390 (the config option ARCH_HAS_CC_PLATFORM is not set). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210928191009.32551-9-bp@alien8.de