summaryrefslogtreecommitdiff
path: root/arch/powerpc/include
AgeCommit message (Collapse)AuthorFilesLines
2021-02-10KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2Fabiano Rosas1-0/+1
These machines don't support running both MMU types at the same time, so remove the KVM_CAP_PPC_MMU_HASH_V3 capability when the host is using Radix MMU. [paulus@ozlabs.org - added defensive check on kvmppc_hv_ops->hash_v3_possible] Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host ↵Nicholas Piggin1-11/+0
without mixed mode support This reverts much of commit c01015091a770 ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts"), which was required to run HPT guests on RPT hosts on early POWER9 CPUs without support for "mixed mode", which meant the host could not run with MMU on while guests were running. This code has some corner case bugs, e.g., when the guest hits a machine check or HMI the primary locks up waiting for secondaries to switch LPCR to host, which they never do. This could all be fixed in software, but most CPUs in production have mixed mode support, and those that don't are believed to be all in installations that don't use this capability. So simplify things and remove support. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWRRavi Bangoria1-0/+1
Introduce KVM_CAP_PPC_DAWR1 which can be used by QEMU to query whether KVM supports 2nd DAWR or not. The capability is by default disabled even when the underlying CPU supports 2nd DAWR. QEMU needs to check and enable it manually to use the feature. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWRRavi Bangoria3-1/+12
KVM code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Also, KVM will support 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10KVM: PPC: Book3S HV: Rename current DAWR macros and variablesRavi Bangoria1-2/+2
Power10 is introducing a second DAWR (Data Address Watchpoint Register). Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's uapi so changing it will break userspace. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-10KVM: PPC: Book3S HV: Allow nested guest creation when L0 hv_guest_state > L1Ravi Bangoria1-2/+15
On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED hcall to load L2 guest state in cpu. L1 hypervisor prepares the L2 state in struct hv_guest_state and passes a pointer to it via hcall. Using that pointer, L0 reads/writes that state directly from/to L1 memory. Thus L0 must be aware of hv_guest_state layout of L1. Currently it uses version field to achieve this. i.e. If L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't allow nested kvm guest. This restriction can be loosened up a bit. L0 can be taught to understand older layout of hv_guest_state, if we restrict the new members to be added only at the end, i.e. we can allow nested guest even when L0 hv_guest_state.version > L1 hv_guest_state.version. Though, the other way around is not possible. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2021-02-09KVM: Raise the maximum number of user memslotsVitaly Kuznetsov1-1/+0
Current KVM_USER_MEM_SLOTS limits are arch specific (512 on Power, 509 on x86, 32 on s390, 16 on MIPS) but they don't really need to be. Memory slots are allocated dynamically in KVM when added so the only real limitation is 'id_to_index' array which is 'short'. We don't have any other KVM_MEM_SLOTS_NUM/KVM_USER_MEM_SLOTS-sized statically defined structures. Low KVM_USER_MEM_SLOTS can be a limiting factor for some configurations. In particular, when QEMU tries to start a Windows guest with Hyper-V SynIC enabled and e.g. 256 vCPUs the limit is hit as SynIC requires two pages per vCPU and the guest is free to pick any GFN for each of them, this fragments memslots as QEMU wants to have a separate memslot for each of these pages (which are supposed to act as 'overlay' pages). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210127175731.2020089-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-08powerpc/32s: Allow constant folding in mtsr()/mfsr()Christophe Leroy1-2/+8
On the same way as we did in wrtee(), add an alternative using mtsr/mfsr instructions instead of mtsrin/mfsrin when the segment register can be determined at compile time. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9baed0ff9d76723ec90f1b567ddd4ac1ecc7a190.1612612022.git.christophe.leroy@csgroup.eu
2021-02-08powerpc/32s: mfsrin()/mtsrin() become mfsr()/mtsr()Christophe Leroy2-6/+6
Function names should tell what the function does, not how. mfsrin() and mtsrin() are read/writing segment registers. They are called that way because they are using mfsrin and mtsrin instructions, but it doesn't matter for the caller. In preparation of following patch, change their name to mfsr() and mtsr() in order to make it obvious they manipulate segment registers without messing up with how they do it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f92d99f4349391b77766745900231aa880a0efb5.1612612022.git.christophe.leroy@csgroup.eu
2021-02-08powerpc/32s: Change mfsrin() into a static inline functionChristophe Leroy1-3/+8
mfsrin() is a macro. Change in into an inline function to avoid conflicts in KVM and make it more evolutive. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/72c7b9879e2e2e6f5c27dadda6486386c2b50f23.1612612022.git.christophe.leroy@csgroup.eu
2021-02-08powerpc/uaccess: Perform barrier_nospec() in KUAP allowance helpersChristophe Leroy2-11/+3
barrier_nospec() in uaccess helpers is there to protect against speculative accesses around access_ok(). When using user_access_begin() sequences together with unsafe_get_user() like macros, barrier_nospec() is called for every single read although we know the access_ok() is done onece. Since all user accesses must be granted by a call to either allow_read_from_user() or allow_read_write_user() which will always happen after the access_ok() check, move the barrier_nospec() there. Reported-by: Christopher M. Riedl <cmr@codefail.de> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c72f014730823b413528e90ab6c4d3bcb79f8497.1612692067.git.christophe.leroy@csgroup.eu
2021-02-08powerpc/64s: Implement ptep_clear_flush_young that does not flush TLBsNicholas Piggin1-3/+20
Similarly to the x86 commit b13b1d2d8692 ("x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB"), implement ptep_clear_flush_young that does not actually flush the TLB in the case the referenced bit is cleared. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-8-npiggin@gmail.com
2021-02-08powerpc/perf: Expose Performance Monitor Counter SPR's as part of extended regsAthira Rajeev2-6/+24
Currently Monitor Mode Control Registers and Sampling registers are part of extended regs. Patch adds support to include Performance Monitor Counter Registers (PMC1 to PMC6 ) as part of extended registers. PMCs are saved in the perf interrupt handler as part of per-cpu array 'pmcs' in struct cpu_hw_events. While capturing the register values for extended regs, fetch these saved PMC values. Simplified the PERF_REG_PMU_MASK_300/31 definition to include PMU SPRs MMCR0 to PMC6. Exclude the unsupported SPRs (MMCR3, SIER2, SIER3) from extended mask value for CPU_FTR_ARCH_300 in the new definition. PERF_REG_EXTENDED_MAX is used to check if any index beyond the extended registers is requested in the sample. Have one PERF_REG_EXTENDED_MAX for CPU_FTR_ARCH_300/CPU_FTR_ARCH_31 since perf_reg_validate function already checks the extended mask for the presence of any unsupported register. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612335337-1888-3-git-send-email-atrajeev@linux.vnet.ibm.com
2021-02-08powerpc/pkeys: Remove unused codeSandipan Das2-9/+0
This removes arch_supports_pkeys(), arch_usable_pkeys() and thread_pkey_regs_*() which are remnants from the following: commit 06bb53b33804 ("powerpc: store and restore the pkey state across context switches") commit 2cd4bd192ee9 ("powerpc/pkeys: Fix handling of pkey state across fork()") commit cf43d3b26452 ("powerpc: Enable pkey subsystem") arch_supports_pkeys() and arch_usable_pkeys() were unused since their introduction while thread_pkey_regs_*() became unused after the introduction of the following: commit d5fa30e6993f ("powerpc/book3s64/pkeys: Reset userspace AMR correctly on exec") commit 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode") Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210202150050.75335-1-sandipan@linux.ibm.com
2021-02-08powerpc: remove unneeded semicolonsChengyang Fan12-20/+20
Remove superfluous semicolons after function definitions. Signed-off-by: Chengyang Fan <cy.fan@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210125095338.1719405-1-cy.fan@huawei.com
2021-02-08powerpc/64s: runlatch interrupt handling in CNicholas Piggin1-0/+7
There is no need for this to be in asm, use the new intrrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-42-npiggin@gmail.com
2021-02-08powerpc/64s: move NMI soft-mask handling to CNicholas Piggin1-0/+25
Saving and restoring soft-mask state can now be done in C using the interrupt handler wrapper functions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-41-npiggin@gmail.com
2021-02-08powerpc: move NMI entry/exit code into wrapperNicholas Piggin1-0/+28
This moves the common NMI entry and exit code into the interrupt handler wrappers. This changes the behaviour of soft-NMI (watchdog) and HMI interrupts, and also MCE interrupts on 64e, by adding missing parts of the NMI entry to them. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-40-npiggin@gmail.com
2021-02-08powerpc/64: entry cpu time accounting in CNicholas Piggin2-24/+6
There is no need for this to be in asm, use the new interrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-39-npiggin@gmail.com
2021-02-08powerpc/64: move account_stolen_time into its own functionNicholas Piggin1-0/+14
This will be used by interrupt entry as well. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-38-npiggin@gmail.com
2021-02-08powerpc/64s: reconcile interrupts in CNicholas Piggin1-4/+11
There is no need for this to be in asm, use the new intrrupt entry wrapper. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-37-npiggin@gmail.com
2021-02-08powerpc/64s: move context tracking exit to interrupt exit pathNicholas Piggin1-3/+31
The interrupt handler wrapper functions are not the ideal place to maintain context tracking because after they return, the low level exit code must then determine if there are interrupts to replay, or if the task should be preempted, etc. Those paths (e.g., schedule_user) include their own exception_enter/exit pairs to fix this up but it's a bit hacky (see schedule_user() comments). Ideally context tracking will go to user mode only when there are no more interrupts or context switches or other exit processing work to handle. 64e can not do this because it does not use the C interrupt exit code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-36-npiggin@gmail.com
2021-02-08powerpc: handle irq_enter/irq_exit in interrupt handler wrappersNicholas Piggin1-0/+2
Move irq_enter/irq_exit into asynchronous interrupt handler wrappers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-35-npiggin@gmail.com
2021-02-08powerpc/64: add context tracking to asynchronous interruptsNicholas Piggin1-0/+2
Previously context tracking was not done for asynchronous interrupts, (those that run in interrupt context), and if those would cause a reschedule when they exit, then scheduling functions (schedule_user, preempt_schedule_irq) call exception_enter/exit to fix this up and exit user context. This is a hack we would like to get away from, so do context tracking for asynchronous interrupts too. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-34-npiggin@gmail.com
2021-02-08powerpc/64: context tracking move to interrupt wrappersNicholas Piggin1-0/+9
This moves exception_enter/exit calls to wrapper functions for synchronous interrupts. More interrupt handlers are covered by this than previously. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-33-npiggin@gmail.com
2021-02-08powerpc/64s/hash: improve context tracking of hash faultsNicholas Piggin1-0/+1
This moves the 64s/hash context tracking from hash_page_mm() to __do_hash_fault(), so it's no longer called by OCXL / SPU accelerators, which was certainly the wrong thing to be doing, because those callers are not low level interrupt handlers, so should have entered a kernel context tracking already. Then remain in kernel context for the duration of the fault, rather than enter/exit for the hash fault then enter/exit for the page fault, which is pointless. Even still, calling exception_enter/exit in __do_hash_fault seems questionable because that's touching per-cpu variables, tracing, etc., which might have been interrupted by this hash fault or themselves cause hash faults. But maybe I miss something because hash_page_mm very deliberately calls trace_hash_fault too, for example. So for now go with it, it's no worse than before, in this regard. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-32-npiggin@gmail.com
2021-02-08powerpc/64: context tracking remove _TIF_NOHZNicholas Piggin1-3/+1
Add context tracking to the system call handler explicitly, and remove _TIF_NOHZ. This improves system call performance when nohz_full is enabled. On a POWER9, gettid scv system call cost on a nohz_full CPU improves from 1129 cycles to 1004 cycles and on a housekeeping CPU from 550 cycles to 430 cycles. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-31-npiggin@gmail.com
2021-02-08powerpc: add interrupt_cond_local_irq_enable helperNicholas Piggin1-0/+7
Simple helper for synchronous interrupt handlers (i.e., process-context) to enable interrupts if it was taken in an interrupts-enabled context. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-30-npiggin@gmail.com
2021-02-08powerpc: convert interrupt handlers to use wrappersNicholas Piggin6-43/+67
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com
2021-02-08powerpc: add interrupt wrapper entry / exit stub functionsNicholas Piggin1-0/+66
These will be used by subsequent patches. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-28-npiggin@gmail.com
2021-02-08powerpc: interrupt handler wrapper functionsNicholas Piggin1-0/+169
Add wrapper functions (derived from x86 macros) for interrupt handler functions. This allows interrupt entry code to be written in C. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-27-npiggin@gmail.com
2021-02-08powerpc: introduce die_mceNicholas Piggin1-0/+1
As explained by commit daf00ae71dad ("powerpc/traps: restore recoverability of machine_check interrupts"), die() can't be called from within nmi_enter to nicely kill a process context that was interrupted. nmi_exit must be called first. This adds a function die_mce which takes care of this for machine check handlers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-24-npiggin@gmail.com
2021-02-08powerpc: add and use unknown_async_exceptionNicholas Piggin1-0/+1
This is currently the same as unknown_exception, but it will diverge after interrupt wrappers are added and code moved out of asm into the wrappers (e.g., async handlers will check FINISH_NAP). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-22-npiggin@gmail.com
2021-02-08powerpc/time: move timer_broadcast_interrupt prototype to asm/time.hNicholas Piggin2-1/+2
Interrupt handler prototypes are going to be rearranged in a future patch, so tidy this out of the way first. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-21-npiggin@gmail.com
2021-02-08powerpc/64s: add do_bad_page_fault_segv handlerNicholas Piggin1-0/+1
This function acts like an interrupt handler so it needs to follow the standard interrupt handler function signature which will be introduced in a future change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-13-npiggin@gmail.com
2021-02-08powerpc: bad_page_fault get registers from regsNicholas Piggin1-2/+2
Similar to the previous patch this makes interrupt handler function types more regular so they can be wrapped with the next patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-12-npiggin@gmail.com
2021-02-08powerpc: do_break get registers from regsNicholas Piggin1-2/+1
Similar to the previous patch this makes interrupt handler function types more regular so they can be wrapped with the next patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-9-npiggin@gmail.com
2021-02-08powerpc: remove arguments from fault handler functionsNicholas Piggin3-4/+4
Make mm fault handlers all just take the pt_regs * argument and load DAR/DSISR from that. Make those that return a value return long. This is done to make the function signatures match other handlers, which will help with a future patch to add wrappers. Explicit arguments could be added for performance but that would require more wrapper macro variants. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-7-npiggin@gmail.com
2021-02-08powerpc/64s: move the hash fault handling logic to CNicholas Piggin1-0/+1
The fault handling still has some complex logic particularly around hash table handling, in asm. Implement most of this in C. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-6-npiggin@gmail.com
2021-02-06powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mmAneesh Kumar K.V2-9/+11
This fix the bad fault reported by KUAP when io_wqe_worker access userspace. Bug: Read fault blocked by KUAP! WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 .......... Call Trace: [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 .......... NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 interrupt: 300 [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c The kernel consider thread AMR value for kernel thread to be AMR_KUAP_BLOCKED. Hence access to userspace is denied. This of course not correct and we should allow userspace access after kthread_use_mm(). To be precise, kthread_use_mm() should inherit the AMR value of the operating address space. But, the AMR value is thread-specific and we inherit the address space and not thread access restrictions. Because of this ignore AMR value when accessing userspace via kernel thread. current_thread_amr/iamr() are updated, because we use them in the below stack. .... [ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3 .... NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90 LR [c0000000004b9278] gup_pte_range+0x188/0x420 --- interrupt: 700 [c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable) [c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20 [c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410 [c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0 [c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700 [c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0 [c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740 [c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0 [c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80 [c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs] [c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs] [c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0 [c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50 [c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0 [c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0 [c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0 [c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0 Fixes: 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.") Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210206025634.521979-1-aneesh.kumar@linux.ibm.com
2021-02-03powerpc/pci: Add ppc_md.discover_phbs()Oliver O'Halloran1-0/+3
On many powerpc platforms the discovery and initalisation of pci_controllers (PHBs) happens inside of setup_arch(). This is very early in boot (pre-initcalls) and means that we're initialising the PHB long before many basic kernel services (slab allocator, debugfs, a real ioremap) are available. On PowerNV this causes an additional problem since we map the PHB registers with ioremap(). As of commit d538aadc2718 ("powerpc/ioremap: warn on early use of ioremap()") a warning is printed because we're using the "incorrect" API to setup and MMIO mapping in searly boot. The kernel does provide early_ioremap(), but that is not intended to create long-lived MMIO mappings and a seperate warning is printed by generic code if early_ioremap() mappings are "leaked." This is all fixable with dumb hacks like using early_ioremap() to setup the initial mapping then replacing it with a real ioremap later on in boot, but it does raise the question: Why the hell are we setting up the PHB's this early in boot? The old and wise claim it's due to "hysterical rasins." Aside from amused grapes there doesn't appear to be any real reason to maintain the current behaviour. Already most of the newer embedded platforms perform PHB discovery in an arch_initcall and between the end of setup_arch() and the start of initcalls none of the generic kernel code does anything PCI related. On powerpc scanning PHBs occurs in a subsys_initcall so it should be possible to move the PHB discovery to a core, postcore or arch initcall. This patch adds the ppc_md.discover_phbs hook and a core_initcall stub that calls it. The core_initcalls are the earliest to be called so this will any possibly issues with dependency between initcalls. This isn't just an academic issue either since on pseries and PowerNV EEH init occurs in an arch_initcall and depends on the pci_controllers being available, similarly the creation of pci_dns occurs at core_initcall_sync (i.e. between core and postcore initcalls). These problems need to be addressed seperately. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Make discover_phbs() static] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201103043523.916109-1-oohall@gmail.com
2021-01-31powerpc: Fix build error in paravirt.hMichal Suchanek1-0/+1
./arch/powerpc/include/asm/paravirt.h:83:44: error: implicit declaration of function 'smp_processor_id'; did you mean 'raw_smp_processor_id'? smp_processor_id is defined in linux/smp.h but it is not included. The build error happens only when the patch is applied to 5.3 kernel but it only works by chance in mainline. Fixes: ca3f969dcb11 ("powerpc/paravirt: Use is_kvm_guest() in vcpu_is_preempted()") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210120132838.15589-1-msuchanek@suse.de
2021-01-31powerpc/mce: Remove per cpu variables from MCE handlersGanesh Goudar2-0/+22
Access to per-cpu variables requires translation to be enabled on pseries machine running in hash mmu mode, Since part of MCE handler runs in realmode and part of MCE handling code is shared between ppc architectures pseries and powernv, it becomes difficult to manage these variables differently on different architectures, So have these variables in paca instead of having them as per-cpu variables to avoid complications. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210128104143.70668-2-ganeshgr@linux.ibm.com
2021-01-31powerpc/mce: Reduce the size of event arraysGanesh Goudar1-1/+1
Maximum recursive depth of MCE is 4, Considering the maximum depth allowed reduce the size of event to 10 from 100. This saves us ~19kB of memory and has no fatal consequences. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210128104143.70668-1-ganeshgr@linux.ibm.com
2021-01-31powerpc/pci: Delete traverse_pci_dn()Oliver O'Halloran1-3/+0
Nothing uses it. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200902035121.1762475-1-oohall@gmail.com
2021-01-30KVM: PPC: Book3S HV: Declare some prototypesCédric Le Goater1-0/+7
This fixes these W=1 compile errors: ../arch/powerpc/kvm/book3s_hv_builtin.c:425:6: error: no previous prototype for ‘kvmppc_read_intr’ [-Werror=missing-prototypes] 425 | long kvmppc_read_intr(void) | ^~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_hv_builtin.c:652:6: error: no previous prototype for ‘kvmppc_bad_interrupt’ [-Werror=missing-prototypes] 652 | void kvmppc_bad_interrupt(struct pt_regs *regs) | ^~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_hv_builtin.c:695:6: error: no previous prototype for ‘kvmhv_p9_set_lpcr’ [-Werror=missing-prototypes] 695 | void kvmhv_p9_set_lpcr(struct kvm_split_mode *sip) | ^~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_hv_builtin.c:740:6: error: no previous prototype for ‘kvmhv_p9_restore_lpcr’ [-Werror=missing-prototypes] 740 | void kvmhv_p9_restore_lpcr(struct kvm_split_mode *sip) | ^~~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_hv_builtin.c:768:6: error: no previous prototype for ‘kvmppc_set_msr_hv’ [-Werror=missing-prototypes] 768 | void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr) | ^~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_hv_builtin.c:817:6: error: no previous prototype for ‘kvmppc_inject_interrupt_hv’ [-Werror=missing-prototypes] 817 | void kvmppc_inject_interrupt_hv(struct kvm_vcpu *vcpu, int vec, u64 srr1_flags) Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-21-clg@kaod.org
2021-01-30powerpc/watchdog: Declare soft_nmi_interrupt() prototypeCédric Le Goater1-0/+1
soft_nmi_interrupt() usage requires PPC_WATCHDOG to be configured. Check the CONFIG definition to declare the prototype. It fixes this W=1 compile error : ../arch/powerpc/kernel/watchdog.c:250:6: error: no previous prototype for ‘soft_nmi_interrupt’ [-Werror=missing-prototypes] 250 | void soft_nmi_interrupt(struct pt_regs *regs) | ^~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-18-clg@kaod.org
2021-01-30powerpc/mm: Declare arch_report_meminfo() prototype.Cédric Le Goater1-0/+3
It fixes this W=1 compile error : ../arch/powerpc/mm/book3s64/pgtable.c:411:6: error: no previous prototype for ‘arch_report_meminfo’ [-Werror=missing-prototypes] 411 | void arch_report_meminfo(struct seq_file *m) | ^~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-17-clg@kaod.org
2021-01-30powerpc/mm: Declare preload_new_slb_context() prototypeCédric Le Goater1-0/+1
It fixes this W=1 compile error : ../arch/powerpc/mm/book3s64/slb.c:380:6: error: no previous prototype for ‘preload_new_slb_context’ [-Werror=missing-prototypes] 380 | void preload_new_slb_context(unsigned long start, unsigned long sp) | ^~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-15-clg@kaod.org
2021-01-30powerpc/mm: Move hpte_insert_repeating() prototypeCédric Le Goater1-0/+2
It fixes this W=1 compile error : ../arch/powerpc/mm/book3s64/hash_utils.c:1867:6: error: no previous prototype for ‘hpte_insert_repeating’ [-Werror=missing-prototypes] 1867 | long hpte_insert_repeating(unsigned long hash, unsigned long vpn, | ^~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-14-clg@kaod.org