summaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)AuthorFilesLines
2022-05-19powerpc: Add CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2Christophe Leroy3-5/+13
At the time being, we use CONFIG_CPU_LITTLE_ENDIAN and CONFIG_CPU_BIG_ENDIAN to pass -mabi=elfv1 or elfv2 to compiler, then define a PPC64_ELF_ABI_v1 or PPC64_ELF_ABI_v2 macro in asm/types.h based on _CALL_ELF define set by the compiler. Make it more straight forward with a CONFIG option that is directly usable. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1eca1addbc550167da9841c7340a010d0c4b2200.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19powerpc/ftrace: Use patch_instruction() return directlyChristophe Leroy2-5/+2
Instead of returning -EPERM when patch_instruction() fails, just return what patch_instruction returns. That simplifies ftrace_modify_code(): 0: 94 21 ff c0 stwu r1,-64(r1) 4: 93 e1 00 3c stw r31,60(r1) 8: 7c 7f 1b 79 mr. r31,r3 c: 40 80 00 30 bge 3c <ftrace_modify_code+0x3c> 10: 93 c1 00 38 stw r30,56(r1) 14: 7c 9e 23 78 mr r30,r4 18: 7c a4 2b 78 mr r4,r5 1c: 80 bf 00 00 lwz r5,0(r31) 20: 7c 1e 28 40 cmplw r30,r5 24: 40 82 00 34 bne 58 <ftrace_modify_code+0x58> 28: 83 c1 00 38 lwz r30,56(r1) 2c: 7f e3 fb 78 mr r3,r31 30: 83 e1 00 3c lwz r31,60(r1) 34: 38 21 00 40 addi r1,r1,64 38: 48 00 00 00 b 38 <ftrace_modify_code+0x38> 38: R_PPC_REL24 patch_instruction Before: 0: 94 21 ff c0 stwu r1,-64(r1) 4: 93 e1 00 3c stw r31,60(r1) 8: 7c 7f 1b 79 mr. r31,r3 c: 40 80 00 4c bge 58 <ftrace_modify_code+0x58> 10: 93 c1 00 38 stw r30,56(r1) 14: 7c 9e 23 78 mr r30,r4 18: 7c a4 2b 78 mr r4,r5 1c: 80 bf 00 00 lwz r5,0(r31) 20: 7c 08 02 a6 mflr r0 24: 90 01 00 44 stw r0,68(r1) 28: 7c 1e 28 40 cmplw r30,r5 2c: 40 82 00 48 bne 74 <ftrace_modify_code+0x74> 30: 7f e3 fb 78 mr r3,r31 34: 48 00 00 01 bl 34 <ftrace_modify_code+0x34> 34: R_PPC_REL24 patch_instruction 38: 80 01 00 44 lwz r0,68(r1) 3c: 20 63 00 00 subfic r3,r3,0 40: 83 c1 00 38 lwz r30,56(r1) 44: 7c 63 19 10 subfe r3,r3,r3 48: 7c 08 03 a6 mtlr r0 4c: 83 e1 00 3c lwz r31,60(r1) 50: 38 21 00 40 addi r1,r1,64 54: 4e 80 00 20 blr It improves ftrace activation/deactivation duration by about 3%. Modify patch_instruction() return on failure to -EPERM in order to match with ftrace expectations. Other users of patch_instruction() do not care about the exact error value returned. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/49a8597230713e2633e7d9d7b56140787c4a7e20.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19powerpc/ftrace: Inline ftrace_modify_code()Christophe Leroy1-1/+1
Inlining ftrace_modify_code(), it increases a bit the size of ftrace code but brings 5% improvment on ftrace activation. Usually in C files we let gcc decide what to do but here it really help to 'help' gcc to decide to inline, thought we don't want to force it with an __always_inline that would be too much for CONFIG_CC_OPTIMIZE_FOR_SIZE. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1597a06d57cfc80e6853838c4066e799bf6c7977.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19powerpc/code-patching: Inline create_branch()Christophe Leroy2-22/+20
create_branch() is a good candidate for inlining because: - Flags can be folded in. - Range tests are likely to be already done. Hence reducing the create_branch() to only a set of instructions. So inline it. It improves ftrace activation by 10%. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/69851cc9a7bf8f03d025e6d29e165f2d0bd3bb6e.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19powerpc/ftrace: Use is_offset_in_branch_range()Christophe Leroy1-6/+2
Use is_offset_in_branch_range() instead of create_branch() to check if a target is within branch range. This patch together with the previous one improves ftrace activation time by 7% Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/912ae51782f5a53c44e435497c8c3fb5cc632387.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19powerpc/code-patching: Inline is_offset_in_{cond}_branch_range()Christophe Leroy2-29/+27
Test in is_offset_in_branch_range() and is_offset_in_cond_branch_range() are simple tests that are worth inlining. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a05be0ccb7373e6a9789a1988fcd0c810f5f9269.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19powerpc/ftrace: Remove redundant create_branch() callsChristophe Leroy1-20/+0
Since commit d5937db114e4 ("powerpc/code-patching: Fix patch_branch() return on out-of-range failure") patch_branch() fails with -ERANGE when trying to branch out of range. No need to perform the test twice. Remove redundant create_branch() calls. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aa45fbad0b4b7493080835d8276c0cb4ce146503.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19powerpc/ftrace: Refactor prepare_ftrace_return()Christophe Leroy1-3/+9
When we have CONFIG_DYNAMIC_FTRACE_WITH_ARGS, prepare_ftrace_return() is called by ftrace_graph_func() otherwise prepare_ftrace_return() is called from assembly. Refactor prepare_ftrace_return() into a static __prepare_ftrace_return() that will be called by both prepare_ftrace_return() and ftrace_graph_func(). It will allow GCC to fold __prepare_ftrace_return() inside ftrace_graph_func(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0d42deafe353980c66cf19d3132638c05ba9f4a9.1652074503.git.christophe.leroy@csgroup.eu
2022-05-19powerpc/rtas: enture rtas_call is called with MMU enabledNicholas Piggin1-0/+5
rtas_call must not be called with the MMU disabled because in case of rtas error, log_error is called which requires MMU enabled. Add a test and warning for this. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220308135047.478297-14-npiggin@gmail.com
2022-05-19powerpc/rtas: Leave MSR[RI] enabled over RTAS callNicholas Piggin1-9/+2
PAPR specifies that RTAS may be called with MSR[RI] enabled if the calling context is recoverable, and RTAS will manage RI as necessary. Call the rtas entry point with RI enabled, and add a check to ensure the caller has RI enabled. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220308135047.478297-10-npiggin@gmail.com
2022-05-19powerpc/rtas: PACA can be restored directly from SPRGNicholas Piggin1-9/+4
On 64-bit, PACA is saved in a SPRG so it does not need to be saved on stack. We also don't need to mask off the top bits for real mode addresses because the architecture does this for us. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220308135047.478297-8-npiggin@gmail.com
2022-05-19powerpc/rtas: Call enter_rtas with MSR[EE] disabledNicholas Piggin2-15/+4
Disable MSR[EE] in C code rather than asm. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220308135047.478297-5-npiggin@gmail.com
2022-05-19powerpc/rtas: Fix whitespace in rtas_entry.SNicholas Piggin1-13/+13
The code was moved verbatim including whitespace cruft. Fix that. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220308135047.478297-4-npiggin@gmail.com
2022-05-19powerpc/rtas: Make enter_rtas a nokprobe symbol on 64-bitNicholas Piggin1-0/+1
This symbol is marked nokprobe on 32-bit but not 64-bit, add it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220308135047.478297-3-npiggin@gmail.com
2022-05-19powerpc/rtas: Move rtas entry assembly into its own fileNicholas Piggin4-200/+199
This makes working on the code a bit easier. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220308135047.478297-2-npiggin@gmail.com
2022-05-19powerpc/signal: Report minimum signal frame size to userspace via AT_MINSIGSTKSZNicholas Piggin6-2/+47
Implement the AT_MINSIGSTKSZ AUXV entry, allowing userspace to dynamically size stack allocations in a manner forward-compatible with new processor state saved in the signal frame For now these statically find the maximum signal frame size rather than doing any runtime testing of features to minimise the size. glibc 2.34 will take advantage of this, as will applications that use use _SC_MINSIGSTKSZ and _SC_SIGSTKSZ. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> References: 94b07c1f8c39 ("arm64: signal: Report signal frame size to userspace via auxv") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220307182734.289289-2-npiggin@gmail.com
2022-05-19powerpc/64: Bump SIGSTKSZ and MINSIGSTKSZNicholas Piggin1-0/+5
The sad tale of SIGSTKSZ and MINSIGSTKSZ is documented in glibc.git commit f7c399cff5bd ("PowerPC SIGSTKSZ"), which explains why glibc does not use the kernel defines for these constants. Since then in fact there has been a further expansion of the signal stack frame size on little-endian with linux commit 573ebfa6601f ("powerpc: Increase stack redzone for 64-bit userspace to 512 bytes"), which has caused it to exceed even the glibc defines. See kernel commit 63dee5df43a3 ("powerpc: Allow 4224 bytes of stack expansion for the signal frame") for more details of the history of the expansion. Increase MINSIGSTKSZ to 8192 which is double the current glibc value and fits the current stack frame with room to grow. SIGSTKSZ is set to 4x the minimum as convention. glibc will have to be updated as well. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220307182734.289289-1-npiggin@gmail.com
2022-05-19powerpc/vdso: Link with ld.lld when requestedNathan Chancellor1-0/+1
The PowerPC vDSO uses $(CC) to link, which differs from the rest of the kernel, which uses $(LD) directly. As a result, the default linker of the compiler is used, which may differ from the linker requested by the builder. For example: $ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/ ... $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg File: arch/powerpc/kernel/vdso/vdso32.so.dbg String dump of section '.comment': [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37) File: arch/powerpc/kernel/vdso/vdso64.so.dbg String dump of section '.comment': [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37) LLVM=1 sets LD=ld.lld but ld.lld is not used to link the vDSO; GNU ld is because "ld" is the default linker for clang on most Linux platforms. This is a problem for Clang's Link Time Optimization as implemented in the kernel because use of GNU ld with LTO requires the LLVMgold plugin, which is not technically supported for ld.bfd per https://llvm.org/docs/GoldPlugin.html. Furthermore, if LLVMgold.so is missing from a user's system, the build will fail, even though LTO as it is implemented in the kernel requires ld.lld to avoid this dependency in the first place. Ultimately, the PowerPC vDSO should be converted to compiling and linking with $(CC) and $(LD) respectively but there were issues last time this was tried, potentially due to older but supported tool versions. To avoid regressing GCC + binutils, use the compiler option '-fuse-ld', which tells the compiler which linker to use when it is invoked as both the compiler and linker. Use '-fuse-ld=lld' when LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is linked with the same linker as the rest of the kernel. $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg File: arch/powerpc/kernel/vdso/vdso32.so.dbg String dump of section '.comment': [ 0] Linker: LLD 14.0.0 [ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) File: arch/powerpc/kernel/vdso/vdso64.so.dbg String dump of section '.comment': [ 0] Linker: LLD 14.0.0 [ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) LD can be a full path to ld.lld, which will not be handled properly by '-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's search path. '-fuse-ld' can take a path to the linker but it is deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario. Use '--ld-path' if it is supported, as it will handle a full path or just 'ld.lld' properly. See the LLVM commit below for the full details of '--ld-path'. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/774 Link: https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5 Link: https://lore.kernel.org/r/20220511185001.3269404-3-nathan@kernel.org
2022-05-19powerpc/vdso: Remove unused ENTRY in linker scriptsNathan Chancellor2-2/+0
When linking vdso{32,64}.so.dbg with ld.lld, there is a warning about not finding _start for the starting address: ld.lld: warning: cannot find entry symbol _start; not setting start address ld.lld: warning: cannot find entry symbol _start; not setting start address Looking at GCC + GNU ld, the entry point address is 0x0: $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):" File: vdso32.so.dbg Entry point address: 0x0 File: vdso64.so.dbg Entry point address: 0x0 This matches what ld.lld emits: $ powerpc64le-linux-gnu-readelf -p .comment vdso{32,64}.so.dbg File: vdso32.so.dbg String dump of section '.comment': [ 0] Linker: LLD 14.0.0 [ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) File: vdso64.so.dbg String dump of section '.comment': [ 0] Linker: LLD 14.0.0 [ 14] clang version 14.0.0 (Fedora 14.0.0-1.fc37) $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):" File: vdso32.so.dbg Entry point address: 0x0 File: vdso64.so.dbg Entry point address: 0x0 Remove ENTRY to remove the warning, as it is unnecessary for the vDSO to function correctly. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220511185001.3269404-2-nathan@kernel.org
2022-05-19powerpc: Export mmu_feature_keys[] as non-GPLKevin Hao1-1/+1
When the mmu_feature_keys[] was introduced in the commit c12e6f24d413 ("powerpc: Add option to use jump label for mmu_has_feature()"), it is unlikely that it would be used either directly or indirectly in the out of tree modules. So we exported it as GPL only. But with the evolution of the codes, especially the PPC_KUAP support, it may be indirectly referenced by some primitive macro or inline functions such as get_user() or __copy_from_user_inatomic(), this will make it impossible to build many non GPL modules (such as ZFS) on ppc architecture. Fix this by exposing the mmu_feature_keys[] to the non-GPL modules too. Fixes: 7613f5a66bec ("powerpc/64s/kuap: Use mmu_has_feature()") Reported-by: Nathaniel Filardo <nwfilardo@gmail.com> Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220329085709.4132729-1-haokexin@gmail.com
2022-05-19powerpc/setup: Refactor/untangle panic notifiersGuilherme G. Piccoli1-20/+54
The panic notifiers infrastructure is a bit limited in the scope of the callbacks - basically every kind of functionality is dropped in a list that runs in the same point during the kernel panic path. This is not really on par with the complexities and particularities of architecture / hypervisors' needs, and a refactor is ongoing. As part of this refactor, it was observed that powerpc has 2 notifiers, with mixed goals: one is just a KASLR offset dumper, whereas the other aims to hard-disable IRQs (necessary on panic path), warn firmware of the panic event (fadump) and run low-level platform-specific machinery that might stop kernel execution and never come back. Clearly, the 2nd notifier has opposed goals: disable IRQs / fadump should run earlier while low-level platform actions should run late since it might not even return. Hence, this patch decouples the notifiers splitting them in three: - First one is responsible for hard-disable IRQs and fadump, should run early; - The kernel KASLR offset dumper is really an informative notifier, harmless and may run at any moment in the panic path; - The last notifier should run last, since it aims to perform low-level actions for specific platforms, and might never return. It is also only registered for 2 platforms, pseries and ps3. The patch better documents the notifiers and clears the code too, also removing a useless header. Currently no functionality change should be observed, but after the planned panic refactor we should expect more panic reliability with this patch. Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220427224924.592546-9-gpiccoli@igalia.com
2022-05-19Merge branch 'topic/ppc-kvm' into nextMichael Ellerman32-1729/+921
Merge our KVM topic branch.
2022-05-18KVM: PPC: Book3S HV: Fix vcore_blocked tracepointFabiano Rosas2-8/+8
We removed most of the vcore logic from the P9 path but there's still a tracepoint that tried to dereference vc->runner. Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220328215831.320409-1-farosas@linux.ibm.com
2022-05-18KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlersAlexey Kardashevskiy9-785/+632
Currently we have 2 sets of interrupt controller hypercalls handlers for real and virtual modes, this is from POWER8 times when switching MMU on was considered an expensive operation. POWER9 however does not have dependent threads and MMU is enabled for handling hcalls so the XIVE native or XICS-on-XIVE real mode handlers never execute on real P9 and later CPUs. This untemplate the handlers and only keeps the real mode handlers for XICS native (up to POWER8) and remove the rest of dead code. Changes in functions are mechanical except few missing empty lines to make checkpatch.pl happy. The default implemented hcalls list already contains XICS hcalls so no change there. This should not cause any behavioral change. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220509071150.181250-1-aik@ozlabs.ru
2022-05-18KVM: PPC: Book3s: PR: Enable default TCE hypercallsAlexey Kardashevskiy1-0/+6
When KVM_CAP_PPC_ENABLE_HCALL was introduced, H_GET_TCE and H_PUT_TCE were already implemented and enabled by default; however H_GET_TCE was missed out on PR KVM (probably because the handler was in the real mode code at the time). This enables H_GET_TCE by default. While at this, this wraps the checks in ifdef CONFIG_SPAPR_TCE_IOMMU just like HV KVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220506073737.3823347-1-aik@ozlabs.ru
2022-05-18KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlersAlexey Kardashevskiy14-801/+75
LoPAPR defines guest visible IOMMU with hypercalls to use it - H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap in the KVM in the real mode (with MMU off). The problem with the real mode is some memory is not available and some API usage crashed the host but enabling MMU was an expensive operation. The problems with the real mode handlers are: 1. Occasionally these cannot complete the request so the code is copied+modified to work in the virtual mode, very little is shared; 2. The real mode handlers have to be linked into vmlinux to work; 3. An exception in real mode immediately reboots the machine. If the small DMA window is used, the real mode handlers bring better performance. However since POWER8, there has always been a bigger DMA window which VMs use to map the entire VM memory to avoid calling H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT (a bulk version of H_PUT_TCE) which virtual mode handler is even closer to its real mode version. On POWER9 hypercalls trap straight to the virtual mode so the real mode handlers never execute on POWER9 and later CPUs. So with the current use of the DMA windows and MMU improvements in POWER9 and later, there is no point in duplicating the code. The 32bit passed through devices may slow down but we do not have many of these in practice. For example, with this applied, a 1Gbit ethernet adapter still demostrates above 800Mbit/s of actual throughput. This removes the real mode handlers from KVM and related code from the powernv platform. This updates the list of implemented hcalls in KVM-HV as the realmode handlers are removed. This changes ABI - kvmppc_h_get_tce() moves to the KVM module and kvmppc_find_table() is static now. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
2022-05-18Merge branch 'fixes' into topic/ppc-kvmMichael Ellerman12-95/+123
Merge our fixes branch. In parciular this brings in the KVM TCE handling fix, which is a prerequisite for a subsequent patch.
2022-05-18KVM: PPC: Book3S HV: Initialize AMOR in nested entryFabiano Rosas1-0/+1
The hypervisor always sets AMOR to ~0, but let's ensure we're not passing stale values around. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220425142151.1495142-1-farosas@linux.ibm.com
2022-05-18Merge branch 'fixes' into nextMichael Ellerman6-29/+57
Merge our fixes branch from this cycle. In particular this brings in a papr_scm.c change which a subsequent patch has a dependency on.
2022-05-18KVM: PPC: Book3S HV: Use consistent type for return value of kvm_age_rmapp()Bo Liu1-3/+3
The return value type defined in the function kvm_age_rmapp() is "bool", but the return value type defined in the implementation of the function kvm_age_rmapp() is "int". Change the return value type to "bool". Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220401065252.36472-1-liubo03@inspur.com
2022-05-18KVM: PPC: Book3S HV: fix incorrect NULL check on list iteratorXiaomeng Tong1-3/+5
The bug is here: if (!p) return ret; The list iterator value 'p' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found. To fix the bug, Use a new value 'iter' as the list iterator, while use the old value 'p' as a dedicated variable to point to the found element. Fixes: dfaa973ae960 ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs") Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com
2022-05-18KVM: PPC: Book3S HV: remove extraneous asterisk from rm_host_ipi_action() ↵Bagas Sanjaya1-1/+1
comment kernel test robot reported kernel-doc warning for rm_host_ipi_action(): arch/powerpc/kvm/book3s_hv_rm_xics.c:887: warning: This comment starts with '/**', but isn't a kernel-doc comment. * Host Operations poked by RM KVM Since the function is static, remove the extraneous (second) asterisk at the head of function comment. Fixes: 0c2a66062470cd ("KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/linux-doc/202204252334.Cd2IsiII-lkp@intel.com/ Link: https://lore.kernel.org/r/20220506070747.16309-1-bagasdotme@gmail.com
2022-05-13KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES settingNicholas Piggin2-2/+5
The L1 should not be able to adjust LPES mode for the L2. Setting LPES if the L0 needs it clear would cause external interrupts to be sent to L2 and missed by the L0. Clearing LPES when it may be set, as typically happens with XIVE enabled could cause a performance issue despite having no native XIVE support in the guest, because it will cause mediated interrupts for the L2 to be taken in HV mode, which then have to be injected. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-7-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive contextNicholas Piggin2-7/+21
The PowerNV L0 currently pushes the OS xive context when running a vCPU, regardless of whether it is running a nested guest. The problem is that xive OS ring interrupts will be delivered while the L2 is running. At the moment, by default, the L2 guest runs with LPCR[LPES]=0, which actually makes external interrupts go to the L0. That causes the L2 to exit and the interrupt taken or injected into the L1, so in some respects this behaves like an escalation. It's not clear if this was deliberate or not, there's no comment about it and the L1 is actually allowed to clear LPES in the L2, so it's confusing at best. When the L2 is running, the L1 is essentially in a ceded state with respect to external interrupts (it can't respond to them directly and won't get scheduled again absent some additional event). So the natural way to solve this is when the L0 handles a H_ENTER_NESTED hypercall to run the L2, have it arm the escalation interrupt and don't push the L1 context while running the L2. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-6-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV P9: Split !nested case out from guest entryNicholas Piggin1-4/+13
The differences between nested and !nested will become larger in later changes so split them out for readability. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-5-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearmingNicholas Piggin3-7/+16
Move the cede abort logic out of xive escalation rearming and into the caller to prepare for handling a similar case with nested guest entry. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-4-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entryNicholas Piggin1-2/+7
If there is a pending xive interrupt, inject it at guest entry (if MSR[EE] is enabled) rather than take another interrupt when the guest is entered. If xive is enabled then LPCR[LPES] is set so this behaviour should be expected. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220303053315.1056880-3-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDSNicholas Piggin2-6/+0
KVMPPC_NR_LPIDS no longer represents any size restriction on the LPID space and can be removed. A CPU with more than 12 LPID bits implemented will now be able to create more than 4095 guests. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-7-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximumNicholas Piggin3-15/+18
Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic, fix it to 4096 (12-bits) explicitly for now. kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS because the L1 partition table registration hcall already did that, and it checks against the partition table size. This patch also puts all the partition table size calculations into the same form, using 12 for the architected size field shift and 4 for the shift corresponding to the partition table entry size. Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-of-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-6-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idrNicholas Piggin2-54/+59
This removes the fixed sized kvm->arch.nested_guests array. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-5-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV: Use IDA allocator for LPID allocatorNicholas Piggin1-12/+13
This removes the fixed-size lpid_inuse array. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-4-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, NestedNicholas Piggin5-11/+33
The LPID allocator init is changed to: - use mmu_lpid_bits rather than hard-coding; - use KVM_MAX_NESTED_GUESTS for nested hypervisors; - not reserve the top LPID on POWER9 and newer CPUs. The reserved LPID is made a POWER7/8-specific detail. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-3-npiggin@gmail.com
2022-05-13KVM: PPC: Remove kvmppc_claim_lpidNicholas Piggin4-16/+7
Removing kvmppc_claim_lpid makes the lpid allocator API a bit simpler to change the underlying implementation in a future patch. The host LPID is always 0, so that can be a detail of the allocator. If the allocator range is restricted, that can reserve LPIDs at the top of the range. This allows kvmppc_claim_lpid to be removed. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123120043.3586018-2-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV P9: Optimise loads around context switchNicholas Piggin1-4/+11
It is better to get all loads for the register values in flight before starting to switch LPID, PID, and LPCR because those mtSPRs are expensive and serialising. This also just tidies up the code for a potential future change to the context switching sequence. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220123114725.3549202-1-npiggin@gmail.com
2022-05-13KVM: PPC: Book3S HV: HFSCR[PREFIX] does not existNicholas Piggin2-2/+1
This facility is controlled by FSCR only. Reserved bits should not be set in the HFSCR register (although it's likely harmless as this position would not be re-used, and the L0 is forgiving here too). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220122105639.3477407-1-npiggin@gmail.com
2022-05-11powerpc/rtas: Keep MSR[RI] set when calling RTASLaurent Dufour2-12/+21
RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit big endian mode (MSR[SF,LE] unset). The change in MSR is done in enter_rtas() in a relatively complex way, since the MSR value could be hardcoded. Furthermore, a panic has been reported when hitting the watchdog interrupt while running in RTAS, this leads to the following stack trace: watchdog: CPU 24 Hard LOCKUP watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago) ... Supported: No, Unreleased kernel CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000 REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default) MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020 CFAR: 000000000000011c IRQMASK: 1 GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010 GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034 GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008 GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40 GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000 NIP [000000001fb41050] 0x1fb41050 LR [000000001fb4104c] 0x1fb4104c Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX Oops: Unrecoverable System Reset, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries ... Supported: No, Unreleased kernel CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000 REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default) MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020 CFAR: 000000000000011c IRQMASK: 1 GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010 GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034 GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008 GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40 GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000 NIP [000000001fb41050] 0x1fb41050 LR [000000001fb4104c] 0x1fb4104c Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace 3ddec07f638c34a2 ]--- This happens because MSR[RI] is unset when entering RTAS but there is no valid reason to not set it here. RTAS is expected to be called with MSR[RI] as specified in PAPR+ section "7.2.1 Machine State": R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect its own critical regions from recursion by setting the MSR[RI] bit to 0 when in the critical regions. Fixing this by reviewing the way MSR is compute before calling RTAS. Now a hardcoded value meaning real mode, 32 bits big endian mode and Recoverable Interrupt is loaded. In the case MSR[S] is set, it will remain set while entering RTAS as only urfid can unset it (thanks Fabiano). In addition a check is added in do_enter_rtas() to detect calls made with MSR[RI] unset, as we are forcing it on later. This patch has been tested on the following machines: Power KVM Guest P8 S822L (host Ubuntu kernel 5.11.0-49-generic) PowerVM LPAR P8 9119-MME (FW860.A1) p9 9008-22L (FW950.00) P10 9080-HEX (FW1010.00) Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220504101244.12107-1-ldufour@linux.ibm.com
2022-05-11powerpc/8xx: Use kmalloced data structure instead of global staticChristophe Leroy1-18/+30
Use a kmalloced data structure to store interrupt controller internal data instead of static global variables. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c8f0866ee013113d5e28948943cf0586e49f5353.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11powerpc/8xx: Remove mpc8xx_pics_init()Christophe Leroy9-32/+15
mpc8xx_pics_init() is now only a trampoline to mpc8xx_pic_init(). Remove mpc8xx_pics_init() and use mpc8xx_pic_init() directly. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9c55a698adb5ba3b7b77023170fcaf0acb5d2d81.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11powerpc/8xx: Convert CPM1 interrupt controller to platform_deviceChristophe Leroy3-56/+50
In the same logic as commit be7ecbd240b2 ("soc: fsl: qe: convert QE interrupt controller to platform_device"), convert CPM1 interrupt controller to platform_device. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fb80d0b2077312079c49da0296e25591578771cd.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11powerpc/8xx: Convert CPM1 error interrupt handler to platform driverChristophe Leroy1-29/+44
Add CPM error interrupt as a standalone platform driver, to simplify the init of CPM interrupt handler. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/375a72df6e4a26c5959cc81a6c6d46152efa2306.1649226186.git.christophe.leroy@csgroup.eu