path: root/arch/powerpc
Age  Commit message  Author  Files  Lines
2020-12-03powerpc/signal32: Add and use unsafe_put_sigset_t()Christophe Leroy1-2/+11
put_sigset_t() calls copy_to_user() to copy just two words, which is terribly inefficient.

By switching to unsafe_put_user(), we end up with something as simple as:

    3cc: 81 3d 00 00 lwz r9,0(r29)
    3d0: 91 26 00 b4 stw r9,180(r6)
    3d4: 81 3d 00 04 lwz r9,4(r29)
    3d8: 91 26 00 b8 stw r9,184(r6)

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/06def97e87ac1c4ae8e3197e0982e1fab7b3c8ae.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Remove ifdefery in middle of if/elseChristophe Leroy1-14/+8
MSR_TM_ACTIVE() is always defined and returns always 0 when CONFIG_PPC_TRANSACTIONAL_MEM is not selected, so the awful ifdefery in the middle of an if/else can be removed. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f3c36d687e4228f58d5c207a4036aa9ddcc7420a.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Switch handle_rt_signal32() to user_access_begin() logicChristophe Leroy1-21/+34
In the same way as for handle_signal32(), replace all user accesses with the equivalent unsafe_ versions, and move the trampoline code icache flush outside the user access block. Functions that have no unsafe_ equivalent also remain outside the access block.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2974314226256f958e2984912b48883ef1754185.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Switch handle_signal32() to user_access_begin() logicChristophe Leroy1-13/+16
Replace the access_ok() by user_access_begin() and change all user accesses to unsafe_ version. Move flush_icache_range() outside the user access block. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a27797f781aa00da96f8284c898173d18e952361.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Move signal trampoline setup to handle_[rt_]signal32Christophe Leroy1-39/+22
Move signal trampoline setup into handle_signal32() and handle_rt_signal32(). At the same time, remove the define which hides the mc_pad field used for trampoline. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e439cc0fa35aa45da6776520777a61848b92fd4b.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Misc changes to make handle_[rt_]_signal32() more similarChristophe Leroy1-10/+14
Miscellaneous changes to clean and make handle_signal32() and handle_rt_signal32() even more similar. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/df0bc8c3b8fa96390c46f611df79b2a94ac21844.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Rename local pointers in handle_rt_signal32()Christophe Leroy1-26/+25
Rename pointers in handle_rt_signal32() to make it more similar to handle_signal32():

    tm_frame becomes tm_mctx
    frame becomes mctx
    rt_sf becomes frame

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be77477b0f05397876015b218e36548ee8f5e10b.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Move handle_signal32() close to handle_rt_signal32()Christophe Leroy1-85/+85
Those two functions are similar and serving the same purpose. To ease refactorisation, move them close to each other. This is pure move, no code change, no cosmetic. Yes, checkpatch is not happy, most will clear later. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dbce67900bf566bcf40179467bf1eb500814c405.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Simplify logging in handle_rt_signal32()Christophe Leroy1-5/+1
If something is bad in the frame, there is no point in knowing which part of the frame exactly is wrong as it got allocated as a single block. Always print the root address of the frame in case of failed user access, just like handle_signal32(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/691895bd31fee89a2d8370befd66ad4eff5b63f2.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Refactor bad frame loggingChristophe Leroy4-43/+21
The logging of bad frames appears half a dozen times and is pretty similar each time. Create a signal_fault() function to perform that logging.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fa094445c119fc00315e1c13783b493346306c6a.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Call get_tm_stackpointer() from get_sigframe()Christophe Leroy4-10/+11
Instead of calling get_tm_stackpointer() from the caller, call it directly from get_sigframe(). This avoids a double call and allows get_tm_stackpointer() to become static and be inlined into get_sigframe() by GCC. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/abfdc105b8b28c4eb3ab9a26297d17f302b600ea.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Remove get_clean_sp()Christophe Leroy2-15/+4
get_clean_sp() is only used once in kernel/signal.c . GCC is smart enough to see that x & 0xffffffff is a nop calculation on PPC32, no need of a special PPC32 trivial version. Include the logic from the PPC64 version of get_clean_sp() directly in get_sigframe(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/13ef6510ce30a4867e043157b93af5bb8c67fb3b.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Move access_ok() out of get_sigframe()Christophe Leroy3-7/+3
This access_ok() will soon be performed by user_access_begin(). So move it out of get_sigframe(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/900b93744732ed0887f28f5b6a40730fb04a43fa.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Remove BUG_ON() in handler_signal functionsChristophe Leroy2-6/+0
There is already the same BUG_ON() check in do_signal() which is the only caller of handle_rt_signal64() handle_rt_signal32() and handle_signal32(). Remove those three redundant BUG_ON(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3582e10a341d523c9c3f1ac925c3aaefc9d9293d.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/32s: Allow deselecting CONFIG_PPC_FPU on mpc832xChristophe Leroy2-2/+13
The e300c2 core which is embedded in mpc832x CPU doesn't have an FPU. Make it possible to not select CONFIG_PPC_FPU when building a kernel dedicated to that target. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fcdc60d85baf80eaa0a7f3261d9d889282068216.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Don't manage floating point regs when no FPUChristophe Leroy11-3/+50
There is no point in copying floating point regs when there is no FPU and MATH_EMULATION is not selected.

Create a new CONFIG_PPC_FPU_REGS bool that is selected by CONFIG_MATH_EMULATION and CONFIG_PPC_FPU, and use it to opt out everything related to fp_state in thread_struct.

The asm consts used only by fpu.S are opted out with CONFIG_PPC_FPU, as the fpu.S build is conditional on CONFIG_PPC_FPU.

The following app spends approx 8.1 seconds system time on an 8xx without the patch, and 7.0 seconds with the patch (13.5% reduction). On an 832x, it spends approx 2.6 seconds system time without the patch and 2.1 seconds with the patch (19% reduction).

    #include <signal.h>
    #include <stdlib.h>

    /* Empty handler: only the signal delivery cost is measured. */
    void sigusr1(int sig) { }

    int main(int argc, char **argv)
    {
        int i = 100000;

        signal(SIGUSR1, sigusr1);
        for (;i--;)
            raise(SIGUSR1);
        exit(0);
    }

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7569070083e6cd5b279bb5023da601aba3c06f3c.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/ptrace: Create ptrace_get_fpr() and ptrace_put_fpr()Christophe Leroy4-29/+56
On the same model as ptrace_get_reg() and ptrace_put_reg(), create ptrace_get_fpr() and ptrace_put_fpr() to get/set the floating point registers. The boundary checks move into them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24a1baedea7f7ae7b6bf27be98bab6d01b5ca2c1.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/ptrace: Consolidate reg index calculationChristophe Leroy1-14/+4
Today we have:

    #ifdef CONFIG_PPC32
        index = addr >> 2;
        if ((addr & 3) || child->thread.regs == NULL)
    #else
        index = addr >> 3;
        if ((addr & 7))
    #endif

sizeof(long) has value 4 for PPC32 and value 8 for PPC64. Dividing by 4 is equivalent to >> 2 and dividing by 8 is equivalent to >> 3. And 3 and 7 are respectively (sizeof(long) - 1).

Use sizeof(long) to get rid of the #ifdef CONFIG_PPC32 and consolidate the calculation and checking.

thread.regs has to be non-NULL on both PPC32 and PPC64, so adding that test on PPC64 is harmless.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3cd1e284e93c60db981659585e18d1f6bb73ed2f.1597770847.git.christophe.leroy@csgroup.eu
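The consolidation described in this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions: reg_index_from_addr() and its have_regs parameter are hypothetical names standing in for the ptrace code and for the child->thread.regs test, not the kernel's actual function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: turn a byte offset into a register index,
 * rejecting offsets that are not aligned to the native word size.
 * sizeof(long) is 4 on PPC32 and 8 on PPC64, so no #ifdef is needed.
 */
static bool reg_index_from_addr(unsigned long addr, bool have_regs,
				unsigned long *index)
{
	assert(index != NULL);

	/* (sizeof(long) - 1) is the 3 or 7 alignment mask of the old code. */
	if ((addr & (sizeof(long) - 1)) || !have_regs)
		return false;

	/* Dividing by sizeof(long) replaces the old ">> 2" or ">> 3". */
	*index = addr / sizeof(long);
	return true;
}
```

A single expression now serves both word sizes, and the NULL test that used to be PPC32-only applies everywhere.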
2020-12-03powerpc/ptrace: Move declaration of ptrace_get_reg() and ptrace_set_reg()Christophe Leroy3-6/+5
ptrace_get_reg() and ptrace_set_reg() are only used internally by ptrace. Move them in arch/powerpc/kernel/ptrace/ptrace-decl.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/376c258267aeae54a4423bc4a2e107a9611f0039.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Move inline functions in signal.hChristophe Leroy2-38/+33
To really be inlined, the functions need to be defined in the same C file as the caller, or in an included header. Move functions defined inline from signal.c into signal.h.

Fixes: 3dd4eb83a9c0 ("powerpc: move common register copy functions from signal_32.c to signal.c")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/35b1bd44a1a66f5bcf9b457a1c480ac8d5ef50b2.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32Christophe Leroy4-0/+18
Provides __kernel_clock_gettime64() on vdso32. This is the 64 bits version of __kernel_clock_gettime() which is y2038 compliant. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-9-mpe@ellerman.id.au
2020-12-03powerpc/vdso: Switch VDSO to generic C implementation.Christophe Leroy12-691/+106
With the C VDSO, the performance is slightly lower, but it is worth it as it will ease maintenance and evolution, and also brings clocks that are not supported with the ASM VDSO.

On an 8xx at 132 MHz, vdsotest with the ASM VDSO:

    gettimeofday: vdso: 828 nsec/call
    clock-getres-realtime-coarse: vdso: 391 nsec/call
    clock-gettime-realtime-coarse: vdso: 614 nsec/call
    clock-getres-realtime: vdso: 460 nsec/call
    clock-gettime-realtime: vdso: 876 nsec/call
    clock-getres-monotonic-coarse: vdso: 399 nsec/call
    clock-gettime-monotonic-coarse: vdso: 691 nsec/call
    clock-getres-monotonic: vdso: 460 nsec/call
    clock-gettime-monotonic: vdso: 1026 nsec/call

On an 8xx at 132 MHz, vdsotest with the C VDSO:

    gettimeofday: vdso: 955 nsec/call
    clock-getres-realtime-coarse: vdso: 545 nsec/call
    clock-gettime-realtime-coarse: vdso: 592 nsec/call
    clock-getres-realtime: vdso: 545 nsec/call
    clock-gettime-realtime: vdso: 941 nsec/call
    clock-getres-monotonic-coarse: vdso: 545 nsec/call
    clock-gettime-monotonic-coarse: vdso: 591 nsec/call
    clock-getres-monotonic: vdso: 545 nsec/call
    clock-gettime-monotonic: vdso: 940 nsec/call

It is even better for gettime with monotonic clocks.

Unsupported clocks with ASM VDSO:

    clock-gettime-boottime: vdso: 3851 nsec/call
    clock-gettime-tai: vdso: 3852 nsec/call
    clock-gettime-monotonic-raw: vdso: 3396 nsec/call

Same clocks with C VDSO:

    clock-gettime-tai: vdso: 941 nsec/call
    clock-gettime-monotonic-raw: vdso: 1001 nsec/call
    clock-gettime-monotonic-coarse: vdso: 591 nsec/call

On an 8321E at 333 MHz, vdsotest with the ASM VDSO:

    gettimeofday: vdso: 220 nsec/call
    clock-getres-realtime-coarse: vdso: 102 nsec/call
    clock-gettime-realtime-coarse: vdso: 178 nsec/call
    clock-getres-realtime: vdso: 129 nsec/call
    clock-gettime-realtime: vdso: 235 nsec/call
    clock-getres-monotonic-coarse: vdso: 105 nsec/call
    clock-gettime-monotonic-coarse: vdso: 208 nsec/call
    clock-getres-monotonic: vdso: 129 nsec/call
    clock-gettime-monotonic: vdso: 274 nsec/call

On an 8321E at 333 MHz, vdsotest with the C VDSO:

    gettimeofday: vdso: 272 nsec/call
    clock-getres-realtime-coarse: vdso: 160 nsec/call
    clock-gettime-realtime-coarse: vdso: 184 nsec/call
    clock-getres-realtime: vdso: 166 nsec/call
    clock-gettime-realtime: vdso: 281 nsec/call
    clock-getres-monotonic-coarse: vdso: 160 nsec/call
    clock-gettime-monotonic-coarse: vdso: 184 nsec/call
    clock-getres-monotonic: vdso: 169 nsec/call
    clock-gettime-monotonic: vdso: 275 nsec/call

On a Power9 Nimbus DD2.2 at 3.8GHz, with the ASM VDSO:

    clock-gettime-monotonic: vdso: 35 nsec/call
    clock-getres-monotonic: vdso: 16 nsec/call
    clock-gettime-monotonic-coarse: vdso: 18 nsec/call
    clock-getres-monotonic-coarse: vdso: 522 nsec/call
    clock-gettime-monotonic-raw: vdso: 598 nsec/call
    clock-getres-monotonic-raw: vdso: 520 nsec/call
    clock-gettime-realtime: vdso: 34 nsec/call
    clock-getres-realtime: vdso: 16 nsec/call
    clock-gettime-realtime-coarse: vdso: 18 nsec/call
    clock-getres-realtime-coarse: vdso: 517 nsec/call
    getcpu: vdso: 8 nsec/call
    gettimeofday: vdso: 25 nsec/call

And with the C VDSO:

    clock-gettime-monotonic: vdso: 37 nsec/call
    clock-getres-monotonic: vdso: 20 nsec/call
    clock-gettime-monotonic-coarse: vdso: 21 nsec/call
    clock-getres-monotonic-coarse: vdso: 19 nsec/call
    clock-gettime-monotonic-raw: vdso: 38 nsec/call
    clock-getres-monotonic-raw: vdso: 20 nsec/call
    clock-gettime-realtime: vdso: 37 nsec/call
    clock-getres-realtime: vdso: 20 nsec/call
    clock-gettime-realtime-coarse: vdso: 20 nsec/call
    clock-getres-realtime-coarse: vdso: 19 nsec/call
    getcpu: vdso: 8 nsec/call
    gettimeofday: vdso: 28 nsec/call

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-8-mpe@ellerman.id.au
2020-12-03powerpc/vdso: Save and restore TOC pointer on PPC64Christophe Leroy1-0/+12
On PPC64, the TOC pointer needs to be saved and restored. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-7-mpe@ellerman.id.au
2020-12-03powerpc/vdso: Prepare for switching VDSO to generic C implementation.Christophe Leroy6-0/+260
Prepare for switching VDSO to generic C implementation in following patch. Here, we:
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for VDSO as well as system calls in the way that it requires setting the CR SO bit, which cannot be done in C. Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register, requiring the caller to save it. As the ASM calling function already has to set up a stack frame and save the link register before calling the C vdso function, retrieving the vdso data pointer there is lighter.

Implement __arch_vdso_capable() and always return true.

Provide vdso_shift_ns(), as the generic x >> s gives the following bad result:

    18: 35 25 ff e0 addic. r9,r5,-32
    1c: 41 80 00 10 blt 2c <shift+0x14>
    20: 7c 64 4c 30 srw r4,r3,r9
    24: 38 60 00 00 li r3,0
    ...
    2c: 54 69 08 3c rlwinm r9,r3,1,0,30
    30: 21 45 00 1f subfic r10,r5,31
    34: 7c 84 2c 30 srw r4,r4,r5
    38: 7d 29 50 30 slw r9,r9,r10
    3c: 7c 63 2c 30 srw r3,r3,r5
    40: 7d 24 23 78 or r4,r9,r4

In our case the shift is always <= 32. In addition, the upper 32 bits of the result are likely zero. Letting GCC know this also lets it optimise the following calculations.

With the patch, we get:

    0: 21 25 00 20 subfic r9,r5,32
    4: 7c 69 48 30 slw r9,r3,r9
    8: 7c 84 2c 30 srw r4,r4,r5
    c: 7d 24 23 78 or r4,r9,r4
    10: 7c 63 2c 30 srw r3,r3,r5

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-6-mpe@ellerman.id.au
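As a rough illustration of the idea behind vdso_shift_ns(), a 64-bit right shift can be assembled from two 32-bit halves when the shift amount is known to be small. The sketch below is a userspace approximation, not the kernel's code: the name shift_ns_32() is hypothetical, and it assumes 0 < shift < 32 (narrower than the commit's "<= 32") so that every C shift in it is well defined.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of a 64-bit right shift built from 32-bit halves.
 * Assumption (ours, not the commit's): 0 < shift < 32.
 */
static uint64_t shift_ns_32(uint64_t ns, unsigned int shift)
{
	uint32_t hi = (uint32_t)(ns >> 32);
	uint32_t lo = (uint32_t)ns;

	assert(shift > 0 && shift < 32);

	lo >>= shift;
	lo |= hi << (32 - shift);	/* bits that move from hi into lo */
	hi >>= shift;

	/* Common case per the commit message: the upper word is zero. */
	if (hi == 0)
		return lo;

	return ((uint64_t)hi << 32) | lo;
}
```

Because the shift range is bounded, the compiler has no reason to emit the branch on "shift >= 32" that appears in the first disassembly listing.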
2020-12-03powerpc/barrier: Use CONFIG_PPC64 for barrier selectionMichael Ellerman1-1/+1
Currently we use ifdef __powerpc64__ in barrier.h to decide if we should use lwsync or eieio for SMPWMB which is then used by __smp_wmb(). That means when we are building the compat VDSO we will use eieio, because it's 32-bit code, even though we're building a 64-bit kernel for a 64-bit CPU. Although eieio should work, it would be cleaner if we always used the same barrier, even for the 32-bit VDSO. So change the ifdef to CONFIG_PPC64, so that the selection is made based on the bitness of the kernel we're building for, not the current compilation unit. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-5-mpe@ellerman.id.au
2020-12-03powerpc/time: Fix mftb()/get_tb() for use with the compat VDSOMichael Ellerman1-2/+10
When we're building the compat VDSO we are building 32-bit code but in the context of a 64-bit kernel configuration. To make this work we need to be careful in some places when using ifdefs to differentiate between CONFIG_PPC64 and __powerpc64__. CONFIG_PPC64 indicates the kernel we're building is 64-bit, but it doesn't tell us that we're currently building 64-bit code - we could be building 32-bit code for the compat VDSO. On the other hand __powerpc64__ tells us that we are currently building 64-bit code (and therefore we must also be building a 64-bit kernel). In the case of get_tb() we want to use the 32-bit code sequence regardless of whether the kernel we're building for is 64-bit or 32-bit, what matters is the word size of the current object. So we need to check __powerpc64__ to decide if we use mftb() or the mftbu()/mftb() sequence. For mftb() the logic for CPU_FTR_CELL_TB_BUG only makes sense if we're building 64-bit code, so guard that with a __powerpc64__ check. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-4-mpe@ellerman.id.au
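The 32-bit mftbu()/mftb() sequence mentioned above has to cope with the lower word wrapping between the two reads. A minimal userspace sketch of that retry loop follows; sim_tb, sim_mftbu() and sim_mftb() are hypothetical stand-ins that simulate a ticking timebase, since the real instructions cannot run here.

```c
#include <stdint.h>

/* Simulated 64-bit timebase; each read advances it by one tick. */
static uint64_t sim_tb;

/* Stand-in for reading the upper 32 bits of the timebase. */
static uint32_t sim_mftbu(void)
{
	uint32_t v = (uint32_t)(sim_tb >> 32);

	sim_tb += 1;
	return v;
}

/* Stand-in for reading the lower 32 bits of the timebase. */
static uint32_t sim_mftb(void)
{
	uint32_t v = (uint32_t)sim_tb;

	sim_tb += 1;
	return v;
}

/* Read upper, lower, upper again; retry if the upper half changed. */
static uint64_t get_tb_32bit(void)
{
	uint32_t hi, lo, hi2;

	do {
		hi = sim_mftbu();
		lo = sim_mftb();
		hi2 = sim_mftbu();
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}
```

Whether this sequence or a single 64-bit read is used depends on the word size of the object being compiled, which is why the commit tests __powerpc64__ rather than CONFIG_PPC64.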
2020-12-03powerpc/time: Move timebase functions into new asm/vdso/timebase.hChristophe Leroy4-61/+73
In order to easily use get_tb() from C VDSO, move timebase functions into a new header named asm/vdso/timebase.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-3-mpe@ellerman.id.au
2020-12-03powerpc/processor: Move cpu_relax() into asm/vdso/processor.hChristophe Leroy2-11/+25
cpu_relax() needs to be in asm/vdso/processor.h to be used by the generic C VDSO library. Move it there.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-2-mpe@ellerman.id.au
2020-12-03powerpc/feature: Use CONFIG_PPC64 instead of __powerpc64__ to define possible featuresChristophe Leroy1-2/+2
In order to build VDSO32 for PPC64, we need to have CPU_FTRS_POSSIBLE and CPU_FTRS_ALWAYS independent of whether we are building the 32-bit VDSO or the 64-bit VDSO. Use #ifdef CONFIG_PPC64 instead of #ifdef __powerpc64__.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126131006.2431205-1-mpe@ellerman.id.au
2020-12-03powerpc: Update NUMA Kconfig description & help textMichael Ellerman1-1/+7
Update the NUMA Kconfig description to match other architectures, and add some help text. Shamelessly borrowed from x86/arm64. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20201124120547.1940635-3-mpe@ellerman.id.au
2020-12-03powerpc: Make NUMA default y for powernvMichael Ellerman1-1/+1
Our NUMA option is default y for pseries, but not powernv. The bulk of powernv systems are NUMA, so make NUMA default y for powernv also. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20201124120547.1940635-2-mpe@ellerman.id.au
2020-12-03powerpc: Make NUMA depend on SMPMichael Ellerman1-1/+1
Our Kconfig allows NUMA to be enabled without SMP, but none of our defconfigs use that combination. This means it can easily be broken inadvertently by code changes, which has happened recently. Although it's theoretically possible to have a machine with a single CPU and multiple memory nodes, I can't think of any real systems where that's the case. Even so if such a system exists, it can just run an SMP kernel anyway. So to avoid the need to add extra #ifdefs and/or build breaks, make NUMA depend on SMP. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20201124120547.1940635-1-mpe@ellerman.id.au
2020-12-03powerpc: inline iomap accessorsChristophe Leroy2-167/+153
ioreadXX()/ioreadXXbe() accessors are equivalent to ppc in_leXX()/in_be16() accessors but they are not inlined.

Since commit 0eb573682872 ("powerpc/kernel: Enable EEH for IO accessors"), the 'le' versions are equivalent to the ones defined in asm-generic/io.h, although the ones there are inlined.

Include asm-generic/io.h to get them. Keep ppc versions of the 'be' ones as they are optimised, but make them inline in ppc io.h.

This reduces the size of ppc64e_defconfig build by 3 kbytes:

    text     data    bss    dec      hex    filename
    10160733 4343422 562972 15067127 e5e7f7 vmlinux.before
    10159239 4341590 562972 15063801 e5daf9 vmlinux.after

A typical function using ioread and iowrite before the change:

    c00000000066a3c4 <.ata_bmdma_stop>:
    c00000000066a3c4: 7c 08 02 a6 mflr r0
    c00000000066a3c8: fb c1 ff f0 std r30,-16(r1)
    c00000000066a3cc: f8 01 00 10 std r0,16(r1)
    c00000000066a3d0: fb e1 ff f8 std r31,-8(r1)
    c00000000066a3d4: f8 21 ff 81 stdu r1,-128(r1)
    c00000000066a3d8: eb e3 00 00 ld r31,0(r3)
    c00000000066a3dc: eb df 00 98 ld r30,152(r31)
    c00000000066a3e0: 7f c3 f3 78 mr r3,r30
    c00000000066a3e4: 4b 9b 6f 7d bl c000000000021360 <.ioread8>
    c00000000066a3e8: 60 00 00 00 nop
    c00000000066a3ec: 7f c4 f3 78 mr r4,r30
    c00000000066a3f0: 54 63 06 3c rlwinm r3,r3,0,24,30
    c00000000066a3f4: 4b 9b 70 4d bl c000000000021440 <.iowrite8>
    c00000000066a3f8: 60 00 00 00 nop
    c00000000066a3fc: 7f e3 fb 78 mr r3,r31
    c00000000066a400: 38 21 00 80 addi r1,r1,128
    c00000000066a404: e8 01 00 10 ld r0,16(r1)
    c00000000066a408: eb c1 ff f0 ld r30,-16(r1)
    c00000000066a40c: 7c 08 03 a6 mtlr r0
    c00000000066a410: eb e1 ff f8 ld r31,-8(r1)
    c00000000066a414: 4b ff ff 8c b c00000000066a3a0 <.ata_sff_dma_pause>

The same function with this patch:

    c000000000669cb4 <.ata_bmdma_stop>:
    c000000000669cb4: e8 63 00 00 ld r3,0(r3)
    c000000000669cb8: e9 43 00 98 ld r10,152(r3)
    c000000000669cbc: 7c 00 04 ac hwsync
    c000000000669cc0: 89 2a 00 00 lbz r9,0(r10)
    c000000000669cc4: 0c 09 00 00 twi 0,r9,0
    c000000000669cc8: 4c 00 01 2c isync
    c000000000669ccc: 55 29 06 3c rlwinm r9,r9,0,24,30
    c000000000669cd0: 7c 00 04 ac hwsync
    c000000000669cd4: 99 2a 00 00 stb r9,0(r10)
    c000000000669cd8: a1 4d 06 f0 lhz r10,1776(r13)
    c000000000669cdc: 2c 2a 00 00 cmpdi r10,0
    c000000000669ce0: 41 c2 00 08 beq- c000000000669ce8 <.ata_bmdma_stop+0x34>
    c000000000669ce4: b1 4d 06 f2 sth r10,1778(r13)
    c000000000669ce8: 4b ff ff a8 b c000000000669c90 <.ata_sff_dma_pause>

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/18b357d68c4cde149f75c7a1031c850925cd8128.1605981539.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/perf: Fix crash with is_sier_available when pmu is not setAthira Rajeev1-0/+3
On systems without any specific PMU driver support registered, running 'perf record' with --intr-regs will crash ( perf record -I <workload> ).

The relevant portion from crash logs and Call Trace:

    Unable to handle kernel paging request for data at address 0x00000068
    Faulting instruction address: 0xc00000000013eb18
    Oops: Kernel access of bad area, sig: 11 [#1]
    CPU: 2 PID: 13435 Comm: kill Kdump: loaded Not tainted 4.18.0-193.el8.ppc64le #1
    NIP: c00000000013eb18 LR: c000000000139f2c CTR: c000000000393d80
    REGS: c0000004a07ab4f0 TRAP: 0300 Not tainted (4.18.0-193.el8.ppc64le)
    NIP [c00000000013eb18] is_sier_available+0x18/0x30
    LR [c000000000139f2c] perf_reg_value+0x6c/0xb0
    Call Trace:
    [c0000004a07ab770] [c0000004a07ab7c8] 0xc0000004a07ab7c8 (unreliable)
    [c0000004a07ab7a0] [c0000000003aa77c] perf_output_sample+0x60c/0xac0
    [c0000004a07ab840] [c0000000003ab3f0] perf_event_output_forward+0x70/0xb0
    [c0000004a07ab8c0] [c00000000039e208] __perf_event_overflow+0x88/0x1a0
    [c0000004a07ab910] [c00000000039e42c] perf_swevent_hrtimer+0x10c/0x1d0
    [c0000004a07abc50] [c000000000228b9c] __hrtimer_run_queues+0x17c/0x480
    [c0000004a07abcf0] [c00000000022aaf4] hrtimer_interrupt+0x144/0x520
    [c0000004a07abdd0] [c00000000002a864] timer_interrupt+0x104/0x2f0
    [c0000004a07abe30] [c0000000000091c4] decrementer_common+0x114/0x120

When a perf record session is started with the "-I" option, capturing registers on each sample calls is_sier_available() to check for SIER (Sample Instruction Event Register) availability on the platform. This function in core-book3s accesses 'ppmu->flags'. If a platform-specific PMU driver is not registered, ppmu is NULL and accessing its members results in a crash. Fix the crash by returning false in is_sier_available() if ppmu is not set.

Fixes: 333804dc3b7a ("powerpc/perf: Update perf_regs structure to include SIER")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1606185640-1720-1-git-send-email-atrajeev@linux.vnet.ibm.com
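The shape of this fix can be sketched outside the kernel as a NULL guard placed before the flag test. The struct layout and the numeric value given to PPMU_HAS_SIER below are placeholders chosen for illustration; only the names ppmu, flags and is_sier_available() come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder value for illustration; the real flag lives in kernel headers. */
#define PPMU_HAS_SIER 0x1u

/* Minimal stand-in for the kernel's PMU descriptor. */
struct power_pmu {
	unsigned int flags;
};

/* Stays NULL when no platform-specific PMU driver has registered. */
static struct power_pmu *ppmu;

static bool is_sier_available(void)
{
	/* Without this guard, ppmu->flags dereferences a NULL pointer. */
	if (!ppmu)
		return false;

	return (ppmu->flags & PPMU_HAS_SIER) != 0;
}
```

Returning false when no driver is registered treats "no PMU" the same as "PMU without SIER", which is the behaviour the commit message describes.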
2020-12-03powerpc/boot: Make use of REL16 relocs in powerpc/boot/util.SAlan Modra1-6/+3
Use bcl 20,31,0f rather than plain bl to avoid unbalancing the link stack. Update the code to use REL16 relocs, available for ppc64 in 2009 (and ppc32 in 2005). Signed-off-by: Alan Modra <amodra@gmail.com> [mpe: Incorporate more detail into the change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-12-02sched/vtime: Consolidate IRQ time accountingFrederic Weisbecker1-16/+40
The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all have their own version of irq time accounting that dispatch the cputime to the appropriate index: hardirq, softirq, system, idle, guest... from an all-in-one function. Instead of having these ad-hoc versions, move the cputime destination dispatch decision to the core code and leave only the actual per-index cputime accounting to the architecture. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-4-frederic@kernel.org
2020-12-02powerpc/64s/powernv: Fix memory corruption when saving SLB entries on MCENicholas Piggin1-2/+7
This can be hit by an HPT guest running on an HPT host and bring down the host, so it's quite important to fix. Fixes: 7290f3b3d3e6 ("powerpc/64s/powernv: machine check dump SLB contents") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-2-npiggin@gmail.com
2020-12-01kbuild: Hoist '--orphan-handling' into KconfigNathan Chancellor2-1/+1
Currently, '--orphan-handling=warn' is spread out across four different architectures in their respective Makefiles, which makes it a little unruly to deal with in case it needs to be disabled for a specific linker version (in this case, ld.lld 10.0.1).

To make it easier to control this, hoist this warning into Kconfig and the main Makefile so that disabling it is simpler, as the warning will only be enabled in a couple places (main Makefile and a couple of compressed boot folders that blow away LDFLAGS_vmlinux) and making it conditional is easier due to Kconfig syntax. One small additional benefit of this is saving a call to ld-option on incremental builds because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.

To keep the list of supported architectures the same, introduce CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to gain this automatically after all of the sections are specified and size asserted. A special thanks to Kees Cook for the help text on this config.

Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-12-01KVM: PPC: Book3S HV: XIVE: Fix vCPU id sanity checkGreg Kurz1-5/+2
Commit 062cfab7069f ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") updated kvmppc_xive_vcpu_id_valid() in a way that allows userspace to trigger an assertion in skiboot and crash the host: [ 696.186248988,3] XIVE[ IC 08 ] eq_blk != vp_blk (0 vs. 1) for target 0x4300008c/0 [ 696.186314757,0] Assert fail: hw/xive.c:2370:0 [ 696.186342458,0] Aborting! xive-kvCPU 0043 Backtrace: S: 0000000031e2b8f0 R: 0000000030013840 .backtrace+0x48 S: 0000000031e2b990 R: 000000003001b2d0 ._abort+0x4c S: 0000000031e2ba10 R: 000000003001b34c .assert_fail+0x34 S: 0000000031e2ba90 R: 0000000030058984 .xive_eq_for_target.part.20+0xb0 S: 0000000031e2bb40 R: 0000000030059fdc .xive_setup_silent_gather+0x2c S: 0000000031e2bc20 R: 000000003005a334 .opal_xive_set_vp_info+0x124 S: 0000000031e2bd20 R: 00000000300051a4 opal_entry+0x134 --- OPAL call token: 0x8a caller R1: 0xc000001f28563850 --- XIVE maintains the interrupt context state of non-dispatched vCPUs in an internal VP structure. We allocate a bunch of those on startup to accommodate all possible vCPUs. Each VP has an id, that we derive from the vCPU id for efficiency: static inline u32 kvmppc_xive_vp(struct kvmppc_xive *xive, u32 server) { return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server); } The KVM XIVE device used to allocate KVM_MAX_VCPUS VPs. This was limitting the number of concurrent VMs because the VP space is limited on the HW. Since most of the time, VMs run with a lot less vCPUs, commit 062cfab7069f ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") gave the possibility for userspace to tune the size of the VP block through the KVM_DEV_XIVE_NR_SERVERS attribute. 
The check in kvmppc_xive_vcpu_id_valid() was changed from cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode to cpu < xive->nr_servers * xive->kvm->arch.emul_smt_mode The previous check was based on the fact that the VP block had KVM_MAX_VCPUS entries and that kvmppc_pack_vcpu_id() guarantees that packed vCPU ids are below KVM_MAX_VCPUS. We've changed the size of the VP block, but kvmppc_pack_vcpu_id() has nothing to do with it and it certainly doesn't ensure that the packed vCPU ids are below xive->nr_servers. kvmppc_xive_vcpu_id_valid() might thus return true when the VM was configured with a non-standard VSMT mode, even if the packed vCPU id is higher than what we expect. We end up using an unallocated VP id, which confuses OPAL. The assert in OPAL is probably abusive and should be converted to a regular error that the kernel can handle, but we shouldn't really use broken VP ids in the first place. Fix kvmppc_xive_vcpu_id_valid() so that it checks the packed vCPU id is below xive->nr_servers, which is explicitly what we want. Fixes: 062cfab7069f ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/160673876747.695514.1809676603724514920.stgit@bahia.lan
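The difference between the two checks can be illustrated with a small userspace model. This is only a sketch under stated assumptions: pack_vcpu_id() is a simplified, hypothetical stand-in for kvmppc_pack_vcpu_id() (it leaves ids below the limit unchanged and does not reproduce the real packing of larger ids), and struct xive_model, MODEL_MAX_VCPUS and the two check functions are names invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit standing in for KVM_MAX_VCPUS. */
#define MODEL_MAX_VCPUS 2048u

/* Minimal stand-in for the fields of struct kvmppc_xive used by the check. */
struct xive_model {
	uint32_t nr_servers;    /* size of the VP block chosen by userspace */
	uint32_t emul_smt_mode; /* VSMT mode: stride between vCPU ids */
};

/*
 * Simplified stand-in for kvmppc_pack_vcpu_id(): ids below the limit are
 * returned unchanged. The real packing of larger ids is not modelled.
 */
static uint32_t pack_vcpu_id(uint32_t id)
{
	return id % MODEL_MAX_VCPUS;
}

/* Check as described for commit 062cfab7069f: bounds the raw vCPU id. */
static bool vcpu_id_valid_old(const struct xive_model *x, uint32_t cpu)
{
	return cpu < x->nr_servers * x->emul_smt_mode;
}

/* Fixed check: bounds the packed id, which is what selects the VP. */
static bool vcpu_id_valid_new(const struct xive_model *x, uint32_t cpu)
{
	return pack_vcpu_id(cpu) < x->nr_servers;
}
```

In this model, with nr_servers = 4 and emul_smt_mode = 8, vCPU id 8 passes the old check (8 < 32) although its packed id, 8, lies outside a four-entry VP block; the new check rejects it.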
2020-11-30powerpc/pseries: Pass MSI affinity to irq_create_mapping()Laurent Vivier1-1/+2
With virtio multiqueue, normally each queue IRQ is mapped to a CPU. Commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity") exposed an existing shortcoming of the arch code by moving virtio_scsi to the automatic IRQ affinity assignment. The affinity is correctly computed in msi_desc but this is not applied to the system IRQs. It appears the affinity is correctly passed to rtas_setup_msi_irqs() but lost at this point and never passed to irq_domain_alloc_descs() (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation")) because irq_create_mapping() doesn't take an affinity parameter. Use the new irq_create_mapping_affinity() function, which allows forwarding the affinity setting from rtas_setup_msi_irqs() to irq_domain_alloc_descs(). With this change, the virtqueues are correctly dispatched between the CPUs on pseries. Fixes: e75eafb9b039 ("genirq/msi: Switch to new irq spreading infrastructure") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201126082852.1178497-3-lvivier@redhat.com
2020-11-29Merge tag 'locking-urgent-2020-11-29' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Two more places which invoke tracing from RCU disabled regions in the idle path. Similar to the entry path the low level idle functions have to be non-instrumentable" * tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Fix intel_idle() vs tracing sched/idle: Fix arch_cpu_idle() vs tracing
2020-11-28Merge tag 'asm-generic-fixes-5.10-2' of ↵Linus Torvalds2-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic fix from Arnd Bergmann: "Add correct MAX_POSSIBLE_PHYSMEM_BITS setting to asm-generic. This is a single bugfix for a bug that Stefan Agner found on 32-bit Arm, but that exists on several other architectures" * tag 'asm-generic-fixes-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
2020-11-27Merge tag 'powerpc-5.10-4' of ↵Linus Torvalds5-8/+18
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 5.10: - regression fix for a boot failure on some 32-bit machines. - fix for host crashes in the KVM system reset handling. - fix for a possible oops in the KVM XIVE interrupt handling on Power9. - fix for host crashes triggerable via the KVM emulated MMIO handling when running HPT guests. - a couple of small build fixes. Thanks to Andreas Schwab, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Greg Kurz, Németh Márton, Nicholas Piggin, Nick Desaulniers, Serge Belyshev, and Stephen Rothwell" * tag 'powerpc-5.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix allnoconfig build since uaccess flush powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU context powerpc: Drop -me200 addition to build flags KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=y powerpc/32s: Use relocation offset when setting early hash table
2020-11-27powerpc/numa: Fix a regression on memoryless node 0Srikar Dronamraju1-2/+1
Commit e75130f20b1f ("powerpc/numa: Offline memoryless cpuless node 0") offlines node 0 and expects nodes to be subsequently onlined when CPUs or nodes are detected. Commit 6398eaa26816 ("powerpc/numa: Prefer node id queried from vphn") skips onlining node 0 when CPUs are associated with node 0. On systems with node 0 having CPUs but no memory, this causes node 0 to be marked offline. This causes issues at boot time when trying to set the memory node for online CPUs while building the zonelist. 0:mon> t [link register ] c000000000400354 __build_all_zonelists+0x164/0x280 [c00000000161bda0] c0000000016533c8 node_states+0x20/0xa0 (unreliable) [c00000000161bdc0] c000000000400384 __build_all_zonelists+0x194/0x280 [c00000000161be30] c000000001041800 build_all_zonelists_init+0x4c/0x118 [c00000000161be80] c0000000004020d0 build_all_zonelists+0x190/0x1b0 [c00000000161bef0] c000000001003cf8 start_kernel+0x18c/0x6a8 [c00000000161bf90] c00000000000adb4 start_here_common+0x1c/0x3e8 0:mon> r R00 = c000000000400354 R16 = 000000000b57a0e8 R01 = c00000000161bda0 R17 = 000000000b57a6b0 R02 = c00000000161ce00 R18 = 000000000b5afee8 R03 = 0000000000000000 R19 = 000000000b6448a0 R04 = 0000000000000000 R20 = fffffffffffffffd R05 = 0000000000000000 R21 = 0000000001400000 R06 = 0000000000000000 R22 = 000000001ec00000 R07 = 0000000000000001 R23 = c000000001175580 R08 = 0000000000000000 R24 = c000000001651ed8 R09 = c0000000017e84d8 R25 = c000000001652480 R10 = 0000000000000000 R26 = c000000001175584 R11 = c000000c7fac0d10 R27 = c0000000019568d0 R12 = c000000000400180 R28 = 0000000000000000 R13 = c000000002200000 R29 = c00000000164dd78 R14 = 000000000b579f78 R30 = 0000000000000000 R15 = 000000000b57a2b8 R31 = c000000001175584 pc = c000000000400194 local_memory_node+0x24/0x80 cfar= c000000000074334 mcount+0xc/0x10 lr = c000000000400354 __build_all_zonelists+0x164/0x280 msr = 8000000002001033 cr = 44002284 ctr = c000000000400180 xer = 0000000000000001 trap = 380 dar = 0000000000001388 dsisr = 
c00000000161bc90 0:mon> Fix this by setting node to be online while onlining CPUs that belong to node 0. Fixes: e75130f20b1f ("powerpc/numa: Offline memoryless cpuless node 0") Fixes: 6398eaa26816 ("powerpc/numa: Prefer node id queried from vphn") Reported-by: Milan Mohanty <milmohan@in.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127053738.10085-1-srikar@linux.vnet.ibm.com
2020-11-27powerpc/dma: Fallback to dma_ops when persistent memory presentAlexey Kardashevskiy3-12/+111
So far we have been using huge DMA windows to map all the RAM available. The RAM is normally mapped to the VM address space contiguously, and there is always a reasonable upper limit for possible future hot plugged RAM which makes it easy to map all RAM via IOMMU. Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike normal RAM) can map anywhere in the VM space beyond the maximum RAM size and since it can be used for DMA, it requires extending the huge window up to MAX_PHYSMEM_BITS which requires hypervisor support for: 1. huge TCE tables; 2. multilevel TCE tables; 3. huge IOMMU pages. Certain hypervisors cannot provide these, so the only option left is restricting the huge DMA window to include only RAM and falling back to the default DMA window for persistent memory. This defines arch_dma_map_direct/etc to allow generic DMA code to perform additional checks on whether direct DMA is still possible. This checks if the system has persistent memory. If it does not, the DMA bypass mode is selected, i.e. * dev->bus_dma_limit = 0 * dev->dma_ops_bypass = true <- this avoids calling dma_ops for mapping. If there is such memory, this creates identity mapping only for RAM and sets the dev->bus_dma_limit to let the generic code decide whether to call into the direct DMA or the indirect DMA ops. This should not change the existing behaviour when there is no persistent memory, as dev->dma_ops_bypass is expected to be set. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christoph Hellwig <hch@lst.de>
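The selection between direct DMA and the dma_ops described above can be sketched as a small userspace model. This is not the kernel code: struct dev_model, setup_dev(), choose_path() and the ram_end parameter are hypothetical names for illustration, and the comparison against bus_dma_limit is an assumption about how the generic code uses that field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of struct device, for illustration only. */
struct dev_model {
	uint64_t bus_dma_limit; /* 0 means no limit recorded */
	bool dma_ops_bypass;    /* true: never call dma_ops for mapping */
};

enum dma_path { DMA_PATH_DIRECT, DMA_PATH_OPS };

/*
 * Configure the device the way the commit message describes. ram_end is
 * the (assumed) highest DMA address covered by the identity mapping of RAM.
 */
static void setup_dev(struct dev_model *dev, bool has_pmem, uint64_t ram_end)
{
	if (!has_pmem) {
		/* No persistent memory: plain bypass mode. */
		dev->bus_dma_limit = 0;
		dev->dma_ops_bypass = true;
	} else {
		/* Persistent memory present: only RAM is identity mapped. */
		dev->bus_dma_limit = ram_end;
		dev->dma_ops_bypass = false;
	}
}

/* Pick the path for one mapping whose last DMA address is 'end'. */
static enum dma_path choose_path(const struct dev_model *dev, uint64_t end)
{
	if (dev->dma_ops_bypass)
		return DMA_PATH_DIRECT;
	if (dev->bus_dma_limit && end <= dev->bus_dma_limit)
		return DMA_PATH_DIRECT;
	return DMA_PATH_OPS; /* e.g. persistent memory beyond the RAM window */
}
```

In the model, a device on a system without persistent memory always takes the direct path, while on a system with persistent memory only mappings that fall within the RAM window do; anything above it goes through the dma_ops.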
2020-11-27crypto: powerpc/sha256-spe - Fix sparse endianness warningHerbert Xu1-1/+1
This patch fixes a sparse endianness warning in sha256-spe. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-26powerpc/64s: Trim offlined CPUs from mm_cpumasksNicholas Piggin5-0/+40
When offlining a CPU, powerpc/64s does not flush TLBs, rather it just leaves the CPU set in mm_cpumasks, so it continues to receive TLBIEs to manage its TLBs. However the exit_flush_lazy_tlbs() function expects that after returning, all CPUs (except self) have flushed TLBs for that mm, in which case TLBIEL can be used for this flush. This breaks for offline CPUs because they don't get the IPI to flush their TLB. This can lead to stale translations. Fix this by clearing the CPU from mm_cpumasks, then flushing all TLBs before going offline. These offlined CPU bits stuck in the cpumask also prevent the cpumask from being trimmed back to local mode, which means continual broadcast IPIs or TLBIEs are needed for TLB flushing. This patch prevents that situation too. A cast of many were involved in working this out, but in particular Milton, Aneesh, and Paul made key discoveries. Fixes: 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: Milton Miller <miltonm@us.ibm.com> Debugged-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-5-npiggin@gmail.com
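The effect of a stale offline-CPU bit on the "local mode" decision can be shown with a toy bitmask model. This is a sketch only: struct mm_model, mm_is_local_to() and cpu_offline_trim() are hypothetical names, the mask is limited to 64 CPUs, and the TLB flush itself has no userspace equivalent and appears only as a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an mm with a cpumask of up to 64 CPUs. */
struct mm_model {
	uint64_t cpumask; /* bit n set: CPU n may hold translations for this mm */
};

/* True when only 'cpu' is in the mask, so a local flush would suffice. */
static bool mm_is_local_to(const struct mm_model *mm, unsigned int cpu)
{
	return mm->cpumask == (UINT64_C(1) << cpu);
}

/*
 * Model of the offline path after the fix: drop the CPU (cpu < 64) from
 * every mm's mask. The real code then flushes all TLBs on that CPU before
 * it goes offline; that step is not modelled here.
 */
static void cpu_offline_trim(struct mm_model *mms, unsigned int nr_mms,
			     unsigned int cpu)
{
	for (unsigned int i = 0; i < nr_mms; i++)
		mms[i].cpumask &= ~(UINT64_C(1) << cpu);
	/* flush all TLBs on 'cpu' here */
}
```

With CPUs 0 and 2 set in an mm's mask, the mm is not local to CPU 0; once CPU 2 is trimmed on its way offline, the mask collapses to CPU 0 alone and the local-flush case becomes available again.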
2020-11-26powerpc/64s/pseries: Fix hash tlbiel_all_isa300 for guest kernelsNicholas Piggin1-7/+14
tlbiel_all() is presently not usable in !HVMODE when running hash. Remove the HV privileged flushes when running in a guest to make it usable. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-3-npiggin@gmail.com
2020-11-26powerpc/64s: Fix hash ISA v3.0 TLBIEL instruction generationNicholas Piggin1-1/+1
A typo has the R field of the instruction assigned by lucky dip a la register allocator. Fixes: d4748276ae14c ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-2-npiggin@gmail.com
2020-11-26Merge remote-tracking branch 'origin/master' into perf/corePeter Zijlstra257-3153/+4076
Further perf/core patches will depend on: d3f7b1bb2040 ("mm/gup: fix gup_fast with dynamic page table folding") which is already in Linus' tree.