summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-12-03powerpc/signal32: Misc changes to make handle_[rt_]_signal32() more similarChristophe Leroy1-10/+14
Miscellaneous changes to clean and make handle_signal32() and handle_rt_signal32() even more similar. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/df0bc8c3b8fa96390c46f611df79b2a94ac21844.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Rename local pointers in handle_rt_signal32()Christophe Leroy1-26/+25
Rename pointers in handle_rt_signal32() to make it more similar to handle_signal32() tm_frame becomes tm_mctx frame becomes mctx rt_sf becomes frame Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/be77477b0f05397876015b218e36548ee8f5e10b.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Move handle_signal32() close to handle_rt_signal32()Christophe Leroy1-85/+85
Those two functions are similar and serving the same purpose. To ease refactorisation, move them close to each other. This is pure move, no code change, no cosmetic. Yes, checkpatch is not happy, most will clear later. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dbce67900bf566bcf40179467bf1eb500814c405.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal32: Simplify logging in handle_rt_signal32()Christophe Leroy1-5/+1
If something is bad in the frame, there is no point in knowing which part of the frame exactly is wrong as it got allocated as a single block. Always print the root address of the frame in case of failed user access, just like handle_signal32(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/691895bd31fee89a2d8370befd66ad4eff5b63f2.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Refactor bad frame loggingChristophe Leroy4-43/+21
The logging of bad frame appears half a dozen of times and is pretty similar. Create signal_fault() fonction to perform that logging. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fa094445c119fc00315e1c13783b493346306c6a.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Call get_tm_stackpointer() from get_sigframe()Christophe Leroy4-10/+11
Instead of calling get_tm_stackpointer() from the caller, call it directly from get_sigframe(). This avoids a double call and allows get_tm_stackpointer() to become static and be inlined into get_sigframe() by GCC. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/abfdc105b8b28c4eb3ab9a26297d17f302b600ea.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Remove get_clean_sp()Christophe Leroy1-1/+4
get_clean_sp() is only used once in kernel/signal.c . GCC is smart enough to see that x & 0xffffffff is a nop calculation on PPC32, no need of a special PPC32 trivial version. Include the logic from the PPC64 version of get_clean_sp() directly in get_sigframe(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/13ef6510ce30a4867e043157b93af5bb8c67fb3b.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Move access_ok() out of get_sigframe()Christophe Leroy3-7/+3
This access_ok() will soon be performed by user_access_begin(). So move it out of get_sigframe(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/900b93744732ed0887f28f5b6a40730fb04a43fa.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Remove BUG_ON() in handler_signal functionsChristophe Leroy2-6/+0
There is already the same BUG_ON() check in do_signal() which is the only caller of handle_rt_signal64() handle_rt_signal32() and handle_signal32(). Remove those three redundant BUG_ON(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3582e10a341d523c9c3f1ac925c3aaefc9d9293d.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/32s: Allow deselecting CONFIG_PPC_FPU on mpc832xChristophe Leroy1-0/+4
The e300c2 core which is embedded in mpc832x CPU doesn't have an FPU. Make it possible to not select CONFIG_PPC_FPU when building a kernel dedicated to that target. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fcdc60d85baf80eaa0a7f3261d9d889282068216.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Don't manage floating point regs when no FPUChristophe Leroy8-3/+43
There is no point in copying floating point regs when there is no FPU and MATH_EMULATION is not selected. Create a new CONFIG_PPC_FPU_REGS bool that is selected by CONFIG_MATH_EMULATION and CONFIG_PPC_FPU, and use it to opt out everything related to fp_state in thread_struct. The asm const used only by fpu.S are opted out with CONFIG_PPC_FPU as fpu.S build is conditionnal to CONFIG_PPC_FPU. The following app spends approx 8.1 seconds system time on an 8xx without the patch, and 7.0 seconds with the patch (13.5% reduction). On an 832x, it spends approx 2.6 seconds system time without the patch and 2.1 seconds with the patch (19% reduction). void sigusr1(int sig) { } int main(int argc, char **argv) { int i = 100000; signal(SIGUSR1, sigusr1); for (;i--;) raise(SIGUSR1); exit(0); } Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7569070083e6cd5b279bb5023da601aba3c06f3c.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/ptrace: Create ptrace_get_fpr() and ptrace_put_fpr()Christophe Leroy4-29/+56
On the same model as ptrace_get_reg() and ptrace_put_reg(), create ptrace_get_fpr() and ptrace_put_fpr() to get/set the floating points registers. We move the boundary checkings in them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/24a1baedea7f7ae7b6bf27be98bab6d01b5ca2c1.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/ptrace: Consolidate reg index calculationChristophe Leroy1-14/+4
Today we have: #ifdef CONFIG_PPC32 index = addr >> 2; if ((addr & 3) || child->thread.regs == NULL) #else index = addr >> 3; if ((addr & 7)) #endif sizeof(long) has value 4 for PPC32 and value 8 for PPC64. Dividing by 4 is equivalent to >> 2 and dividing by 8 is equivalent to >> 3. And 3 and 7 are respectively (sizeof(long) - 1). Use sizeof(long) to get rid of the #ifdef CONFIG_PPC32 and consolidate the calculation and checking. thread.regs have to be not NULL on both PPC32 and PPC64 so adding that test on PPC64 is harmless. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3cd1e284e93c60db981659585e18d1f6bb73ed2f.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/ptrace: Move declaration of ptrace_get_reg() and ptrace_set_reg()Christophe Leroy2-0/+5
ptrace_get_reg() and ptrace_set_reg() are only used internally by ptrace. Move them in arch/powerpc/kernel/ptrace/ptrace-decl.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/376c258267aeae54a4423bc4a2e107a9611f0039.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Move inline functions in signal.hChristophe Leroy2-38/+33
To really be inlined, the functions need to be defined in the same C file as the caller, or in an included header. Move functions defined inline from signal .c in signal.h Fixes: 3dd4eb83a9c0 ("powerpc: move common register copy functions from signal_32.c to signal.c") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/35b1bd44a1a66f5bcf9b457a1c480ac8d5ef50b2.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32Christophe Leroy3-0/+16
Provides __kernel_clock_gettime64() on vdso32. This is the 64 bits version of __kernel_clock_gettime() which is y2038 compliant. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-9-mpe@ellerman.id.au
2020-12-03powerpc/vdso: Switch VDSO to generic C implementation.Christophe Leroy9-664/+66
With the C VDSO, the performance is slightly lower, but it is worth it as it will ease maintenance and evolution, and also brings clocks that are not supported with the ASM VDSO. On an 8xx at 132 MHz, vdsotest with the ASM VDSO: gettimeofday: vdso: 828 nsec/call clock-getres-realtime-coarse: vdso: 391 nsec/call clock-gettime-realtime-coarse: vdso: 614 nsec/call clock-getres-realtime: vdso: 460 nsec/call clock-gettime-realtime: vdso: 876 nsec/call clock-getres-monotonic-coarse: vdso: 399 nsec/call clock-gettime-monotonic-coarse: vdso: 691 nsec/call clock-getres-monotonic: vdso: 460 nsec/call clock-gettime-monotonic: vdso: 1026 nsec/call On an 8xx at 132 MHz, vdsotest with the C VDSO: gettimeofday: vdso: 955 nsec/call clock-getres-realtime-coarse: vdso: 545 nsec/call clock-gettime-realtime-coarse: vdso: 592 nsec/call clock-getres-realtime: vdso: 545 nsec/call clock-gettime-realtime: vdso: 941 nsec/call clock-getres-monotonic-coarse: vdso: 545 nsec/call clock-gettime-monotonic-coarse: vdso: 591 nsec/call clock-getres-monotonic: vdso: 545 nsec/call clock-gettime-monotonic: vdso: 940 nsec/call It is even better for gettime with monotonic clocks. Unsupported clocks with ASM VDSO: clock-gettime-boottime: vdso: 3851 nsec/call clock-gettime-tai: vdso: 3852 nsec/call clock-gettime-monotonic-raw: vdso: 3396 nsec/call Same clocks with C VDSO: clock-gettime-tai: vdso: 941 nsec/call clock-gettime-monotonic-raw: vdso: 1001 nsec/call clock-gettime-monotonic-coarse: vdso: 591 nsec/call On an 8321E at 333 MHz, vdsotest with the ASM VDSO: gettimeofday: vdso: 220 nsec/call clock-getres-realtime-coarse: vdso: 102 nsec/call clock-gettime-realtime-coarse: vdso: 178 nsec/call clock-getres-realtime: vdso: 129 nsec/call clock-gettime-realtime: vdso: 235 nsec/call clock-getres-monotonic-coarse: vdso: 105 nsec/call clock-gettime-monotonic-coarse: vdso: 208 nsec/call clock-getres-monotonic: vdso: 129 nsec/call clock-gettime-monotonic: vdso: 274 nsec/call On an 8321E at 333 MHz, vdsotest with the C VDSO: gettimeofday: vdso: 272 nsec/call clock-getres-realtime-coarse: vdso: 160 nsec/call clock-gettime-realtime-coarse: vdso: 184 nsec/call clock-getres-realtime: vdso: 166 nsec/call clock-gettime-realtime: vdso: 281 nsec/call clock-getres-monotonic-coarse: vdso: 160 nsec/call clock-gettime-monotonic-coarse: vdso: 184 nsec/call clock-getres-monotonic: vdso: 169 nsec/call clock-gettime-monotonic: vdso: 275 nsec/call On a Power9 Nimbus DD2.2 at 3.8GHz, with the ASM VDSO: clock-gettime-monotonic: vdso: 35 nsec/call clock-getres-monotonic: vdso: 16 nsec/call clock-gettime-monotonic-coarse: vdso: 18 nsec/call clock-getres-monotonic-coarse: vdso: 522 nsec/call clock-gettime-monotonic-raw: vdso: 598 nsec/call clock-getres-monotonic-raw: vdso: 520 nsec/call clock-gettime-realtime: vdso: 34 nsec/call clock-getres-realtime: vdso: 16 nsec/call clock-gettime-realtime-coarse: vdso: 18 nsec/call clock-getres-realtime-coarse: vdso: 517 nsec/call getcpu: vdso: 8 nsec/call gettimeofday: vdso: 25 nsec/call And with the C VDSO: clock-gettime-monotonic: vdso: 37 nsec/call clock-getres-monotonic: vdso: 20 nsec/call clock-gettime-monotonic-coarse: vdso: 21 nsec/call clock-getres-monotonic-coarse: vdso: 19 nsec/call clock-gettime-monotonic-raw: vdso: 38 nsec/call clock-getres-monotonic-raw: vdso: 20 nsec/call clock-gettime-realtime: vdso: 37 nsec/call clock-getres-realtime: vdso: 20 nsec/call clock-gettime-realtime-coarse: vdso: 20 nsec/call clock-getres-realtime-coarse: vdso: 19 nsec/call getcpu: vdso: 8 nsec/call gettimeofday: vdso: 28 nsec/call Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-8-mpe@ellerman.id.au
2020-12-03powerpc/vdso: Prepare for switching VDSO to generic C implementation.Christophe Leroy2-0/+57
Prepare for switching VDSO to generic C implementation in following patch. Here, we: - Prepare the helpers to call the C VDSO functions - Prepare the required callbacks for the C VDSO functions - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES - Add the C trampolines to the generic C VDSO functions powerpc is a bit special for VDSO as well as system calls in the way that it requires setting CR SO bit which cannot be done in C. Therefore, entry/exit needs to be performed in ASM. Implementing __arch_get_vdso_data() would clobber the link register, requiring the caller to save it. As the ASM calling function already has to set a stack frame and saves the link register before calling the C vdso function, retriving the vdso data pointer there is lighter. Implement __arch_vdso_capable() and always return true. Provide vdso_shift_ns(), as the generic x >> s gives the following bad result: 18: 35 25 ff e0 addic. r9,r5,-32 1c: 41 80 00 10 blt 2c <shift+0x14> 20: 7c 64 4c 30 srw r4,r3,r9 24: 38 60 00 00 li r3,0 ... 2c: 54 69 08 3c rlwinm r9,r3,1,0,30 30: 21 45 00 1f subfic r10,r5,31 34: 7c 84 2c 30 srw r4,r4,r5 38: 7d 29 50 30 slw r9,r9,r10 3c: 7c 63 2c 30 srw r3,r3,r5 40: 7d 24 23 78 or r4,r9,r4 In our case the shift is always <= 32. In addition, the upper 32 bits of the result are likely nul. Lets GCC know it, it also optimises the following calculations. With the patch, we get: 0: 21 25 00 20 subfic r9,r5,32 4: 7c 69 48 30 slw r9,r3,r9 8: 7c 84 2c 30 srw r4,r4,r5 c: 7d 24 23 78 or r4,r9,r4 10: 7c 63 2c 30 srw r3,r3,r5 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-6-mpe@ellerman.id.au
2020-12-03powerpc: inline iomap accessorsChristophe Leroy1-166/+0
ioreadXX()/ioreadXXbe() accessors are equivalent to ppc in_leXX()/in_be16() accessors but they are not inlined. Since commit 0eb573682872 ("powerpc/kerenl: Enable EEH for IO accessors"), the 'le' versions are equivalent to the ones defined in asm-generic/io.h, allthough the ones there are inlined. Include asm-generic/io.h to get them. Keep ppc versions of the 'be' ones as they are optimised, but make them inline in ppc io.h. This reduces the size of ppc64e_defconfig build by 3 kbytes: text data bss dec hex filename 10160733 4343422 562972 15067127 e5e7f7 vmlinux.before 10159239 4341590 562972 15063801 e5daf9 vmlinux.after A typical function using ioread and iowrite before the change: c00000000066a3c4 <.ata_bmdma_stop>: c00000000066a3c4: 7c 08 02 a6 mflr r0 c00000000066a3c8: fb c1 ff f0 std r30,-16(r1) c00000000066a3cc: f8 01 00 10 std r0,16(r1) c00000000066a3d0: fb e1 ff f8 std r31,-8(r1) c00000000066a3d4: f8 21 ff 81 stdu r1,-128(r1) c00000000066a3d8: eb e3 00 00 ld r31,0(r3) c00000000066a3dc: eb df 00 98 ld r30,152(r31) c00000000066a3e0: 7f c3 f3 78 mr r3,r30 c00000000066a3e4: 4b 9b 6f 7d bl c000000000021360 <.ioread8> c00000000066a3e8: 60 00 00 00 nop c00000000066a3ec: 7f c4 f3 78 mr r4,r30 c00000000066a3f0: 54 63 06 3c rlwinm r3,r3,0,24,30 c00000000066a3f4: 4b 9b 70 4d bl c000000000021440 <.iowrite8> c00000000066a3f8: 60 00 00 00 nop c00000000066a3fc: 7f e3 fb 78 mr r3,r31 c00000000066a400: 38 21 00 80 addi r1,r1,128 c00000000066a404: e8 01 00 10 ld r0,16(r1) c00000000066a408: eb c1 ff f0 ld r30,-16(r1) c00000000066a40c: 7c 08 03 a6 mtlr r0 c00000000066a410: eb e1 ff f8 ld r31,-8(r1) c00000000066a414: 4b ff ff 8c b c00000000066a3a0 <.ata_sff_dma_pause> The same function with this patch: c000000000669cb4 <.ata_bmdma_stop>: c000000000669cb4: e8 63 00 00 ld r3,0(r3) c000000000669cb8: e9 43 00 98 ld r10,152(r3) c000000000669cbc: 7c 00 04 ac hwsync c000000000669cc0: 89 2a 00 00 lbz r9,0(r10) c000000000669cc4: 0c 09 00 00 twi 0,r9,0 c000000000669cc8: 4c 00 01 2c isync c000000000669ccc: 55 29 06 3c rlwinm r9,r9,0,24,30 c000000000669cd0: 7c 00 04 ac hwsync c000000000669cd4: 99 2a 00 00 stb r9,0(r10) c000000000669cd8: a1 4d 06 f0 lhz r10,1776(r13) c000000000669cdc: 2c 2a 00 00 cmpdi r10,0 c000000000669ce0: 41 c2 00 08 beq- c000000000669ce8 <.ata_bmdma_stop+0x34> c000000000669ce4: b1 4d 06 f2 sth r10,1778(r13) c000000000669ce8: 4b ff ff a8 b c000000000669c90 <.ata_sff_dma_pause> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/18b357d68c4cde149f75c7a1031c850925cd8128.1605981539.git.christophe.leroy@csgroup.eu
2020-12-02sched/vtime: Consolidate IRQ time accountingFrederic Weisbecker1-16/+40
The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all have their own version of irq time accounting that dispatch the cputime to the appropriate index: hardirq, softirq, system, idle, guest... from an all-in-one function. Instead of having these ad-hoc versions, move the cputime destination dispatch decision to the core code and leave only the actual per-index cputime accounting to the architecture. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-4-frederic@kernel.org
2020-11-29Merge tag 'locking-urgent-2020-11-29' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Two more places which invoke tracing from RCU disabled regions in the idle path. Similar to the entry path the low level idle functions have to be non-instrumentable" * tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Fix intel_idle() vs tracing sched/idle: Fix arch_cpu_idle() vs tracing
2020-11-27powerpc/dma: Fallback to dma_ops when persistent memory presentAlexey Kardashevskiy1-2/+69
So far we have been using huge DMA windows to map all the RAM available. The RAM is normally mapped to the VM address space contiguously, and there is always a reasonable upper limit for possible future hot plugged RAM which makes it easy to map all RAM via IOMMU. Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike normal RAM) can map anywhere in the VM space beyond the maximum RAM size and since it can be used for DMA, it requires extending the huge window up to MAX_PHYSMEM_BITS which requires hypervisor support for: 1. huge TCE tables; 2. multilevel TCE tables; 3. huge IOMMU pages. Certain hypervisors cannot do either so the only option left is restricting the huge DMA window to include only RAM and fallback to the default DMA window for persistent memory. This defines arch_dma_map_direct/etc to allow generic DMA code perform additional checks on whether direct DMA is still possible. This checks if the system has persistent memory. If it does not, the DMA bypass mode is selected, i.e. * dev->bus_dma_limit = 0 * dev->dma_ops_bypass = true <- this avoid calling dma_ops for mapping. If there is such memory, this creates identity mapping only for RAM and sets the dev->bus_dma_limit to let the generic code decide whether to call into the direct DMA or the indirect DMA ops. This should not change the existing behaviour when no persistent memory as dev->dma_ops_bypass is expected to be set. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-11-26powerpc/ptrace: Hard wire PT_SOFTE value to 1 in gpr_get() tooOleg Nesterov2-2/+12
The commit a8a4b03ab95f ("powerpc: Hard wire PT_SOFTE value to 1 in ptrace & signals") changed ptrace_get_reg(PT_SOFTE) to report 0x1, but PTRACE_GETREGS still copies pt_regs->softe as is. This is not consistent and this breaks the user-regs-peekpoke test from https://sourceware.org/systemtap/wiki/utrace/tests/ Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201119160247.GB5188@redhat.com
2020-11-26powerpc/ptrace: Simplify gpr_get()/tm_cgpr_get()Oleg Nesterov2-15/+7
gpr_get() does membuf_write() twice to override pt_regs->msr in between. We can call membuf_write() once and change ->msr in the kernel buffer, this simplifies the code and the next fix. The patch adds a new simple helper, membuf_at(offs), it returns the new membuf which can be safely used after membuf_write(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> [mpe: Fixup some minor whitespace issues noticed by Christophe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201119160221.GA5188@redhat.com
2020-11-25Merge branch 'fixes' into nextMichael Ellerman9-109/+199
Merge our fixes branch, in particular to bring in the changes for the entry/uaccess flush.
2020-11-24sched/idle: Fix arch_cpu_idle() vs tracingPeter Zijlstra1-2/+2
We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
2020-11-23Merge tag 'powerpc-cve-2020-4788' into fixesMichael Ellerman4-40/+178
From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups.
2020-11-19Merge tag 'powerpc-cve-2020-4788' of ↵Linus Torvalds4-40/+178
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes for CVE-2020-4788. From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups" * tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations selftests/powerpc: refactor entry and rfi_flush tests selftests/powerpc: entry flush test powerpc: Only include kup-radix.h for 64-bit Book3S powerpc/64s: flush L1D after user accesses powerpc/64s: flush L1D on kernel entry selftests/powerpc: rfi_flush: disable entry flush if present
2020-11-19powerpc: Only include kup-radix.h for 64-bit Book3SMichael Ellerman1-1/+1
In kup.h we currently include kup-radix.h for all 64-bit builds, which includes Book3S and Book3E. The latter doesn't make sense, Book3E never uses the Radix MMU. This has worked up until now, but almost by accident, and the recent uaccess flush changes introduced a build breakage on Book3E because of the bad structure of the code. So disentangle things so that we only use kup-radix.h for Book3S. This requires some more stubs in kup.h and fixing an include in syscall_64.c. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D after user accessesNicholas Piggin3-59/+95
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D on kernel entryNicholas Piggin3-1/+103
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc: Remove RFI macroChristophe Leroy2-7/+28
RFI macro is just there to add an infinite loop past rfi in order to avoid prefetch on 40x in half a dozen of places in entry_32 and head_32. Those places are already full of #ifdefs, so just add a few more to explicitely show those loops and remove RFI. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f7e9cb9e9240feec63cb330abf40b67d1aad852f.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc: Replace RFI by rfi on book3s/32 and bookeChristophe Leroy3-13/+13
For book3s/32 and for booke, RFI is just an rfi. Only 40x has a non trivial RFI. CONFIG_PPC_RTAS is never selected by 40x platforms. Make it more explicit by replacing RFI by rfi wherever possible. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b901ddfdeb8a0a3b7cb59999599cdfde1bbfe834.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFIChristophe Leroy1-2/+7
In head_64.S, we have two places using RFI to return to kernel. Use RFI_TO_KERNEL instead. They are the two only places using RFI on book3s/64, so the RFI macro can go away. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7719261b0a0d2787772339484c33eb809723bca7.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.SYouling Tang1-18/+1
Use the common INIT_DATA_SECTION rule for the linker script in an effort to regularize the linker script. Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1604487550-20040-1-git-send-email-tangyouling@loongson.cn
2020-11-19powerpc: Avoid broken GCC __attribute__((optimize))Ard Biesheuvel4-9/+6
Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") introduced a couple of uses of __attribute__((optimize)) with function scope, to disable the stack protector in some early boot code. Unfortunately, and this is documented in the GCC man pages [0], overriding function attributes for optimization is broken, and is only supported for debug scenarios, not for production: the problem appears to be that setting GCC -f flags using this method will cause it to forget about some or all other optimization settings that have been applied. So the only safe way to disable the stack protector is to disable it for the entire source file. [0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> [mpe: Drop one remaining use of __nostackprotector, reported by snowpatch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
2020-11-19powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to CJordan Niethe3-260/+275
The only thing keeping the cpu_setup() and cpu_restore() functions used in the cputable entries for Power7, Power8, Power9 and Power10 in assembly was cpu_restore() being called before there was a stack in generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel stack for secondaries before cpu_restore()") means that it is now possible to use C. Rewrite the functions in C so they are a little bit easier to read. This is not changing their functionality. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Tweak copyright and authorship notes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201014072837.24539-2-jniethe5@gmail.com
2020-11-18powerpc: fix -Wimplicit-fallthroughNick Desaulniers2-0/+2
The "fallthrough" pseudo-keyword was added as a portable way to denote intentional fallthrough. Clang will still warn on cases where there is a fallthrough to an immediate break. Add explicit breaks for those cases. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/236 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-11-18powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU contextNicholas Piggin1-4/+7
Commit 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") removed KVM guest tests from interrupts that do not set HV=1, when PR-KVM is not configured. This is wrong for HV-KVM HPT guest MMIO emulation case which attempts to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with the guest MMU context loaded. This can cause host DSI, DSLB interrupts which must test for KVM guest. Restore this and add a comment. Fixes: 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com
2020-11-16powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=yNicholas Piggin1-2/+0
pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs, which is basically the same as the regular handlers for those interrupts. The system reset FWNMI handler did not have a KVM guest test in it, although it probably should have because the guest can itself run guests. Commit 4f50541f6703b ("powerpc/64s/exception: Move all interrupt handlers to new style code gen macros") convert the handler faithfully to avoid a KVM test with a "clever" trick to modify the IKVM_REAL setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y). This worked when the KVM test was generated in the interrupt entry handlers, but a later patch moved the KVM test to the common handler, and the common handler macro is expanded below the fwnmi entry. This prevents the KVM test from being generated even for the 0x100 entry point as well. The result is NMI IPIs in the host kernel when a guest is running will use gest registers. This goes particularly badly when an HPT guest is running and the MMU is set to guest mode. Remove this trickery and just generate the test always. Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com
2020-11-13ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regsSteven Rostedt (VMware)1-1/+3
In preparation to have arguments of a function passed to callbacks attached to functions as default, change the default callback prototype to receive a struct ftrace_regs as the forth parameter instead of a pt_regs. For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they will now need to get the pt_regs via a ftrace_get_regs() helper call. If this is called by a callback that their ftrace_ops did not have a FL_SAVE_REGS flag set, it that helper function will return NULL. This will allow the ftrace_regs to hold enough just to get the parameters and stack pointer, but without the worry that callbacks may have a pt_regs that is not completely filled. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-09powerpc: add support for TIF_NOTIFY_SIGNALJens Axboe1-1/+1
Wire up TIF_NOTIFY_SIGNAL handling for powerpc. Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-08powerpc/32s: Use relocation offset when setting early hash tableChristophe Leroy1-1/+2
When calling early_hash_table(), the kernel hasn't been yet relocated to its linking address, so data must be addressed with relocation offset. Add relocation offset to write into Hash in early_hash_table(). Fixes: 69a1593abdbc ("powerpc/32s: Setup the early hash table at all time.") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9e225a856a8b22e0e77587ee22ab7a2f5bca8753.1604740029.git.christophe.leroy@csgroup.eu
2020-11-06ftrace: Add recording of functions that caused recursionSteven Rostedt (VMware)1-1/+1
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file "recursed_functions" all the functions that caused recursion while a callback to the function tracer was running. Link: https://lkml.kernel.org/r/20201106023548.102375687@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Guo Ren <guoren@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-csky@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: live-patching@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06kprobes/ftrace: Add recursion protection to the ftrace callbackSteven Rostedt (VMware)1-1/+10
If a ftrace callback does not supply its own recursion protection and does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will make a helper trampoline to do so before calling the callback instead of just calling the callback directly. The default for ftrace_ops is going to change. It will expect that handlers provide their own recursion protection, unless its ftrace_ops states otherwise. Link: https://lkml.kernel.org/r/20201028115613.140212174@goodmis.org Link: https://lkml.kernel.org/r/20201106023546.944907560@goodmis.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Ren <guoren@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-csky@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-05powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entryChristophe Leroy1-29/+7
When _PAGE_ACCESSED is not set, a minor fault is expected. To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED into the L2 entry valid bit. To simplify the processing and reduce the number of instructions in TLB miss exceptions, manage it as an APG bit and get it next to _PAGE_GUARDED bit to allow a copy in one go. Then declare the corresponding groups as handling all accesses as user accesses. As the PP bits always define user as No Access, it will generate a fault. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu
2020-11-05powerpc/8xx: Always fault when _PAGE_ACCESSED is not setChristophe Leroy1-12/+2
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. This adds at least 3 instructions to the TLB miss exception handlers fast path. Following patch will reduce this overhead. Also update the rotation instruction to the correct number of bits to reflect all changes done to _PAGE_ACCESSED over time. Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.") Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU") Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits") Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC") Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu
2020-11-05powerpc/40x: Always fault when _PAGE_ACCESSED is not setChristophe Leroy1-8/+0
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b02ca2ed2d3676a096219b48c0f69ec982a75bcf.1602342801.git.christophe.leroy@csgroup.eu
2020-11-05powerpc/603: Always fault when _PAGE_ACCESSED is not setChristophe Leroy1-12/+0
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu
2020-11-02powerpc/64: Set up a kernel stack for secondaries before cpu_restore()Jordan Niethe2-6/+6
Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore() is called before a stack has been set up in r1. This was previously fine as the cpu_restore() functions were implemented in assembly and did not use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") used __restore_cpu_cpufeatures() as the cpu_restore() function for a device-tree features based cputable entry. This is a C function and hence uses a stack in r1. generic_secondary_smp_init() is entered on the secondary cpus via the primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware thread has its own stack. The OPAL call is ran in the primary's hardware thread. During the call, a job is scheduled on a secondary cpu that will start executing at the address of generic_secondary_smp_init(). Hence the value that will be left in r1 when the secondary cpu enters the kernel is part of that secondary cpu's individual OPAL stack. This means that __restore_cpu_cpufeatures() will write to that OPAL stack. This is not horribly bad as each hardware thread has its own stack and the call that enters the kernel from OPAL never returns, but it is still wrong and should be corrected. Create the temp kernel stack before calling cpu_restore(). As noted by mpe, for a kexec boot, the secondary CPUs are released from the spin loop at address 0x60 by smp_release_cpus() and then jump to generic_secondary_smp_init(). The call to smp_release_cpus() is in setup_arch(), and it comes before the call to emergency_stack_init(). emergency_stack_init() allocates an emergency stack in the PACA for each CPU. This address in the PACA is what is used to set up the temp kernel stack in generic_secondary_smp_init(). Move releasing the secondary CPUs to after the PACAs have been allocated an emergency stack, otherwise the PACA stack pointer will contain garbage and hence the temp kernel stack created from it will be broken. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201014072837.24539-1-jniethe5@gmail.com