summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-12-03powerpc/ptrace: Consolidate reg index calculationChristophe Leroy1-14/+4
Today we have: #ifdef CONFIG_PPC32 index = addr >> 2; if ((addr & 3) || child->thread.regs == NULL) #else index = addr >> 3; if ((addr & 7)) #endif sizeof(long) has value 4 for PPC32 and value 8 for PPC64. Dividing by 4 is equivalent to >> 2 and dividing by 8 is equivalent to >> 3. And 3 and 7 are respectively (sizeof(long) - 1). Use sizeof(long) to get rid of the #ifdef CONFIG_PPC32 and consolidate the calculation and checking. thread.regs have to be not NULL on both PPC32 and PPC64 so adding that test on PPC64 is harmless. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3cd1e284e93c60db981659585e18d1f6bb73ed2f.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/ptrace: Move declaration of ptrace_get_reg() and ptrace_set_reg()Christophe Leroy2-0/+5
ptrace_get_reg() and ptrace_set_reg() are only used internally by ptrace. Move them in arch/powerpc/kernel/ptrace/ptrace-decl.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/376c258267aeae54a4423bc4a2e107a9611f0039.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/signal: Move inline functions in signal.hChristophe Leroy2-38/+33
To really be inlined, the functions need to be defined in the same C file as the caller, or in an included header. Move functions defined inline from signal .c in signal.h Fixes: 3dd4eb83a9c0 ("powerpc: move common register copy functions from signal_32.c to signal.c") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/35b1bd44a1a66f5bcf9b457a1c480ac8d5ef50b2.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32Christophe Leroy3-0/+16
Provides __kernel_clock_gettime64() on vdso32. This is the 64 bits version of __kernel_clock_gettime() which is y2038 compliant. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-9-mpe@ellerman.id.au
2020-12-03powerpc/vdso: Switch VDSO to generic C implementation.Christophe Leroy9-664/+66
With the C VDSO, the performance is slightly lower, but it is worth it as it will ease maintenance and evolution, and also brings clocks that are not supported with the ASM VDSO. On an 8xx at 132 MHz, vdsotest with the ASM VDSO: gettimeofday: vdso: 828 nsec/call clock-getres-realtime-coarse: vdso: 391 nsec/call clock-gettime-realtime-coarse: vdso: 614 nsec/call clock-getres-realtime: vdso: 460 nsec/call clock-gettime-realtime: vdso: 876 nsec/call clock-getres-monotonic-coarse: vdso: 399 nsec/call clock-gettime-monotonic-coarse: vdso: 691 nsec/call clock-getres-monotonic: vdso: 460 nsec/call clock-gettime-monotonic: vdso: 1026 nsec/call On an 8xx at 132 MHz, vdsotest with the C VDSO: gettimeofday: vdso: 955 nsec/call clock-getres-realtime-coarse: vdso: 545 nsec/call clock-gettime-realtime-coarse: vdso: 592 nsec/call clock-getres-realtime: vdso: 545 nsec/call clock-gettime-realtime: vdso: 941 nsec/call clock-getres-monotonic-coarse: vdso: 545 nsec/call clock-gettime-monotonic-coarse: vdso: 591 nsec/call clock-getres-monotonic: vdso: 545 nsec/call clock-gettime-monotonic: vdso: 940 nsec/call It is even better for gettime with monotonic clocks. Unsupported clocks with ASM VDSO: clock-gettime-boottime: vdso: 3851 nsec/call clock-gettime-tai: vdso: 3852 nsec/call clock-gettime-monotonic-raw: vdso: 3396 nsec/call Same clocks with C VDSO: clock-gettime-tai: vdso: 941 nsec/call clock-gettime-monotonic-raw: vdso: 1001 nsec/call clock-gettime-monotonic-coarse: vdso: 591 nsec/call On an 8321E at 333 MHz, vdsotest with the ASM VDSO: gettimeofday: vdso: 220 nsec/call clock-getres-realtime-coarse: vdso: 102 nsec/call clock-gettime-realtime-coarse: vdso: 178 nsec/call clock-getres-realtime: vdso: 129 nsec/call clock-gettime-realtime: vdso: 235 nsec/call clock-getres-monotonic-coarse: vdso: 105 nsec/call clock-gettime-monotonic-coarse: vdso: 208 nsec/call clock-getres-monotonic: vdso: 129 nsec/call clock-gettime-monotonic: vdso: 274 nsec/call On an 8321E at 333 MHz, vdsotest with the C VDSO: gettimeofday: vdso: 272 nsec/call clock-getres-realtime-coarse: vdso: 160 nsec/call clock-gettime-realtime-coarse: vdso: 184 nsec/call clock-getres-realtime: vdso: 166 nsec/call clock-gettime-realtime: vdso: 281 nsec/call clock-getres-monotonic-coarse: vdso: 160 nsec/call clock-gettime-monotonic-coarse: vdso: 184 nsec/call clock-getres-monotonic: vdso: 169 nsec/call clock-gettime-monotonic: vdso: 275 nsec/call On a Power9 Nimbus DD2.2 at 3.8GHz, with the ASM VDSO: clock-gettime-monotonic: vdso: 35 nsec/call clock-getres-monotonic: vdso: 16 nsec/call clock-gettime-monotonic-coarse: vdso: 18 nsec/call clock-getres-monotonic-coarse: vdso: 522 nsec/call clock-gettime-monotonic-raw: vdso: 598 nsec/call clock-getres-monotonic-raw: vdso: 520 nsec/call clock-gettime-realtime: vdso: 34 nsec/call clock-getres-realtime: vdso: 16 nsec/call clock-gettime-realtime-coarse: vdso: 18 nsec/call clock-getres-realtime-coarse: vdso: 517 nsec/call getcpu: vdso: 8 nsec/call gettimeofday: vdso: 25 nsec/call And with the C VDSO: clock-gettime-monotonic: vdso: 37 nsec/call clock-getres-monotonic: vdso: 20 nsec/call clock-gettime-monotonic-coarse: vdso: 21 nsec/call clock-getres-monotonic-coarse: vdso: 19 nsec/call clock-gettime-monotonic-raw: vdso: 38 nsec/call clock-getres-monotonic-raw: vdso: 20 nsec/call clock-gettime-realtime: vdso: 37 nsec/call clock-getres-realtime: vdso: 20 nsec/call clock-gettime-realtime-coarse: vdso: 20 nsec/call clock-getres-realtime-coarse: vdso: 19 nsec/call getcpu: vdso: 8 nsec/call gettimeofday: vdso: 28 nsec/call Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-8-mpe@ellerman.id.au
2020-12-03powerpc/vdso: Prepare for switching VDSO to generic C implementation.Christophe Leroy2-0/+57
Prepare for switching VDSO to generic C implementation in following patch. Here, we: - Prepare the helpers to call the C VDSO functions - Prepare the required callbacks for the C VDSO functions - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES - Add the C trampolines to the generic C VDSO functions powerpc is a bit special for VDSO as well as system calls in the way that it requires setting CR SO bit which cannot be done in C. Therefore, entry/exit needs to be performed in ASM. Implementing __arch_get_vdso_data() would clobber the link register, requiring the caller to save it. As the ASM calling function already has to set a stack frame and saves the link register before calling the C vdso function, retriving the vdso data pointer there is lighter. Implement __arch_vdso_capable() and always return true. Provide vdso_shift_ns(), as the generic x >> s gives the following bad result: 18: 35 25 ff e0 addic. r9,r5,-32 1c: 41 80 00 10 blt 2c <shift+0x14> 20: 7c 64 4c 30 srw r4,r3,r9 24: 38 60 00 00 li r3,0 ... 2c: 54 69 08 3c rlwinm r9,r3,1,0,30 30: 21 45 00 1f subfic r10,r5,31 34: 7c 84 2c 30 srw r4,r4,r5 38: 7d 29 50 30 slw r9,r9,r10 3c: 7c 63 2c 30 srw r3,r3,r5 40: 7d 24 23 78 or r4,r9,r4 In our case the shift is always <= 32. In addition, the upper 32 bits of the result are likely nul. Lets GCC know it, it also optimises the following calculations. With the patch, we get: 0: 21 25 00 20 subfic r9,r5,32 4: 7c 69 48 30 slw r9,r3,r9 8: 7c 84 2c 30 srw r4,r4,r5 c: 7d 24 23 78 or r4,r9,r4 10: 7c 63 2c 30 srw r3,r3,r5 Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126131006.2431205-6-mpe@ellerman.id.au
2020-12-03powerpc: inline iomap accessorsChristophe Leroy1-166/+0
ioreadXX()/ioreadXXbe() accessors are equivalent to ppc in_leXX()/in_be16() accessors but they are not inlined. Since commit 0eb573682872 ("powerpc/kerenl: Enable EEH for IO accessors"), the 'le' versions are equivalent to the ones defined in asm-generic/io.h, allthough the ones there are inlined. Include asm-generic/io.h to get them. Keep ppc versions of the 'be' ones as they are optimised, but make them inline in ppc io.h. This reduces the size of ppc64e_defconfig build by 3 kbytes: text data bss dec hex filename 10160733 4343422 562972 15067127 e5e7f7 vmlinux.before 10159239 4341590 562972 15063801 e5daf9 vmlinux.after A typical function using ioread and iowrite before the change: c00000000066a3c4 <.ata_bmdma_stop>: c00000000066a3c4: 7c 08 02 a6 mflr r0 c00000000066a3c8: fb c1 ff f0 std r30,-16(r1) c00000000066a3cc: f8 01 00 10 std r0,16(r1) c00000000066a3d0: fb e1 ff f8 std r31,-8(r1) c00000000066a3d4: f8 21 ff 81 stdu r1,-128(r1) c00000000066a3d8: eb e3 00 00 ld r31,0(r3) c00000000066a3dc: eb df 00 98 ld r30,152(r31) c00000000066a3e0: 7f c3 f3 78 mr r3,r30 c00000000066a3e4: 4b 9b 6f 7d bl c000000000021360 <.ioread8> c00000000066a3e8: 60 00 00 00 nop c00000000066a3ec: 7f c4 f3 78 mr r4,r30 c00000000066a3f0: 54 63 06 3c rlwinm r3,r3,0,24,30 c00000000066a3f4: 4b 9b 70 4d bl c000000000021440 <.iowrite8> c00000000066a3f8: 60 00 00 00 nop c00000000066a3fc: 7f e3 fb 78 mr r3,r31 c00000000066a400: 38 21 00 80 addi r1,r1,128 c00000000066a404: e8 01 00 10 ld r0,16(r1) c00000000066a408: eb c1 ff f0 ld r30,-16(r1) c00000000066a40c: 7c 08 03 a6 mtlr r0 c00000000066a410: eb e1 ff f8 ld r31,-8(r1) c00000000066a414: 4b ff ff 8c b c00000000066a3a0 <.ata_sff_dma_pause> The same function with this patch: c000000000669cb4 <.ata_bmdma_stop>: c000000000669cb4: e8 63 00 00 ld r3,0(r3) c000000000669cb8: e9 43 00 98 ld r10,152(r3) c000000000669cbc: 7c 00 04 ac hwsync c000000000669cc0: 89 2a 00 00 lbz r9,0(r10) c000000000669cc4: 0c 09 00 00 twi 0,r9,0 c000000000669cc8: 4c 00 01 2c isync c000000000669ccc: 55 29 06 3c rlwinm r9,r9,0,24,30 c000000000669cd0: 7c 00 04 ac hwsync c000000000669cd4: 99 2a 00 00 stb r9,0(r10) c000000000669cd8: a1 4d 06 f0 lhz r10,1776(r13) c000000000669cdc: 2c 2a 00 00 cmpdi r10,0 c000000000669ce0: 41 c2 00 08 beq- c000000000669ce8 <.ata_bmdma_stop+0x34> c000000000669ce4: b1 4d 06 f2 sth r10,1778(r13) c000000000669ce8: 4b ff ff a8 b c000000000669c90 <.ata_sff_dma_pause> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/18b357d68c4cde149f75c7a1031c850925cd8128.1605981539.git.christophe.leroy@csgroup.eu
2020-12-02sched/vtime: Consolidate IRQ time accountingFrederic Weisbecker1-16/+40
The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all have their own version of irq time accounting that dispatch the cputime to the appropriate index: hardirq, softirq, system, idle, guest... from an all-in-one function. Instead of having these ad-hoc versions, move the cputime destination dispatch decision to the core code and leave only the actual per-index cputime accounting to the architecture. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-4-frederic@kernel.org
2020-11-29Merge tag 'locking-urgent-2020-11-29' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Thomas Gleixner: "Two more places which invoke tracing from RCU disabled regions in the idle path. Similar to the entry path the low level idle functions have to be non-instrumentable" * tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: intel_idle: Fix intel_idle() vs tracing sched/idle: Fix arch_cpu_idle() vs tracing
2020-11-27powerpc/dma: Fallback to dma_ops when persistent memory presentAlexey Kardashevskiy1-2/+69
So far we have been using huge DMA windows to map all the RAM available. The RAM is normally mapped to the VM address space contiguously, and there is always a reasonable upper limit for possible future hot plugged RAM which makes it easy to map all RAM via IOMMU. Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike normal RAM) can map anywhere in the VM space beyond the maximum RAM size and since it can be used for DMA, it requires extending the huge window up to MAX_PHYSMEM_BITS which requires hypervisor support for: 1. huge TCE tables; 2. multilevel TCE tables; 3. huge IOMMU pages. Certain hypervisors cannot do either so the only option left is restricting the huge DMA window to include only RAM and fallback to the default DMA window for persistent memory. This defines arch_dma_map_direct/etc to allow generic DMA code perform additional checks on whether direct DMA is still possible. This checks if the system has persistent memory. If it does not, the DMA bypass mode is selected, i.e. * dev->bus_dma_limit = 0 * dev->dma_ops_bypass = true <- this avoid calling dma_ops for mapping. If there is such memory, this creates identity mapping only for RAM and sets the dev->bus_dma_limit to let the generic code decide whether to call into the direct DMA or the indirect DMA ops. This should not change the existing behaviour when no persistent memory as dev->dma_ops_bypass is expected to be set. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-11-26powerpc/ptrace: Hard wire PT_SOFTE value to 1 in gpr_get() tooOleg Nesterov2-2/+12
The commit a8a4b03ab95f ("powerpc: Hard wire PT_SOFTE value to 1 in ptrace & signals") changed ptrace_get_reg(PT_SOFTE) to report 0x1, but PTRACE_GETREGS still copies pt_regs->softe as is. This is not consistent and this breaks the user-regs-peekpoke test from https://sourceware.org/systemtap/wiki/utrace/tests/ Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201119160247.GB5188@redhat.com
2020-11-26powerpc/ptrace: Simplify gpr_get()/tm_cgpr_get()Oleg Nesterov2-15/+7
gpr_get() does membuf_write() twice to override pt_regs->msr in between. We can call membuf_write() once and change ->msr in the kernel buffer, this simplifies the code and the next fix. The patch adds a new simple helper, membuf_at(offs), it returns the new membuf which can be safely used after membuf_write(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> [mpe: Fixup some minor whitespace issues noticed by Christophe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201119160221.GA5188@redhat.com
2020-11-25Merge branch 'fixes' into nextMichael Ellerman9-109/+199
Merge our fixes branch, in particular to bring in the changes for the entry/uaccess flush.
2020-11-24sched/idle: Fix arch_cpu_idle() vs tracingPeter Zijlstra1-2/+2
We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
2020-11-23Merge tag 'powerpc-cve-2020-4788' into fixesMichael Ellerman4-40/+178
From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups.
2020-11-19Merge tag 'powerpc-cve-2020-4788' of ↵Linus Torvalds4-40/+178
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fixes for CVE-2020-4788. From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups" * tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations selftests/powerpc: refactor entry and rfi_flush tests selftests/powerpc: entry flush test powerpc: Only include kup-radix.h for 64-bit Book3S powerpc/64s: flush L1D after user accesses powerpc/64s: flush L1D on kernel entry selftests/powerpc: rfi_flush: disable entry flush if present
2020-11-19powerpc: Only include kup-radix.h for 64-bit Book3SMichael Ellerman1-1/+1
In kup.h we currently include kup-radix.h for all 64-bit builds, which includes Book3S and Book3E. The latter doesn't make sense, Book3E never uses the Radix MMU. This has worked up until now, but almost by accident, and the recent uaccess flush changes introduced a build breakage on Book3E because of the bad structure of the code. So disentangle things so that we only use kup-radix.h for Book3S. This requires some more stubs in kup.h and fixing an include in syscall_64.c. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D after user accessesNicholas Piggin3-59/+95
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D on kernel entryNicholas Piggin3-1/+103
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc: Remove RFI macroChristophe Leroy2-7/+28
RFI macro is just there to add an infinite loop past rfi in order to avoid prefetch on 40x in half a dozen of places in entry_32 and head_32. Those places are already full of #ifdefs, so just add a few more to explicitely show those loops and remove RFI. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f7e9cb9e9240feec63cb330abf40b67d1aad852f.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc: Replace RFI by rfi on book3s/32 and bookeChristophe Leroy3-13/+13
For book3s/32 and for booke, RFI is just an rfi. Only 40x has a non trivial RFI. CONFIG_PPC_RTAS is never selected by 40x platforms. Make it more explicit by replacing RFI by rfi wherever possible. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b901ddfdeb8a0a3b7cb59999599cdfde1bbfe834.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFIChristophe Leroy1-2/+7
In head_64.S, we have two places using RFI to return to kernel. Use RFI_TO_KERNEL instead. They are the two only places using RFI on book3s/64, so the RFI macro can go away. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7719261b0a0d2787772339484c33eb809723bca7.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.SYouling Tang1-18/+1
Use the common INIT_DATA_SECTION rule for the linker script in an effort to regularize the linker script. Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1604487550-20040-1-git-send-email-tangyouling@loongson.cn
2020-11-19powerpc: Avoid broken GCC __attribute__((optimize))Ard Biesheuvel4-9/+6
Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") introduced a couple of uses of __attribute__((optimize)) with function scope, to disable the stack protector in some early boot code. Unfortunately, and this is documented in the GCC man pages [0], overriding function attributes for optimization is broken, and is only supported for debug scenarios, not for production: the problem appears to be that setting GCC -f flags using this method will cause it to forget about some or all other optimization settings that have been applied. So the only safe way to disable the stack protector is to disable it for the entire source file. [0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> [mpe: Drop one remaining use of __nostackprotector, reported by snowpatch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
2020-11-19powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to CJordan Niethe3-260/+275
The only thing keeping the cpu_setup() and cpu_restore() functions used in the cputable entries for Power7, Power8, Power9 and Power10 in assembly was cpu_restore() being called before there was a stack in generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel stack for secondaries before cpu_restore()") means that it is now possible to use C. Rewrite the functions in C so they are a little bit easier to read. This is not changing their functionality. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Tweak copyright and authorship notes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201014072837.24539-2-jniethe5@gmail.com
2020-11-18powerpc: fix -Wimplicit-fallthroughNick Desaulniers2-0/+2
The "fallthrough" pseudo-keyword was added as a portable way to denote intentional fallthrough. Clang will still warn on cases where there is a fallthrough to an immediate break. Add explicit breaks for those cases. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/236 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-11-18powerpc/64s/exception: KVM Fix for host DSI being taken in HPT guest MMU contextNicholas Piggin1-4/+7
Commit 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") removed KVM guest tests from interrupts that do not set HV=1, when PR-KVM is not configured. This is wrong for HV-KVM HPT guest MMIO emulation case which attempts to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with the guest MMU context loaded. This can cause host DSI, DSLB interrupts which must test for KVM guest. Restore this and add a comment. Fixes: 2284ffea8f0c ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com
2020-11-16powerpc/64s: Fix KVM system reset handling when CONFIG_PPC_PSERIES=yNicholas Piggin1-2/+0
pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs, which is basically the same as the regular handlers for those interrupts. The system reset FWNMI handler did not have a KVM guest test in it, although it probably should have because the guest can itself run guests. Commit 4f50541f6703b ("powerpc/64s/exception: Move all interrupt handlers to new style code gen macros") convert the handler faithfully to avoid a KVM test with a "clever" trick to modify the IKVM_REAL setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y). This worked when the KVM test was generated in the interrupt entry handlers, but a later patch moved the KVM test to the common handler, and the common handler macro is expanded below the fwnmi entry. This prevents the KVM test from being generated even for the 0x100 entry point as well. The result is NMI IPIs in the host kernel when a guest is running will use gest registers. This goes particularly badly when an HPT guest is running and the MMU is set to guest mode. Remove this trickery and just generate the test always. Fixes: 9600f261acaa ("powerpc/64s/exception: Move KVM test to common code") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com
2020-11-13ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regsSteven Rostedt (VMware)1-1/+3
In preparation to have arguments of a function passed to callbacks attached to functions as default, change the default callback prototype to receive a struct ftrace_regs as the forth parameter instead of a pt_regs. For callbacks that set the FL_SAVE_REGS flag in their ftrace_ops flags, they will now need to get the pt_regs via a ftrace_get_regs() helper call. If this is called by a callback that their ftrace_ops did not have a FL_SAVE_REGS flag set, it that helper function will return NULL. This will allow the ftrace_regs to hold enough just to get the parameters and stack pointer, but without the worry that callbacks may have a pt_regs that is not completely filled. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-09powerpc: add support for TIF_NOTIFY_SIGNALJens Axboe1-1/+1
Wire up TIF_NOTIFY_SIGNAL handling for powerpc. Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-08powerpc/32s: Use relocation offset when setting early hash tableChristophe Leroy1-1/+2
When calling early_hash_table(), the kernel hasn't been yet relocated to its linking address, so data must be addressed with relocation offset. Add relocation offset to write into Hash in early_hash_table(). Fixes: 69a1593abdbc ("powerpc/32s: Setup the early hash table at all time.") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Serge Belyshev <belyshev@depni.sinp.msu.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9e225a856a8b22e0e77587ee22ab7a2f5bca8753.1604740029.git.christophe.leroy@csgroup.eu
2020-11-06ftrace: Add recording of functions that caused recursionSteven Rostedt (VMware)1-1/+1
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file "recursed_functions" all the functions that caused recursion while a callback to the function tracer was running. Link: https://lkml.kernel.org/r/20201106023548.102375687@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Guo Ren <guoren@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-csky@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: live-patching@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-06kprobes/ftrace: Add recursion protection to the ftrace callbackSteven Rostedt (VMware)1-1/+10
If a ftrace callback does not supply its own recursion protection and does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will make a helper trampoline to do so before calling the callback instead of just calling the callback directly. The default for ftrace_ops is going to change. It will expect that handlers provide their own recursion protection, unless its ftrace_ops states otherwise. Link: https://lkml.kernel.org/r/20201028115613.140212174@goodmis.org Link: https://lkml.kernel.org/r/20201106023546.944907560@goodmis.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Ren <guoren@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-csky@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-05powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entryChristophe Leroy1-29/+7
When _PAGE_ACCESSED is not set, a minor fault is expected. To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED into the L2 entry valid bit. To simplify the processing and reduce the number of instructions in TLB miss exceptions, manage it as an APG bit and get it next to _PAGE_GUARDED bit to allow a copy in one go. Then declare the corresponding groups as handling all accesses as user accesses. As the PP bits always define user as No Access, it will generate a fault. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu
2020-11-05powerpc/8xx: Always fault when _PAGE_ACCESSED is not setChristophe Leroy1-12/+2
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. This adds at least 3 instructions to the TLB miss exception handlers fast path. Following patch will reduce this overhead. Also update the rotation instruction to the correct number of bits to reflect all changes done to _PAGE_ACCESSED over time. Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.") Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU") Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits") Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC") Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu
2020-11-05powerpc/40x: Always fault when _PAGE_ACCESSED is not setChristophe Leroy1-8/+0
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 2c74e2586bb9 ("powerpc/40x: Rework 40x PTE access and TLB miss") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b02ca2ed2d3676a096219b48c0f69ec982a75bcf.1602342801.git.christophe.leroy@csgroup.eu
2020-11-05powerpc/603: Always fault when _PAGE_ACCESSED is not setChristophe Leroy1-12/+0
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu
2020-11-02powerpc/64: Set up a kernel stack for secondaries before cpu_restore()Jordan Niethe2-6/+6
Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore() is called before a stack has been set up in r1. This was previously fine as the cpu_restore() functions were implemented in assembly and did not use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") used __restore_cpu_cpufeatures() as the cpu_restore() function for a device-tree features based cputable entry. This is a C function and hence uses a stack in r1. generic_secondary_smp_init() is entered on the secondary cpus via the primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware thread has its own stack. The OPAL call is ran in the primary's hardware thread. During the call, a job is scheduled on a secondary cpu that will start executing at the address of generic_secondary_smp_init(). Hence the value that will be left in r1 when the secondary cpu enters the kernel is part of that secondary cpu's individual OPAL stack. This means that __restore_cpu_cpufeatures() will write to that OPAL stack. This is not horribly bad as each hardware thread has its own stack and the call that enters the kernel from OPAL never returns, but it is still wrong and should be corrected. Create the temp kernel stack before calling cpu_restore(). As noted by mpe, for a kexec boot, the secondary CPUs are released from the spin loop at address 0x60 by smp_release_cpus() and then jump to generic_secondary_smp_init(). The call to smp_release_cpus() is in setup_arch(), and it comes before the call to emergency_stack_init(). emergency_stack_init() allocates an emergency stack in the PACA for each CPU. This address in the PACA is what is used to set up the temp kernel stack in generic_secondary_smp_init(). Move releasing the secondary CPUs to after the PACAs have been allocated an emergency stack, otherwise the PACA stack pointer will contain garbage and hence the temp kernel stack created from it will be broken. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201014072837.24539-1-jniethe5@gmail.com
2020-11-02powerpc/smp: Call rcu_cpu_starting() earlierQian Cai1-1/+2
The call to rcu_cpu_starting() in start_secondary() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows (with CONFIG_PROVE_RCU_LIST=y): WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call Trace: dump_stack+0xec/0x144 (unreliable) lockdep_rcu_suspicious+0x128/0x14c __lock_acquire+0x1060/0x1c60 lock_acquire+0x140/0x5f0 _raw_spin_lock_irqsave+0x64/0xb0 clockevents_register_device+0x74/0x270 register_decrementer_clockevent+0x94/0x110 start_secondary+0x134/0x800 start_secondary_prolog+0x10/0x14 This is avoided by adding a call to rcu_cpu_starting() near the beginning of the start_secondary() function. Note that the raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers. It's safe to call rcu_cpu_starting() in the arch code as well as later in generic code, as explained by Paul: It uses a per-CPU variable so that RCU pays attention only to the first call to rcu_cpu_starting() if there is more than one of them. This is even intentional, due to there being a generic arch-independent call to rcu_cpu_starting() in notify_cpu_starting(). So multiple calls to rcu_cpu_starting() are fine by design. Fixes: 4d004099a668 ("lockdep: Fix lockdep recursion") Signed-off-by: Qian Cai <cai@redhat.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> [mpe: Add Fixes tag, reword slightly & expand change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028182334.13466-1-cai@redhat.com
2020-11-02powerpc/eeh_cache: Fix a possible debugfs deadlockQian Cai1-2/+3
Lockdep complains that a possible deadlock below in eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled, but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ disabled. Let's just make eeh_addr_cache_show() acquire the lock with IRQ disabled as well. CPU0 CPU1 ---- ---- lock(&pci_io_addr_cache_root.piar_lock); local_irq_disable(); lock(&tp->lock); lock(&pci_io_addr_cache_root.piar_lock); <Interrupt> lock(&tp->lock); *** DEADLOCK *** lock_acquire+0x140/0x5f0 _raw_spin_lock_irqsave+0x64/0xb0 eeh_addr_cache_insert_dev+0x48/0x390 eeh_probe_device+0xb8/0x1a0 pnv_pcibios_bus_add_device+0x3c/0x80 pcibios_bus_add_device+0x118/0x290 pci_bus_add_device+0x28/0xe0 pci_bus_add_devices+0x54/0xb0 pcibios_init+0xc4/0x124 do_one_initcall+0xac/0x528 kernel_init_freeable+0x35c/0x3fc kernel_init+0x24/0x148 ret_from_kernel_thread+0x5c/0x80 lock_acquire+0x140/0x5f0 _raw_spin_lock+0x4c/0x70 eeh_addr_cache_show+0x38/0x110 seq_read+0x1a0/0x660 vfs_read+0xc8/0x1f0 ksys_read+0x74/0x130 system_call_exception+0xf8/0x1d0 system_call_common+0xe8/0x218 Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache") Signed-off-by: Qian Cai <cai@redhat.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028152717.8967-1-cai@redhat.com
2020-10-26treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches2-2/+2
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-24Merge tag 'powerpc-5.10-2' of ↵Linus Torvalds5-48/+49
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - A fix for undetected data corruption on Power9 Nimbus <= DD2.1 in the emulation of VSX loads. The affected CPUs were not widely available. - Two fixes for machine check handling in guests under PowerVM. - A fix for our recent changes to SMP setup, when CONFIG_CPUMASK_OFFSTACK=y. - Three fixes for races in the handling of some of our powernv sysfs attributes. - One change to remove TM from the set of Power10 CPU features. - A couple of other minor fixes. Thanks to: Aneesh Kumar K.V, Christophe Leroy, Ganesh Goudar, Jordan Niethe, Mahesh Salgaonkar, Michael Neuling, Oliver O'Halloran, Qian Cai, Srikar Dronamraju, Vasant Hegde. * tag 'powerpc-5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries: Avoid using addr_to_pfn in real mode powerpc/uaccess: Don't use "m<>" constraint with GCC 4.9 powerpc/eeh: Fix eeh_dev_check_failure() for PE#0 powerpc/64s: Remove TM from Power10 features selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation powerpc/powernv/dump: Handle multiple writes to ack attribute powerpc/powernv/dump: Fix race while processing OPAL dump powerpc/smp: Use GFP_ATOMIC while allocating tmp mask powerpc/smp: Remove unnecessary variable powerpc/mce: Avoid nmi_enter/exit in real mode on pseries hash powerpc/opal_elog: Handle multiple writes to ack attribute
2020-10-23Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+0
Pull arch task_work cleanups from Jens Axboe: "Two cleanups that don't fit other categories: - Finally get the task_work_add() cleanup done properly, so we don't have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates all callers, and also fixes up the documentation for task_work_add(). - While working on some TIF related changes for 5.11, this TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch duplication for how that is handled" * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block: task_work: cleanup notification modes tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
2020-10-22Merge tag 'kbuild-v5.10' of ↵Linus Torvalds3-10/+2
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Support 'make compile_commands.json' to generate the compilation database more easily, avoiding stale entries - Support 'make clang-analyzer' and 'make clang-tidy' for static checks using clang-tidy - Preprocess scripts/modules.lds.S to allow CONFIG options in the module linker script - Drop cc-option tests from compiler flags supported by our minimal GCC/Clang versions - Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y - Use sha1 build id for both BFD linker and LLD - Improve deb-pkg for reproducible builds and rootless builds - Remove stale, useless scripts/namespace.pl - Turn -Wreturn-type warning into error - Fix build error of deb-pkg when CONFIG_MODULES=n - Replace 'hostname' command with more portable 'uname -n' - Various Makefile cleanups * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kbuild: Use uname for LINUX_COMPILE_HOST detection kbuild: Only add -fno-var-tracking-assignments for old GCC versions kbuild: remove leftover comment for filechk utility treewide: remove DISABLE_LTO kbuild: deb-pkg: clean up package name variables kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n kbuild: enforce -Werror=return-type scripts: remove namespace.pl builddeb: Add support for all required debian/rules targets builddeb: Enable rootless builds builddeb: Pass -n to gzip for reproducible packages kbuild: split the build log of kallsyms kbuild: explicitly specify the build id style scripts/setlocalversion: make git describe output more reliable kbuild: remove cc-option test of -Werror=date-time kbuild: remove cc-option test of -fno-stack-check kbuild: remove cc-option test of -fno-strict-overflow kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan kbuild: do not create built-in objects for external module builds ...
2020-10-22Merge branch 'work.set_fs' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull initial set_fs() removal from Al Viro: "Christoph's set_fs base series + fixups" * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Allow a NULL pos pointer to __kernel_read fs: Allow a NULL pos pointer to __kernel_write powerpc: remove address space overrides using set_fs() powerpc: use non-set_fs based maccess routines x86: remove address space overrides using set_fs() x86: make TASK_SIZE_MAX usable from assembly code x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h lkdtm: remove set_fs-based tests test_bitmap: remove user bitmap tests uaccess: add infrastructure for kernel builds with set_fs() fs: don't allow splice read/write without explicit ops fs: don't allow kernel reads and writes without iter ops sysctl: Convert to iter interfaces proc: add a read_iter method to proc proc_ops proc: cleanup the compat vs no compat file ops proc: remove a level of indentation in proc_get_inode
2020-10-22powerpc/eeh: Fix eeh_dev_check_failure() for PE#0Oliver O'Halloran1-5/+0
In commit 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr") the following simplification was made: - if (!pe->addr && !pe->config_addr) { + if (!pe->addr) { eeh_stats.no_cfg_addr++; return 0; } This introduced a bug which causes EEH checking to be skipped for devices in PE#0. Before the change above the check would always pass since at least one of the two PE addresses would be non-zero in all circumstances. On PowerNV pe->config_addr would be the BDFN of the first device added to the PE. The zero BDFN is reserved for the PHB's root port, but this is fine since for obscure platform reasons the root port is never assigned to PE#0. Similarly, on pseries pe->addr has always been non-zero for the reasons outlined in commit 42de19d5ef71 ("powerpc/pseries/eeh: Allow zero to be a valid PE configuration address"). We can fix the problem by deleting the block entirely The original purpose of this test was to avoid performing EEH checks on devices that were not on an EEH capable bus. In modern Linux the edev->pe pointer will be NULL for devices that are not on an EEH capable bus. The code block immediately above this one already checks for the edev->pe == NULL case so this test (new and old) is entirely redundant. Ideally we'd delete eeh_stats.no_cfg_addr too since nothing increments it any more. Unfortunately, that information is exposed via /proc/powerpc/eeh which means it's technically ABI. We could make it hard-coded, but that's a change for another patch. Fixes: 269e583357df ("powerpc/eeh: Delete eeh_pe->config_addr") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021232554.1434687-1-oohall@gmail.com
2020-10-20powerpc/64s: Remove TM from Power10 featuresJordan Niethe1-3/+10
ISA v3.1 removes transactional memory and hence it should not be present in cpu_features or cpu_user_features2. Remove CPU_FTR_TM_COMP from CPU_FTRS_POWER10. Remove PPC_FEATURE2_HTM_COMP and PPC_FEATURE2_HTM_NOSC_COMP from COMMON_USER2_POWER10. Fixes: a3ea40d5c736 ("powerpc: Add POWER10 architected mode") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200827035529.900-1-jniethe5@gmail.com
2020-10-19powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulationMichael Neuling1-1/+1
__get_user_atomic_128_aligned() stores to kaddr using stvx which is a VMX store instruction, hence kaddr must be 16 byte aligned otherwise the store won't occur as expected. Unfortunately when we call __get_user_atomic_128_aligned() in p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't guaranteed to be 16B aligned. This means that the write to vbuf in __get_user_atomic_128_aligned() has the bottom bits of the address truncated. This results in other local variables being overwritten. Also vbuf will not contain the correct data which results in the userspace emulation being wrong and hence undetected user data corruption. In the past we've been mostly lucky as vbuf has ended up aligned but this is fragile and isn't always true. CONFIG_STACKPROTECTOR in particular can change the stack arrangement enough that our luck runs out. This issue only occurs on POWER9 Nimbus <= DD2.1 bare metal. The fix is to align vbuf to a 16 byte boundary. Fixes: 5080332c2c89 ("powerpc/64s: Add workaround for P9 vector CI load issue") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201013043741.743413-1-mikey@neuling.org
2020-10-19powerpc/smp: Use GFP_ATOMIC while allocating tmp maskSrikar Dronamraju1-26/+31
Qian Cai reported a regression where CPU Hotplug fails with the latest powerpc/next BUG: sleeping function called from invalid context at mm/slab.h:494 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88 no locks held by swapper/88/0. irq event stamp: 18074448 hardirqs last enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110 hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0 do_idle at kernel/sched/idle.c:253 (discriminator 1) softirqs last enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0 softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0 CPU: 88 PID: 0 Comm: swapper/88 Tainted: G W 5.9.0-rc8-next-20201007 #1 Call Trace: [c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable) [c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310 [c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190 [c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0 slab_alloc_node at mm/slub.c:2817 (inlined by) __kmalloc_node at mm/slub.c:4013 [c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80 kmalloc_node at include/linux/slab.h:577 (inlined by) alloc_cpumask_var_node at lib/cpumask.c:116 [c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800 update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267 (inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387 (inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420 [c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14 Allocating a temporary mask while performing a CPU Hotplug operation with CONFIG_CPUMASK_OFFSTACK enabled, leads to calling a sleepable function from a atomic context. Fix this by allocating the temporary mask with GFP_ATOMIC flag. Also instead of having to allocate twice, allocate the mask in the caller so that we only have to allocate once. If the allocation fails, assume the mask to be same as sibling mask, which will make the scheduler to drop this domain for this CPU. Fixes: 70a94089d7f7 ("powerpc/smp: Optimize update_coregroup_mask") Fixes: 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2") Reported-by: Qian Cai <cai@redhat.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201019042716.106234-3-srikar@linux.vnet.ibm.com
2020-10-19powerpc/smp: Remove unnecessary variableSrikar Dronamraju1-9/+4
Commit 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2") introduced submask_fn in update_mask_by_l2 to track the right submask. However commit f6606cfdfbcd ("powerpc/smp: Dont assume l2-cache to be superset of sibling") introduced sibling_mask in update_mask_by_l2 to track the same submask. Remove sibling_mask in favour of submask_fn. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201019042716.106234-2-srikar@linux.vnet.ibm.com