path: root/arch/powerpc
Age | Commit message | Author | Files | Lines
2020-11-26powerpc: Work around inline asm issues in alternate feature sectionsBill Wendling1-3/+19
The clang toolchain treats inline assembly a bit differently than straight assembly code. In particular, inline assembly doesn't have the complete context available to resolve expressions. This is intentional to avoid divergence in the resulting assembly code.

We can work around this issue by borrowing a workaround done for ARM, i.e. not directly testing the labels themselves, but by moving the current output pointer by a value that should always be zero. If this value is non-zero, it triggers a backward move, which is explicitly forbidden.

Signed-off-by: Bill Wendling <morbo@google.com>
[mpe: Put it in a macro and only do the workaround for clang]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201120224034.191382-4-morbo@google.com
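A sketch of the construct described above (the macro name and exact expression are illustrative, not copied from the patch): instead of comparing section labels, which clang's integrated assembler cannot fully resolve for inline assembly, the location counter is moved by an expression that must evaluate to zero. If the "else" alternative is larger than the body, the expression is 1 and the .org moves backwards, which the assembler rejects.

  /* hypothetical macro, expanded into the alternate feature section string */
  #define CHECK_ALT_SIZE(else_size, body_size) \
          ".org . - ((" else_size ") > (" body_size "));"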
2020-11-26powerpc/boot: Use clang when CC is clangBill Wendling1-0/+4
The gcc compiler may not be available if CC is clang. Signed-off-by: Bill Wendling <morbo@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201120224034.191382-3-morbo@google.com
2020-11-26powerpc/boot/wrapper: Add "-z notext" flag to disable diagnosticBill Wendling1-1/+3
The "-z notext" flag disables reporting an error if DT_TEXTREL is set. ld.lld: error: can't create dynamic relocation R_PPC64_ADDR64 against symbol: _start in readonly segment; recompile object files with -fPIC or pass '-Wl,-z,notext' to allow text relocations in the output >>> defined in >>> referenced by crt0.o:(.text+0x8) in archive arch/powerpc/boot/wrapper.a The BFD linker disables this by default (though it's configurable in current versions). LLD enables this by default. So we add the flag to keep LLD from emitting the error. Signed-off-by: Bill Wendling <morbo@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201120224034.191382-2-morbo@google.com
2020-11-26powerpc/boot/wrapper: Add "-z rodynamic" when using LLDBill Wendling1-1/+3
Normally all read-only sections precede SHF_WRITE sections. .dynamic and .got have the SHF_WRITE flag; .dynamic probably because of DT_DEBUG. LLD emits an error when this happens, so use "-z rodynamic" to mark .dynamic as read-only. Signed-off-by: Bill Wendling <morbo@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201118223910.2711337-1-morbo@google.com
2020-11-26powerpc/boot: Move the .got section to after the .dynamic sectionBill Wendling1-10/+11
Both .dynamic and .got are RELRO sections and should be placed together, otherwise LLD emits an error:

  ld.lld: error: section: .got is not contiguous with other relro sections

Place them together to avoid this.

Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201017000151.150788-1-morbo@google.com
2020-11-26powerpc/ptrace: Hard wire PT_SOFTE value to 1 in gpr_get() tooOleg Nesterov2-2/+12
The commit a8a4b03ab95f ("powerpc: Hard wire PT_SOFTE value to 1 in ptrace & signals") changed ptrace_get_reg(PT_SOFTE) to report 0x1, but PTRACE_GETREGS still copies pt_regs->softe as is. This is not consistent and this breaks the user-regs-peekpoke test from https://sourceware.org/systemtap/wiki/utrace/tests/ Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201119160247.GB5188@redhat.com
2020-11-26powerpc/ptrace: Simplify gpr_get()/tm_cgpr_get()Oleg Nesterov2-15/+7
gpr_get() does membuf_write() twice to override pt_regs->msr in between. We can call membuf_write() once and change ->msr in the kernel buffer, this simplifies the code and the next fix. The patch adds a new simple helper, membuf_at(offs), it returns the new membuf which can be safely used after membuf_write(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> [mpe: Fixup some minor whitespace issues noticed by Christophe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201119160221.GA5188@redhat.com
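Based on the description above, the new helper plausibly looks like the following (a sketch; the exact code added to include/linux/regset.h may differ in details):

  /* Return a membuf positioned 'offs' bytes into 'at', clamped to its end,
   * so that range can be filled independently after membuf_write() has
   * advanced past it. */
  static inline struct membuf membuf_at(const struct membuf *at, size_t offs)
  {
          struct membuf n = *at;

          if (offs > n.left)
                  offs = n.left;
          n.p += offs;
          n.left -= offs;

          return n;
  }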
2020-11-25Merge branch 'fixes' into nextMichael Ellerman27-192/+483
Merge our fixes branch, in particular to bring in the changes for the entry/uaccess flush.
2020-11-24sched/idle: Fix arch_cpu_idle() vs tracingPeter Zijlstra1-2/+2
We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
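On powerpc the change is confined to the idle loop; conceptually it is just the substitution sketched below (a simplified sketch of arch/powerpc/kernel/idle.c, not the actual file contents):

  void arch_cpu_idle(void)
  {
          ppc64_runlatch_off();

          if (ppc_md.power_save) {
                  ppc_md.power_save();
                  /*
                   * Some power_save functions return with interrupts
                   * enabled, some don't. RCU is not watching here, so the
                   * raw variant must be used to avoid calling into tracing.
                   */
                  if (irqs_disabled())
                          raw_local_irq_enable(); /* was: local_irq_enable() */
          } else {
                  raw_local_irq_enable();         /* was: local_irq_enable() */
                  HMT_low();
                  HMT_very_low();
          }

          HMT_medium();
          ppc64_runlatch_on();
  }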
2020-11-23arch: move SA_* definitions to generic headersPeter Collingbourne1-24/+0
Most architectures with the exception of alpha, mips, parisc and sparc use the same values for these flags. Move their definitions into asm-generic/signal-defs.h and allow the architectures with non-standard values to override them. Also, document the non-standard flag values in order to make it easier to add new generic flags in the future. A consequence of this change is that on powerpc and x86, the constants' values aside from SA_RESETHAND change signedness from unsigned to signed. This is not expected to impact realistic use of these constants. In particular the typical use of the constants where they are or'ed together and assigned to sa_flags (or another int variable) would not be affected. Signed-off-by: Peter Collingbourne <pcc@google.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Link: https://linux-review.googlesource.com/id/Ia3849f18b8009bf41faca374e701cdca36974528 Link: https://lkml.kernel.org/r/b6d0d1ec34f9ee93e1105f14f288fba5f89d1f24.1605235762.git.pcc@google.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
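The resulting generic header follows the usual "define unless the arch already did" pattern; a sketch with the common values (most architectures use these, while alpha, mips, parisc and sparc define their own first):

  /* include/uapi/asm-generic/signal-defs.h (sketch) */
  #ifndef SA_NOCLDSTOP
  #define SA_NOCLDSTOP    0x00000001
  #endif
  #ifndef SA_NOCLDWAIT
  #define SA_NOCLDWAIT    0x00000002
  #endif
  #ifndef SA_SIGINFO
  #define SA_SIGINFO      0x00000004
  #endif
  #ifndef SA_ONSTACK
  #define SA_ONSTACK      0x08000000
  #endif
  #ifndef SA_RESTART
  #define SA_RESTART      0x10000000
  #endif
  #ifndef SA_NODEFER
  #define SA_NODEFER      0x40000000
  #endif
  #ifndef SA_RESETHAND
  #define SA_RESETHAND    0x80000000
  #endif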
2020-11-23powerpc/64s: Fix allnoconfig build since uaccess flushStephen Rothwell1-0/+2
Using DECLARE_STATIC_KEY_FALSE needs linux/jump_label.h. Otherwise the build fails with eg:

  arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class
     66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);

Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201123184016.693fe464@canb.auug.org.au
2020-11-23Merge tag 'powerpc-cve-2020-4788' into fixesMichael Ellerman15-80/+421
From Daniel's cover letter: IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups.
2020-11-23Merge 5.10-rc5 into tty-nextGreg Kroah-Hartman20-86/+431
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-22mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exportsDan Williams3-3/+8
The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of:

  CONFIG_NUMA_KEEP_MEMINFO=y
  CONFIG_MEMORY_HOTPLUG=y

...and:

  CONFIG_NUMA_KEEP_MEMINFO=n
  CONFIG_MEMORY_HOTPLUG=y

...it failed to export the symbol in the case of:

  CONFIG_NUMA_KEEP_MEMINFO=y
  CONFIG_MEMORY_HOTPLUG=n

Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that the memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too.

Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h.

An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header.

The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.

[dan.j.williams@intel.com: v4]
Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com
[dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning]
Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
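The arch override pattern described above looks roughly like this (a sketch; the exact declarations in the real headers may differ):

  /* arch/powerpc/include/asm/sparsemem.h (sketch): the arch supplies real
   * implementations and defines the macro so the generic stubs back off. */
  #ifdef CONFIG_NUMA
  extern int memory_add_physaddr_to_nid(u64 start);
  extern int phys_to_target_node(u64 start);
  #define memory_add_physaddr_to_nid memory_add_physaddr_to_nid
  #define phys_to_target_node phys_to_target_node
  #endif

  /* generic header (sketch): default stubs, compiled in only when the arch
   * did not define the corresponding macro above. */
  #ifndef phys_to_target_node
  static inline int phys_to_target_node(u64 start)
  {
          return 0;
  }
  #endif
  #ifndef memory_add_physaddr_to_nid
  static inline int memory_add_physaddr_to_nid(u64 start)
  {
          return 0;
  }
  #endif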
2020-11-20powerpc: Enable seccomp architecture trackingYiFei Zhu1-0/+23
To enable seccomp constant action bitmaps, we need to have a static mapping to the audit architecture and system call table size. Add these for powerpc. __LITTLE_ENDIAN__ is used here instead of CONFIG_CPU_LITTLE_ENDIAN to keep it consistent with asm/syscall.h. Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/0b64925362671cdaa26d01bfe50b3ba5e164adfd.1605101222.git.yifeifz2@illinois.edu
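The static mapping amounts to a handful of defines in arch/powerpc/include/asm/seccomp.h; the sketch below shows the general shape only (the macro values and the endian handling are assumptions, not copied from the patch):

  #include <linux/audit.h>
  #include <asm/unistd.h>

  #ifdef __LITTLE_ENDIAN__
  #define __SECCOMP_ARCH_LE       __AUDIT_ARCH_LE
  #else
  #define __SECCOMP_ARCH_LE       0
  #endif

  #ifdef CONFIG_PPC64
  #define SECCOMP_ARCH_NATIVE     (AUDIT_ARCH_PPC64 | __SECCOMP_ARCH_LE)
  #else
  #define SECCOMP_ARCH_NATIVE     (AUDIT_ARCH_PPC | __SECCOMP_ARCH_LE)
  #endif
  #define SECCOMP_ARCH_NATIVE_NR  NR_syscalls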
2020-11-20crypto: sha - split sha.h into sha1.h and sha2.hEric Biggers3-3/+3
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2, and <crypto/sha3.h> contains declarations for SHA-3. This organization is inconsistent, but more importantly SHA-1 is no longer considered to be cryptographically secure. So to the extent possible, SHA-1 shouldn't be grouped together with any of the other SHA versions, and usage of it should be phased out. Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and <crypto/sha2.h>, and make everyone explicitly specify whether they want the declarations for SHA-1, SHA-2, or both. This avoids making the SHA-1 declarations visible to files that don't want anything to do with SHA-1. It also prepares for potentially moving sha1.h into a new insecure/ or dangerous/ directory. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-19  Merge tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds, 15 files, -80/+421)
Pull powerpc fixes from Michael Ellerman:
 "Fixes for CVE-2020-4788.

  From Daniel's cover letter:

  IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked.

  However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project.

  This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack.

  This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern.

  This patch series flushes the L1 cache on kernel entry (patch 2) and after the kernel performs any user accesses (patch 3). It also adds a self-test and performs some related cleanups"

* tag 'powerpc-cve-2020-4788' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations
  selftests/powerpc: refactor entry and rfi_flush tests
  selftests/powerpc: entry flush test
  powerpc: Only include kup-radix.h for 64-bit Book3S
  powerpc/64s: flush L1D after user accesses
  powerpc/64s: flush L1D on kernel entry
  selftests/powerpc: rfi_flush: disable entry flush if present
2020-11-19powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigationsDaniel Axtens4-9/+11
pseries|pnv_setup_rfi_flush already does the count cache flush setup, and we just added entry and uaccess flushes. So the name is not very accurate any more. In both platforms we then also immediately setup the STF flush. Rename them to _setup_security_mitigations and fold the STF flush in. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc: Only include kup-radix.h for 64-bit Book3SMichael Ellerman3-6/+11
In kup.h we currently include kup-radix.h for all 64-bit builds, which includes Book3S and Book3E. The latter doesn't make sense, Book3E never uses the Radix MMU. This has worked up until now, but almost by accident, and the recent uaccess flush changes introduced a build breakage on Book3E because of the bad structure of the code. So disentangle things so that we only use kup-radix.h for Book3S. This requires some more stubs in kup.h and fixing an include in syscall_64.c. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D after user accessesNicholas Piggin12-90/+229
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D on kernel entryNicholas Piggin10-2/+197
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19  powernv/memtrace: don't abuse memory hot(un)plug infrastructure for memory allocations  (David Hildenbrand, 2 files, -109/+62)
Let's use alloc_contig_pages() for allocating memory and remove the linear mapping manually via arch_remove_linear_mapping(). Mark all pages PG_offline, such that they will definitely not get touched - e.g., when hibernating. When freeing memory, try to revert what we did.

The original idea was discussed in: https://lkml.kernel.org/r/48340e96-7e6b-736f-9e23-d3111b915b6e@redhat.com

This is similar to CONFIG_DEBUG_PAGEALLOC handling on other architectures, whereby only single pages are unmapped from the linear mapping. Let's mimic what memory hot(un)plug would do with the linear mapping.

We now need MEMORY_HOTPLUG and CONTIG_ALLOC as dependencies. Add a TODO that we want to use __GFP_ZERO for clearing once alloc_contig_pages() understands that.

Tested in QEMU/TCG with 10 GiB of main memory:

  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  105.903043][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  145.042493][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
  [  145.049019][ T1080] memtrace: Freed trace memory back on node 0
  [  145.333960][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
  [root@localhost ~]# echo 0x80000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  213.606916][ T1080] radix-mmu: Mapped 0x0000000080000000-0x00000000c0000000 with 64.0 KiB pages
  [  213.613855][ T1080] memtrace: Freed trace memory back on node 0
  [  214.185094][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000
  [root@localhost ~]# echo 0x100000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  234.874872][ T1080] radix-mmu: Mapped 0x0000000080000000-0x0000000100000000 with 64.0 KiB pages
  [  234.886974][ T1080] memtrace: Freed trace memory back on node 0
  [  234.890153][ T1080] memtrace: Failed to allocate trace memory on node 0
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  259.490196][ T1080] memtrace: Allocated trace memory on node 0 at 0x0000000080000000

I also made sure allocated memory is properly zeroed.

Note 1: We currently won't be allocating from ZONE_MOVABLE - because our pages are not movable. However, as we don't run with any memory hot(un)plug mechanism around, we could make an exception to increase the chance of allocations succeeding.

Note 2: PG_reserved isn't sufficient. E.g., kernel_page_present() used along PG_reserved in hibernation code will always return "true" on powerpc, resulting in the pages getting touched. It's too generic - e.g., indicates boot allocations.

Note 3: For now, we keep using memory_block_size_bytes() as minimum granularity.

Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-9-david@redhat.com
2020-11-19powerpc/mm: remove linear mapping if __add_pages() fails in arch_add_memory()David Hildenbrand1-1/+4
Let's revert what we did in case something goes wrong and we return an error - as already done on arm64 and s390x. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-8-david@redhat.com
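Together with the arch_create_linear_mapping()/arch_remove_linear_mapping() helpers factored out elsewhere in this series, the error handling described here is roughly the following (a sketch, not the exact diff):

  int __ref arch_add_memory(int nid, u64 start, u64 size,
                            struct mhp_params *params)
  {
          unsigned long start_pfn = start >> PAGE_SHIFT;
          unsigned long nr_pages = size >> PAGE_SHIFT;
          int rc;

          rc = arch_create_linear_mapping(nid, start, size, params);
          if (rc)
                  return rc;

          rc = __add_pages(nid, start_pfn, nr_pages, params);
          if (rc)
                  /* new: tear the linear mapping back down on failure */
                  arch_remove_linear_mapping(start, size);

          return rc;
  }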
2020-11-19powerpc/book3s64/hash: Drop WARN_ON in hash__remove_section_mapping()David Hildenbrand1-1/+0
The single caller (arch_remove_linear_mapping()) prints a proper warning when this function fails. No need to eventually crash the kernel - let's drop this WARN_ON. Suggested-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-7-david@redhat.com
2020-11-19powerpc/mm: print warning in arch_remove_linear_mapping()David Hildenbrand1-1/+3
Let's print a warning similar to the one in arch_add_linear_mapping() instead of WARN_ON_ONCE() and eventually crashing the kernel.

Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-6-david@redhat.com
2020-11-19powerpc/mm: protect linear mapping modifications by a mutexDavid Hildenbrand1-0/+5
This code currently relies on mem_hotplug_begin()/mem_hotplug_done() - create_section_mapping()/remove_section_mapping() implementations cannot tolerate getting called concurrently. Let's prepare for callers (memtrace) not holding any such locks (and don't force them to mess with memory hotplug locks). Other parts in these functions don't seem to rely on external locking.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-5-david@redhat.com
2020-11-19powerpc/mm: factor out creating/removing linear mappingDavid Hildenbrand1-13/+28
We want to stop abusing memory hotplug infrastructure in memtrace code to perform allocations and remove the linear mapping. Instead we will use alloc_contig_pages() and remove the linear mapping manually. Let's factor out creating/removing the linear mapping into arch_create_linear_mapping() / arch_remove_linear_mapping() - so in the future, we might be able to have whole arch_add_memory() / arch_remove_memory() be implemented in common code. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-4-david@redhat.com
2020-11-19powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrentlyDavid Hildenbrand1-7/+15
It's very easy to crash the kernel right now by simply trying to enable memtrace concurrently, hammering on the "enable" interface:

  loop.sh:
    #!/bin/bash
    dmesg --console-off
    while true; do
        echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
    done

  [root@localhost ~]# loop.sh &
  [root@localhost ~]# loop.sh &

Resulting quickly in a kernel crash. Let's properly protect using a mutex.

Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com
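The fix is the standard serialization pattern; a sketch (memtrace_do_enable() is a stand-in name for the existing enable/disable logic, not a real function):

  static DEFINE_MUTEX(memtrace_mutex);

  static int memtrace_enable_set(void *data, u64 val)
  {
          int rc;

          mutex_lock(&memtrace_mutex);
          rc = memtrace_do_enable(val);   /* stand-in for the existing logic */
          mutex_unlock(&memtrace_mutex);

          return rc;
  }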
2020-11-19powerpc/powernv/memtrace: Don't leak kernel memory to user spaceDavid Hildenbrand1-0/+22
We currently leak kernel memory to user space, because memory offlining doesn't do any implicit clearing of memory and we are missing explicit clearing of memory.

Let's keep it simple and clear pages before removing the linear mapping.

Reproduced in QEMU/TCG with 10 GiB of main memory:

  [root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null
  [... wait until "free -m" used counter no longer changes and cancel]
  19665802+0 records in
  1+0 records out
  9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s
  [root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes
  40000000
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900
  [  402.980063][ T1086] flags: 0x7ffff000001000(reserved)
  [  402.980415][ T1086] raw: 007ffff000001000 c00c000000924008 c00c000000924008 0000000000000000
  [  402.980627][ T1086] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
  [  402.980845][ T1086] page dumped because: unmovable page
  [  402.989608][ T1086] Offlined Pages 16384
  [  403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000

Before this patch:

  [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head
  00000000  c8 25 72 51 4d 26 36 c5  5c c2 56 15 d5 1a cd 10  |.%rQM&6.\.V.....|
  00000010  19 b9 50 b2 cb e3 60 b8  ec 0a f3 ec 4b 3c 39 f0  |..P...`.....K<9.|
  00000020  4e 5a 4c cf bd 26 19 ff  37 79 13 67 24 b7 b8 57  |NZL..&..7y.g$..W|
  00000030  98 3e f5 be 6f 14 6a bd  a4 52 bc 6e e9 e0 c1 5d  |.>..o.j..R.n...]|
  00000040  76 b3 ae b5 88 d7 da e3  64 23 85 2c 10 88 07 b6  |v.......d#.,....|
  00000050  9a d8 91 de f7 50 27 69  2e 64 9c 6f d3 19 45 79  |.....P'i.d.o..Ey|
  00000060  6a 6f 8a 61 71 19 1f c7  f1 df 28 26 ca 0f 84 55  |jo.aq.....(&...U|
  00000070  01 3f be e4 e2 e1 da ff  7b 8c 8e 32 37 b4 24 53  |.?......{..27.$S|
  00000080  1b 70 30 45 56 e6 8c c4  0e b5 4c fb 9f dd 88 06  |.p0EV.....L.....|
  00000090  ef c4 18 79 f1 60 b1 5c  79 59 4d f4 36 d7 4a 5c  |...y.`.\yYM.6.J\|

After this patch:

  [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace | head
  00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  *
  40000000

Fixes: 9d5171a8f248 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-2-david@redhat.com
2020-11-19powerpc/perf: Use regs->nip when SIAR is zeroMadhavan Srinivasan1-4/+17
In power10 DD1, there is an issue where the SIAR (Sampled Instruction Address Register) is not latching to the sampled address during random sampling. This results in a value of 0 in the SIAR. Add a check to use regs->nip when SIAR is zero.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-5-maddy@linux.ibm.com
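Conceptually the check is tiny; a simplified sketch of the idea (the helper name is illustrative, and the real perf code handles more cases):

  static unsigned long power_pmu_sample_ip(struct pt_regs *regs)
  {
          unsigned long siar = mfspr(SPRN_SIAR);

          /* Power10 DD1: SIAR may fail to latch and reads back as 0 */
          return siar ? siar : regs->nip;
  }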
2020-11-19powerpc/perf: Use the address from SIAR register to set cpumode flagsAthira Rajeev1-0/+14
While setting the processor mode for any sample, perf_get_misc_flags() expects the privilege level to differentiate the userspace and kernel address. On power10 DD1, there is an issue that causes the MSR_HV and MSR_PR bits of the Sampled Instruction Event Register (SIER) not to be set for marked events. Hence add a check to use the address in SIAR (Sampled Instruction Address Register) to identify the privilege level.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-3-maddy@linux.ibm.com
2020-11-19powerpc/perf: Drop the check for SIAR_VALIDAthira Rajeev1-1/+8
In power10 DD1, there is an issue that causes the SIAR_VALID bit of the SIER (Sampled Instruction Event Register) to not be set. But the SIAR_VALID bit is used for fetching the instruction address from the SIAR (Sampled Instruction Address Register), and marked events are sampled only if the SIAR_VALID bit is set. So drop the check for SIAR_VALID and always return true in case of power10 DD1.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201021085329.384535-2-maddy@linux.ibm.com
2020-11-19powerpc/perf: Add new power PMU flag "PPMU_P10_DD1" for power10 DD1Athira Rajeev2-0/+7
Add a new power PMU flag "PPMU_P10_DD1" which can be used to conditionally add any code path for power10 DD1 processor version. Also modify power10 PMU driver code to set this flag only for DD1, based on the Processor Version Register (PVR) value. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021085329.384535-1-maddy@linux.ibm.com
2020-11-19powerpc/mm: Fix comparing pointer to 0 warningKaixu Xia1-1/+1
Fixes coccicheck warning:

  ./arch/powerpc/mm/pgtable_32.c:87:11-12: WARNING comparing pointer to 0

Avoid comparing a pointer-typed value to 0.

Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1604976961-20441-1-git-send-email-kaixuxia@tencent.com
2020-11-19powerpc: Remove RFI macroChristophe Leroy3-12/+28
The RFI macro is just there to add an infinite loop past rfi in order to avoid prefetch on 40x in half a dozen places in entry_32 and head_32. Those places are already full of #ifdefs, so just add a few more to explicitly show those loops and remove RFI.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f7e9cb9e9240feec63cb330abf40b67d1aad852f.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc: Replace RFI by rfi on book3s/32 and bookeChristophe Leroy4-15/+15
For book3s/32 and for booke, RFI is just an rfi. Only 40x has a non-trivial RFI. CONFIG_PPC_RTAS is never selected by 40x platforms. Make it more explicit by replacing RFI by rfi wherever possible.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b901ddfdeb8a0a3b7cb59999599cdfde1bbfe834.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc/64s: Replace RFI by RFI_TO_KERNEL and remove RFIChristophe Leroy2-3/+7
In head_64.S, we have two places using RFI to return to the kernel. Use RFI_TO_KERNEL instead. They are the only two places using RFI on book3s/64, so the RFI macro can go away.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7719261b0a0d2787772339484c33eb809723bca7.1604854583.git.christophe.leroy@csgroup.eu
2020-11-19powerpc/powernv/sriov: fix unsigned int win compared to less than zeroKaixu Xia1-1/+1
Fix coccicheck warning:

  arch/powerpc/platforms/powernv/pci-sriov.c:443:7-10: WARNING: Unsigned expression compared with zero: win < 0
  arch/powerpc/platforms/powernv/pci-sriov.c:462:7-10: WARNING: Unsigned expression compared with zero: win < 0

Fixes: 39efc03e3ee8 ("powerpc/powernv/sriov: Move M64 BAR allocation into a helper")
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1605007170-22171-1-git-send-email-kaixuxia@tencent.com
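The underlying bug class is the usual one: a negative error value stored in an unsigned variable can never test as less than zero, so the error path is dead code. A small standalone illustration (not the kernel code itself; alloc_window() is a stand-in for the real M64 BAR helper):

  #include <stdio.h>

  static int alloc_window(void)   /* stand-in for the real M64 BAR helper */
  {
          return -1;              /* allocation failed */
  }

  int main(void)
  {
          unsigned int bad = alloc_window();
          int good = alloc_window();

          if (bad < 0)            /* never true: 'bad' is unsigned */
                  puts("unsigned: error detected");
          if (good < 0)           /* true: a signed type keeps the sign */
                  puts("signed: error detected");
          return 0;
  }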
2020-11-19Revert "powerpc/pseries/hotplug-cpu: Remove double free in error path"Zhang Xiaoxu1-0/+1
This reverts commit a0ff72f9f5a780341e7ff5e9ba50a0dad5fa1980. Since the commit b015f6bc9547 ("powerpc/pseries: Add cpu DLPAR support for drc-info property"), the 'cpu_drcs' wouldn't be double freed when the 'cpus' node not found. So we needn't apply this patch, otherwise, the memory will be leaked. Fixes: a0ff72f9f5a7 ("powerpc/pseries/hotplug-cpu: Remove double free in error path") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> [mpe: Caused by me applying a patch to a function that had changed in the interim] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111020752.1686139-1-zhangxiaoxu5@huawei.com
2020-11-19  powerpc/64s/perf: perf interrupt does not have to get_user_pages to access user memory  (Nicholas Piggin, 2 files, -2/+3)
read_user_stack_slow, which walks user address translation by hand, is only required on hash, because a hash fault can not be serviced from "NMI" context (to avoid re-entering the hash code), so the user stack can be mapped into Linux page tables but not accessible by the CPU. Radix MMU mode does not have this restriction. A page fault failure would indicate the page is not accessible via get_user_pages either, so avoid this on radix.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111120151.3150658-1-npiggin@gmail.com
2020-11-19powerpc: Use the common INIT_DATA_SECTION macro in vmlinux.lds.SYouling Tang1-18/+1
Use the common INIT_DATA_SECTION rule for the linker script in an effort to regularize the linker script. Signed-off-by: Youling Tang <tangyouling@loongson.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1604487550-20040-1-git-send-email-tangyouling@loongson.cn
2020-11-19powerpc/feature: Fix CPU_FTRS_ALWAYS by removing CPU_FTRS_GENERIC_32Christophe Leroy1-5/+0
On 8xx, we get the following features:

  [    0.000000] cpu_features = 0x0000000000000100
  [    0.000000]   possible   = 0x0000000000000120
  [    0.000000]   always     = 0x0000000000000000

This is not correct. As CONFIG_PPC_8xx is mutually exclusive with all other configurations, the three lines should be equal.

The problem is due to CPU_FTRS_GENERIC_32 which is taken when CONFIG_BOOK3S_32 is NOT selected. This CPU_FTRS_GENERIC_32 is pointless because there is no generic configuration supporting all 32 bits but book3s/32.

Remove this pointless generic features definition to unbreak the calculation of 'possible' features and 'always' features.

Fixes: 76bc080ef5a3 ("[POWERPC] Make default cputable entries reflect selected CPU family")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/76a85f30bf981d1aeaae00df99321235494da254.1604426550.git.christophe.leroy@csgroup.eu
2020-11-19powerpc/mm: Update tlbiel loop on POWER10Aneesh Kumar K.V3-9/+32
With POWER10, a single tlbiel instruction invalidates all the congruence classes of the TLB, and hence we need to issue only one tlbiel with SET=0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com
2020-11-19powerpc: Avoid broken GCC __attribute__((optimize))Ard Biesheuvel4-9/+6
Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") introduced a couple of uses of __attribute__((optimize)) with function scope, to disable the stack protector in some early boot code. Unfortunately, and this is documented in the GCC man pages [0], overriding function attributes for optimization is broken, and is only supported for debug scenarios, not for production: the problem appears to be that setting GCC -f flags using this method will cause it to forget about some or all other optimization settings that have been applied. So the only safe way to disable the stack protector is to disable it for the entire source file. [0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> [mpe: Drop one remaining use of __nostackprotector, reported by snowpatch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
2020-11-19powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()Qinglang Miao1-1/+1
I noticed that the iounmap() of msgr_block_addr before returning from mpic_msgr_probe() in the error handling case is missing. So use devm_ioremap() instead of plain ioremap() when remapping the message register block, so the mapping will be automatically released on probe failure.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
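A sketch of the devm conversion inside mpic_msgr_probe() (variable names follow the description above; the surrounding error handling may differ):

  /* was: msgr_block_addr = ioremap(rsrc.start, resource_size(&rsrc)); */
  msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start, resource_size(&rsrc));
  if (!msgr_block_addr) {
          dev_err(&dev->dev, "Failed to iomap MPIC message registers\n");
          return -EFAULT;
  }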
2020-11-19powerpc/ps3: Drop unused DBG macroMichael Ellerman1-7/+0
This DBG macro is unused, and has been unused since the file was originally merged into mainline. Just drop it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201023031305.3284819-1-mpe@ellerman.id.au
2020-11-19powerpc/85xx: Fix declaration made after definitionMichael Ellerman1-2/+1
Currently the clang build of corenet64_smp_defconfig fails with:

  arch/powerpc/platforms/85xx/corenet_generic.c:210:1: error: attribute declaration must precede definition
  machine_arch_initcall(corenet_generic, corenet_gen_publish_devices);

Fix it by moving the initcall definition prior to the machine definition, and directly below the function it calls, which is the usual style anyway.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201023020838.3274226-1-mpe@ellerman.id.au
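The reordering amounts to the shape below (a sketch; the function body and the machine fields are abbreviated and partly assumed):

  static int __init corenet_gen_publish_devices(void)
  {
          return of_platform_bus_probe(NULL, of_device_ids, NULL);
  }
  /* The initcall now sits directly below the function it registers and
   * before the machine definition, so the declaration precedes it. */
  machine_arch_initcall(corenet_generic, corenet_gen_publish_devices);

  define_machine(corenet_generic) {
          .name   = "CoreNet Generic",
          /* ... other callbacks unchanged ... */
  };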
2020-11-19powerpc/mm: Move setting PTE specific flags to pfn_pmd()Aneesh Kumar K.V2-2/+23
powerpc used to set the PTE specific flags in set_pte_at(). That is different from other architectures. To be consistent with other architectures powerpc updated pfn_pte() to set _PAGE_PTE in commit 379c926d6334 ("powerpc/mm: move setting pte specific flags to pfn_pte") That commit didn't do the same for pfn_pmd() because we expect pmd_mkhuge() to do that. But as per Linus that is a bad rule: The rule that you must use "pmd_mkhuge()" seems _completely_ wrong. The only valid use to ever make a pmd out of a pfn is to make a huge-page. Hence update pfn_pmd() to set _PAGE_PTE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201022091115.39568-1-aneesh.kumar@linux.ibm.com
2020-11-19powerpc/bitops: Fix possible undefined behaviour with fls() and fls64()Christophe Leroy1-2/+21
fls() and fls64() are using __builtin_clz() and __builtin_clzll(). On powerpc, those builtins trivially use the cntlzw and cntlzd power instructions.

Although those instructions provide the expected result with input argument 0, __builtin_clz() and __builtin_clzll() are documented as undefined for value 0.

The easiest fix would be to use the fls() and fls64() functions defined in include/asm-generic/bitops/builtin-fls.h and include/asm-generic/bitops/fls64.h, but GCC's output is not optimal:

  00000388 <testfls>:
   388:	2c 03 00 00 	cmpwi   r3,0
   38c:	41 82 00 10 	beq     39c <testfls+0x14>
   390:	7c 63 00 34 	cntlzw  r3,r3
   394:	20 63 00 20 	subfic  r3,r3,32
   398:	4e 80 00 20 	blr
   39c:	38 60 00 00 	li      r3,0
   3a0:	4e 80 00 20 	blr

  000003b0 <testfls64>:
   3b0:	2c 03 00 00 	cmpwi   r3,0
   3b4:	40 82 00 1c 	bne     3d0 <testfls64+0x20>
   3b8:	2f 84 00 00 	cmpwi   cr7,r4,0
   3bc:	38 60 00 00 	li      r3,0
   3c0:	4d 9e 00 20 	beqlr   cr7
   3c4:	7c 83 00 34 	cntlzw  r3,r4
   3c8:	20 63 00 20 	subfic  r3,r3,32
   3cc:	4e 80 00 20 	blr
   3d0:	7c 63 00 34 	cntlzw  r3,r3
   3d4:	20 63 00 40 	subfic  r3,r3,64
   3d8:	4e 80 00 20 	blr

When the input of fls(x) is a constant, just check x for nullity and return either 0 or __builtin_clz(x). Otherwise, use the cntlzw instruction directly.

For fls64() on PPC64, do the same but with __builtin_clzll() and the cntlzd instruction. On PPC32, let's take the generic fls64() which will use our fls(). The result is as expected:

  00000388 <testfls>:
   388:	7c 63 00 34 	cntlzw  r3,r3
   38c:	20 63 00 20 	subfic  r3,r3,32
   390:	4e 80 00 20 	blr

  000003a0 <testfls64>:
   3a0:	2c 03 00 00 	cmpwi   r3,0
   3a4:	40 82 00 10 	bne     3b4 <testfls64+0x14>
   3a8:	7c 83 00 34 	cntlzw  r3,r4
   3ac:	20 63 00 20 	subfic  r3,r3,32
   3b0:	4e 80 00 20 	blr
   3b4:	7c 63 00 34 	cntlzw  r3,r3
   3b8:	20 63 00 40 	subfic  r3,r3,64
   3bc:	4e 80 00 20 	blr

Fixes: 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/348c2d3f19ffcff8abe50d52513f989c4581d000.1603375524.git.christophe.leroy@csgroup.eu
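The fix described above plausibly looks like the following sketch of fls(); fls64() follows the same pattern with __builtin_clzll() and cntlzd:

  static __always_inline int fls(unsigned int x)
  {
          int lz;

          /* The builtin is only undefined for 0, and for constants the
           * zero case can be checked at compile time for free. */
          if (__builtin_constant_p(x))
                  return x ? 32 - __builtin_clz(x) : 0;

          /* cntlzw is well defined for input 0, so use it directly. */
          asm("cntlzw %0,%1" : "=r" (lz) : "r" (x));
          return 32 - lz;
  }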
2020-11-19powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to CJordan Niethe4-260/+287
The only thing keeping the cpu_setup() and cpu_restore() functions used in the cputable entries for Power7, Power8, Power9 and Power10 in assembly was cpu_restore() being called before there was a stack in generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel stack for secondaries before cpu_restore()") means that it is now possible to use C. Rewrite the functions in C so they are a little bit easier to read. This is not changing their functionality. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Tweak copyright and authorship notes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201014072837.24539-2-jniethe5@gmail.com