path: root/arch/powerpc/kernel
Age  Commit message  [Author, files changed, lines -removed/+added]
2020-06-02  powerpc: add an ioremap_phb helper  [Christoph Hellwig, 1 file, -18/+35]
Factor code shared between pci_64 and electra_cf into an ioremap_phb helper that follows the normal ioremap semantics and returns a useful __iomem pointer. Note that it opencodes __ioremap_at, as we know from the callers that the slab allocator is available. Switch pci_64 to also store the result as an __iomem pointer, and unmap the result using iounmap instead of force casting and using vmalloc APIs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-7-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
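As a rough illustration of the resulting interface: the helper takes a physical address and size, returns a normal __iomem cookie, and pairs with iounmap(). A minimal sketch (signature per this patch; the call site is hypothetical):

    /* sketch: ioremap-style mapping helper for PCI host bridges */
    void __iomem *ioremap_phb(phys_addr_t paddr, unsigned long size);

    /* hypothetical call site in a host bridge setup path */
    void __iomem *regs = ioremap_phb(io_phys, io_size);
    if (!regs)
            return -ENOMEM;
    /* ... use regs ..., then tear down with the matching API: */
    iounmap(regs);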
2020-06-02  powerpc: Add POWER10 architected mode  [Alistair Popple, 3 files, -4/+50]
A PVR value of 0x0F000006 means we are arch v3.1 compliant (i.e. POWER10). This is used by phyp and KVM when booting as a pseries guest to detect the presence of new P10 features and to enable the appropriate hwcap and facility bits. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> [mpe: Fall through to __init_FSCR rather than duplicating it, drop hack to set current->thread.fscr now that is handled elsewhere.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200521014341.29095-8-alistair@popple.id.au
2020-06-02  powerpc/dt_cpu_ftrs: Add MMA feature  [Alistair Popple, 1 file, -1/+16]
Matrix multiply assist (MMA) is a new feature added to ISA v3.1 and POWER10. Support on powernv can be selected via a firmware CPU device tree feature, which enables it via a PCR bit. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200521014341.29095-7-alistair@popple.id.au
2020-06-02  powerpc/dt_cpu_ftrs: Enable Prefixed Instructions  [Alistair Popple, 1 file, -0/+1]
Prefixed instructions have their own FSCR bit, which needs to be enabled via a CPU feature. The kernel will save the FSCR for problem state, but it needs to be enabled initially. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200521014341.29095-6-alistair@popple.id.au
2020-06-02  powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected  [Alistair Popple, 1 file, -0/+6]
On powernv, hardware support for ISA v3.1 is advertised via a CPU feature bit in the device tree. This patch enables the associated HWCAP bit if the device tree indicates ISA v3.1 is available. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200521014341.29095-4-alistair@popple.id.au
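In dt_cpu_ftrs.c this is typically a tiny enable function wired into the feature table; a sketch only, with the feature/HWCAP names this series introduces assumed:

    /* sketch: advertise ISA v3.1 when the DT cpu-features node selects it */
    static int __init feat_enable_v3_1(struct dt_cpu_feature *f)
    {
            cur_cpu_spec->cpu_features |= CPU_FTR_ARCH_31;
            cur_cpu_spec->cpu_user_features2 |= PPC_FEATURE2_ARCH_3_1;
            return 1;
    }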
2020-06-02  powerpc/64s: Save FSCR to init_task.thread.fscr after feature init  [Michael Ellerman, 1 file, -0/+19]
At boot the FSCR is initialised via one of two paths. On most systems it's set to a hard coded value in __init_FSCR(). On newer skiboot systems we use the device tree CPU features binding, where firmware can tell Linux what bits to set in FSCR (and HFSCR). In both cases the value that's configured at boot is not propagated into the init_task.thread.fscr value prior to the initial fork of init (pid 1), which means the value is not used by any processes other than swapper (the idle task). For the __init_FSCR() case this is OK, because the value in init_task.thread.fscr is initialised to something sensible. However it does mean that the value set in __init_FSCR() is not used other than for swapper, which is odd and confusing. The bigger problem is for the device tree CPU features case it prevents firmware from setting (or clearing) FSCR bits for use by user space. This means all existing kernels can not have features enabled/disabled by firmware if those features require setting/clearing FSCR bits. We can handle both cases by saving the FSCR value into init_task.thread.fscr after we have initialised it at boot. This fixes the bug for device tree CPU features, and will allow us to simplify the initialisation for the __init_FSCR() case in a future patch. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200527145843.2761782-3-mpe@ellerman.id.au
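The fix is conceptually a one-liner, run once both boot paths have finished setting up the register (a sketch, not the exact hunk):

    /* capture the boot-time FSCR so pid 1 and its descendants inherit it */
    init_task.thread.fscr = mfspr(SPRN_FSCR);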
2020-06-02  powerpc/64s: Don't let DT CPU features set FSCR_DSCR  [Michael Ellerman, 1 file, -0/+8]
The device tree CPU features binding includes FSCR bit numbers which Linux is instructed to set by firmware. Whether that's a good idea or not, in the case of the DSCR the Linux implementation has a hard requirement that the FSCR_DSCR bit not be set by default. We use it to track when a process reads/writes to DSCR, so it must be clear to begin with. So if firmware tells us to set FSCR_DSCR we must ignore it. Currently this does not cause a bug in our DSCR handling because the value of FSCR that the device tree CPU features code establishes is only used by swapper. All other tasks use the value hard coded in init_task.thread.fscr. However we'd like to fix that in a future commit, at which point this will become necessary. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200527145843.2761782-2-mpe@ellerman.id.au
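Conceptually the filter is one comparison in the dt_cpu_ftrs FSCR handling; a sketch, assuming the FSCR_*_LG bit-number convention used by that code:

    /* sketch: ignore a firmware request to set FSCR_DSCR by default */
    if (f->fscr_bit_nr == FSCR_DSCR_LG)
            return;         /* must start clear so the first DSCR access traps */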
2020-06-02  powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()  [Michael Ellerman, 1 file, -1/+1]
__init_FSCR() was added originally in commit 2468dcf641e4 ("powerpc: Add support for context switching the TAR register") (Feb 2013), and only set FSCR_TAR. At that point FSCR (Facility Status and Control Register) was not context switched, so the setting was permanent after boot. Later we added initialisation of FSCR_DSCR to __init_FSCR(), in commit 54c9b2253d34 ("powerpc: Set DSCR bit in FSCR setup") (Mar 2013), again that was permanent after boot. Then commit 2517617e0de6 ("powerpc: Fix context switch DSCR on POWER8") (Aug 2013) added a limited context switch of FSCR, just the FSCR_DSCR bit was context switched based on thread.dscr_inherit. That commit said "This clears the H/FSCR DSCR bit initially", but it didn't, it left the initialisation of FSCR_DSCR in __init_FSCR(). However the initial context switch from init_task to pid 1 would clear FSCR_DSCR because thread.dscr_inherit was 0. That commit also introduced the requirement that FSCR_DSCR be clear for user processes, so that we can take the facility unavailable interrupt in order to manage dscr_inherit. Then in commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") (Dec 2015) FSCR was added to thread_struct. However it still wasn't fully context switched, we just took the existing value and set FSCR_DSCR if the new thread had dscr_inherit set. FSCR was still initialised at boot to FSCR_DSCR | FSCR_TAR, but that value was not propagated into the thread_struct, so the initial context switch set FSCR_DSCR back to 0. Finally commit b57bd2de8c6c ("powerpc: Improve FSCR init and context switching") (Jun 2016) added a full context switch of the FSCR, and added an initialisation of init_task.thread.fscr to FSCR_TAR | FSCR_EBB, but omitted FSCR_DSCR. The end result is that swapper runs with FSCR_DSCR set because of the initialisation in __init_FSCR(), but no other processes do, they use the value from init_task.thread.fscr. Having FSCR_DSCR set for swapper allows it to access SPR 3 from userspace, but swapper never runs userspace, so it has no useful effect. It's also confusing to have the value initialised in two places to two different values. So remove FSCR_DSCR from __init_FSCR(), this at least gets us to the point where there's a single value of FSCR, even if it's still set in two places. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Alistair Popple <alistair@popple.id.au> Link: https://lore.kernel.org/r/20200527145843.2761782-1-mpe@ellerman.id.au
2020-06-02  powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel  [Naveen N. Rao, 1 file, -118/+104]
Since commit c55d7b5e64265f ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE"), powerpc kernels with -mprofile-kernel can crash in certain scenarios with a trace like below:

    BUG: Unable to handle kernel instruction fetch (NULL pointer?)
    Faulting instruction address: 0x00000000
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
    <snip>
    NIP [0000000000000000] 0x0
    LR [c0080000102c0048] ext4_iomap_end+0x8/0x30 [ext4]
    Call Trace:
      iomap_apply+0x20c/0x920 (unreliable)
      iomap_bmap+0xfc/0x160
      ext4_bmap+0xa4/0x180 [ext4]
      bmap+0x4c/0x80
      jbd2_journal_init_inode+0x44/0x1a0 [jbd2]
      ext4_load_journal+0x440/0x860 [ext4]
      ext4_fill_super+0x342c/0x3ab0 [ext4]
      mount_bdev+0x25c/0x290
      ext4_mount+0x28/0x50 [ext4]
      legacy_get_tree+0x4c/0xb0
      vfs_get_tree+0x4c/0x130
      do_mount+0xa18/0xc50
      sys_mount+0x158/0x180
      system_call+0x5c/0x68

The NIP points to NULL, or a random location (data even), while the LR always points to the LEP of a function (with an offset of 8), indicating that something went wrong with ftrace. However, ftrace is not necessarily active when such crashes occur. The kernel OOPS sometimes follows a warning from ftrace indicating that some module functions could not be patched with a nop. Other times, if a module is loaded early during boot, instruction patching can fail due to a separate bug, but the error is not reported due to missing error reporting. In all the above cases when instruction patching fails, ftrace will be disabled but certain kernel module functions will be left with default calls to _mcount(). This is not a problem with ELFv1. However, with -mprofile-kernel, the default stub is problematic since it depends on a valid module TOC in r2. If the kernel (or a different module) calls into a function that does not use the TOC, the function won't have a prologue to setup the module TOC. When that function calls into _mcount(), we will end up in the relocation stub that will use the previous TOC, and end up trying to jump into a random location. From the above trace:

    iomap_apply+0x20c/0x920 [kernel TOC]
          |
          V
    ext4_iomap_end+0x8/0x30 [no GEP == kernel TOC]
          |
          V
    _mcount() stub [uses kernel TOC -> random entry]

To address this, let's change over to using the special stub that is used for ftrace_[regs_]caller() for _mcount(). This ensures that we are not dependent on a valid module TOC in r2 for default _mcount() handling. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Tested-by: Qian Cai <cai@lca.pw> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8affd4298d22099bbd82544fab8185700a6222b1.1587488954.git.naveen.n.rao@linux.vnet.ibm.com
2020-06-02  powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations  [Naveen N. Rao, 1 file, -14/+4]
For -mprofile-kernel, we need special handling when generating stubs for ftrace calls such as _mcount(). To facilitate this, we check if an R_PPC64_REL24 relocation is for a symbol named "_mcount", along with also checking the instruction sequence. The latter is not really required, since "_mcount" is an exported symbol and kernel modules cannot use it. As such, drop the additional checking and simplify the code. This helps unify stub creation for ftrace stubs with -mprofile-kernel and aids in code reuse. Also rename is_mprofile_mcount_callsite() to is_mprofile_ftrace_call() to reflect the checking being done. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7d9c316adfa1fb787ad268bb4691e7e4059ff2d5.1587488954.git.naveen.n.rao@linux.vnet.ibm.com
2020-06-02  powerpc/module_64: Consolidate ftrace code  [Naveen N. Rao, 1 file, -36/+33]
module_trampoline_target() is only used by ftrace. Move the prototype within the appropriate #ifdef in the header. Also, move the function body to the end of module_64.c so as to consolidate all ftrace code in one place. No functional changes. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2527351f65c53c5866068ae130dc34c5d4ee8ad9.1587488954.git.naveen.n.rao@linux.vnet.ibm.com
2020-06-02  powerpc/entry32: Blacklist exception exit points for kprobe.  [Christophe Leroy, 1 file, -1/+8]
kprobe does not handle events happening in real mode. The very last part of exception exits cannot support a trap. Blacklist them from kprobe. While we are at it, remove the unused exc_exit_start symbol to avoid having to blacklist it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/098b0fd3f6299aa1bd692bd576bd7012c84608de.1585670437.git.christophe.leroy@c-s.fr
2020-06-02  powerpc/entry32: Blacklist syscall exit points for kprobe.  [Christophe Leroy, 1 file, -0/+3]
kprobe does not handle events happening in real mode. The very last part of the syscall exit path cannot support a trap. Add a symbol syscall_exit_finish to identify that part, and blacklist it from kprobe. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/23eddf49abb03d1359fa0be4206998eb3800f42c.1585670437.git.christophe.leroy@c-s.fr
2020-06-02  powerpc/entry32: Blacklist exception entry points for kprobe.  [Christophe Leroy, 1 file, -15/+22]
kprobe does not handle events happening in real mode. As exception entry points run with the MMU disabled, blacklist them. The handling of TLF_NAPPING and TLF_SLEEPING is moved before the CONFIG_TRACE_IRQFLAGS section, which contains 'reenable_mmu', because from there on kprobes become possible, as the kernel runs with the MMU enabled. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f61ac599855e674ebb592464d0ea32a3ba9c6644.1585670437.git.christophe.leroy@c-s.fr
2020-06-02  powerpc/32: Blacklist functions running with MMU disabled for kprobe  [Christophe Leroy, 10 files, -0/+16]
kprobe does not handle events happening in real mode, so all functions running with the MMU disabled have to be blacklisted. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3bf57066d05518644dee0840af69d36ab5086729.1585670437.git.christophe.leroy@c-s.fr
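The annotations involved here are the kernel's standard ones; roughly (the symbols below are merely examples of real-mode code, not the patch's exact hunks):

    /* C functions that can run in real mode: */
    NOKPROBE_SYMBOL(machine_check_exception);

    /* assembly entry points, in the .S files: */
    _ASM_NOKPROBE_SYMBOL(transfer_to_handler)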
2020-06-02  powerpc/rtas: Remove machine_check_in_rtas()  [Christophe Leroy, 2 files, -7/+1]
machine_check_in_rtas() is just a trap. Do the trap directly in the machine check exception handler. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/78899f40f89cb3c4f69bdff7f04eb6ec7cb753d5.1585670437.git.christophe.leroy@c-s.fr
2020-06-02  powerpc/kprobes: Use probe_address() to read instructions  [Christophe Leroy, 1 file, -3/+7]
In order to avoid Oopses, use probe_address() to read the instruction at the address where the trap happened. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7f24b5961a6839ff01df792816807f74ff236bf6.1582567319.git.christophe.leroy@c-s.fr
2020-06-02  powerpc/rtas: Implement reentrant rtas call  [Leonardo Bras, 2 files, -0/+84]
Implement rtas_call_reentrant() for reentrant rtas-calls: "ibm,int-on", "ibm,int-off", "ibm,get-xive" and "ibm,set-xive". On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4, items 2 and 3 say:

    2 - For the PowerPC External Interrupt option: The * call must be reentrant to the number of processors on the platform.
    3 - For the PowerPC External Interrupt option: The * argument call buffer for each simultaneous call must be physically unique.

So these rtas-calls can be called in a lockless way, provided a different buffer is used for each CPU making such a call. For this, it was suggested to add the buffer (struct rtas_args) to the PACA struct, so each CPU can have its own buffer. The PACA struct received a pointer to an rtas buffer, which is allocated in the memory range available to rtas 32-bit. Reentrant rtas calls are useful to avoid deadlocks when crashing, where rtas-calls are needed but some other thread crashed while holding rtas.lock. This is a backtrace of a deadlock from a kdump testing environment:

    #0 arch_spin_lock
    #1 lock_rtas ()
    #2 rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
    #3 ics_rtas_mask_real_irq (hw_irq=4100)
    #4 machine_kexec_mask_interrupts
    #5 default_machine_crash_shutdown
    #6 machine_crash_shutdown
    #7 __crash_kexec
    #8 crash_kexec
    #9 oops_end

Signed-off-by: Leonardo Bras <leobras.c@gmail.com> [mpe: Move under #ifdef PSERIES to avoid build breakage] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200518234245.200672-3-leobras.c@gmail.com
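A condensed sketch of the resulting call path, assuming the per-CPU buffer pointer in the PACA is named rtas_args_reentrant and an unlocked argument-marshalling helper exists (details assumed, not the exact code):

    int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...)
    {
            va_list list;
            struct rtas_args *args;
            unsigned long flags;
            int i, ret;

            local_irq_save(flags);                  /* stay on this CPU's buffer */
            args = local_paca->rtas_args_reentrant; /* allocated in the 32-bit RTAS range */

            va_start(list, outputs);
            va_rtas_call_unlocked(args, token, nargs, nret, list); /* no rtas.lock taken */
            va_end(list);

            if (nret > 1 && outputs)
                    for (i = 0; i < nret - 1; ++i)
                            outputs[i] = be32_to_cpu(args->rets[i + 1]);
            ret = nret > 0 ? be32_to_cpu(args->rets[0]) : 0;

            local_irq_restore(flags);
            return ret;
    }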
2020-06-02  powerpc/kernel: Enables memory hot-remove after reboot on pseries guests  [Leonardo Bras, 1 file, -2/+7]
When providing guests, it's desirable to resize their memory on demand. Currently it's possible to do so by creating a guest with a small base memory, hot-plugging all the rest, and using the 'movable_node' kernel command-line parameter, which puts all hot-plugged memory in ZONE_MOVABLE, allowing it to be removed whenever needed. But there is an issue regarding guest reboot: if memory is hot-plugged and the guest is then rebooted, all hot-plugged memory goes to ZONE_NORMAL, which offers no guaranteed hot-removal; this usually prevents that memory from being hot-removed from the guest. It's possible to use device-tree information to fix that behavior, as it stores flags for LMB ranges in ibm,dynamic-memory-vN. It involves marking each memblock with the correct flags as hotpluggable memory, which mm/memblock.c puts in ZONE_MOVABLE during boot if 'movable_node' is passed. For carrying such information, the new flag DRCONF_MEM_HOTREMOVABLE was proposed and accepted into the Power Architecture documentation. This flag should be: true (b=1) if the hypervisor may want to hot-remove it later, and false (b=0) if it does not care. During boot, the guest kernel reads the device tree, and early_init_drmem_lmb() is called for every added LMB. Checking for this new flag there and marking memblocks as hotpluggable memory is enough to get the desired behavior, as sketched below. This should cause no change if the 'movable_node' parameter is not passed on the kernel command line. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Reviewed-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200402195156.626430-1-leonardo@linux.ibm.com
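The check itself is small; a sketch of the early_init_drmem_lmb() side (the flag name comes from the patch, the surrounding names are assumed):

    /* sketch: honour the hypervisor's hot-remove hint for this LMB */
    if (lmb->flags & DRCONF_MEM_HOTREMOVABLE)
            memblock_mark_hotplug(base, size);
    /* with 'movable_node', hotplug-marked memblocks land in ZONE_MOVABLE */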
2020-06-02  powerpc: Fix misleading small cores print  [Michael Neuling, 1 file, -1/+1]
Currently when we boot on a big core system, we get this print:

    [    0.040500] Using small cores at SMT level

This is misleading, as we've actually detected big cores. This patch clears up the print to say we've detected big cores but are using small cores for scheduling. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200528230731.1235752-1-mikey@neuling.org
2020-06-02  powerpc/fadump: Account for memory_limit while reserving memory  [Hari Bathini, 1 file, -1/+1]
If the memory chunk found for reserving memory overshoots the imposed memory limit, do not proceed with reserving memory. This was the default behavior until commit 140777a3d8df ("powerpc/fadump: consider reserved ranges while reserving memory") unwittingly changed it. Fixes: 140777a3d8df ("powerpc/fadump: consider reserved ranges while reserving memory") Cc: stable@vger.kernel.org Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/159057266320.22331.6571453892066907320.stgit@hbathini.in.ibm.com
2020-06-02  Merge branch 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  [Linus Torvalds, 1 file, -0/+1]
Pull vfs updates from Al Viro: "Assorted patches from Miklos. An interesting part here is /proc/mounts stuff..." The "/proc/mounts stuff" is using a cursor for keeping the location data while traversing the mount listing. Also probably worth noting is the addition of faccessat2(), which takes an additional set of flags to specify how the lookup is done (AT_EACCESS, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH).

* 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: add faccessat2 syscall
    vfs: don't parse "silent" option
    vfs: don't parse "posixacl" option
    vfs: don't parse forbidden flags
    statx: add mount_root
    statx: add mount ID
    statx: don't clear STATX_ATIME on SB_RDONLY
    uapi: deprecate STATX_ALL
    utimensat: AT_EMPTY_PATH support
    vfs: split out access_override_creds()
    proc/mounts: add cursor
    aio: fix async fsync creds
    vfs: allow unprivileged whiteout creation
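For reference, the shape of the new syscall as described above (prototype only; the raw syscall number and glibc wrapper are omitted):

    /* faccessat2: faccessat() plus a real flags argument */
    int faccessat2(int dirfd, const char *pathname, int mode, int flags);
    /* flags: AT_EACCESS, AT_SYMLINK_NOFOLLOW, AT_EMPTY_PATH */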
2020-06-01  Merge tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds, 1 file, -16/+6]
Pull RCU updates from Ingo Molnar: "The RCU updates for this cycle were:
    - RCU-tasks update, including addition of RCU Tasks Trace for BPF use and TASKS_RUDE_RCU
    - kfree_rcu() updates.
    - Remove scheduler locking restriction
    - RCU CPU stall warning updates.
    - Torture-test updates.
    - Miscellaneous fixes and other updates"

* tag 'core-rcu-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
    rcu: Allow for smp_call_function() running callbacks from idle
    rcu: Provide rcu_irq_exit_check_preempt()
    rcu: Abstract out rcu_irq_enter_check_tick() from rcu_nmi_enter()
    rcu: Provide __rcu_is_watching()
    rcu: Provide rcu_irq_exit_preempt()
    rcu: Make RCU IRQ enter/exit functions rely on in_nmi()
    rcu/tree: Mark the idle relevant functions noinstr
    x86: Replace ist_enter() with nmi_enter()
    x86/mce: Send #MC singal from task work
    x86/entry: Get rid of ist_begin/end_non_atomic()
    sched,rcu,tracing: Avoid tracing before in_nmi() is correct
    sh/ftrace: Move arch_ftrace_nmi_{enter,exit} into nmi exception
    lockdep: Always inline lockdep_{off,on}()
    hardirq/nmi: Allow nested nmi_enter()
    arm64: Prepare arch_nmi_enter() for recursion
    printk: Disallow instrumenting print_nmi_enter()
    printk: Prepare for nested printk_nmi_enter()
    rcutorture: Convert ULONG_CMP_LT() to time_before()
    torture: Add a --kasan argument
    torture: Save a few lines by using config_override_param initially
    ...
2020-06-01  Merge tag 'core-kprobes-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds, 1 file, -0/+1]
Pull kprobes updates from Ingo Molnar: "Various kprobes updates, mostly centered around cleaning up the no-instrumentation logic. Instead of the current per debug facility blacklist, use the more generic .noinstr.text approach, combined with a 'noinstr' marker for functions. Also add instrumentation_begin()/end() to better manage the exact place in entry code where instrumentation may be used. And add a kprobes blacklist for modules"

* tag 'core-kprobes-2020-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    kprobes: Prevent probes in .noinstr.text section
    vmlinux.lds.h: Create section for protection against instrumentation
    samples/kprobes: Add __kprobes and NOKPROBE_SYMBOL() for handlers.
    kprobes: Support NOKPROBE_SYMBOL() in modules
    kprobes: Support __kprobes blacklist in modules
    kprobes: Lock kprobe_mutex while showing kprobe_blacklist
2020-06-01  Merge tag 'pstore-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux  [Linus Torvalds, 1 file, -3/+1]
Pull pstore updates from Kees Cook: "Fixes and new features for pstore. This is a pretty big set of changes (relative to past pstore pulls), but it has been in -next for a while. The biggest change here is the ability to support a block device as a pstore backend, which has been desired for a while. A lot of additional fixes and refactorings are also included, mostly in support of the new features.
    - refactor pstore locking for safer module unloading (Kees Cook)
    - remove orphaned records from pstorefs when backend unloaded (Kees Cook)
    - refactor dump_oops parameter into max_reason (Pavel Tatashin)
    - introduce pstore/zone for common code for contiguous storage (WeiXiong Liao)
    - introduce pstore/blk for block device backend (WeiXiong Liao)
    - introduce mtd backend (WeiXiong Liao)"

* tag 'pstore-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (35 commits)
    mtd: Support kmsg dumper based on pstore/blk
    pstore/blk: Introduce "best_effort" mode
    pstore/blk: Support non-block storage devices
    pstore/blk: Provide way to query pstore configuration
    pstore/zone: Provide way to skip "broken" zone for MTD devices
    Documentation: Add details for pstore/blk
    pstore/zone,blk: Add ftrace frontend support
    pstore/zone,blk: Add console frontend support
    pstore/zone,blk: Add support for pmsg frontend
    pstore/blk: Introduce backend for block devices
    pstore/zone: Introduce common layer to manage storage zones
    ramoops: Add "max-reason" optional field to ramoops DT node
    pstore/ram: Introduce max_reason and convert dump_oops
    pstore/platform: Pass max_reason to kmesg dump
    printk: Introduce kmsg_dump_reason_str()
    printk: honor the max_reason field in kmsg_dumper
    printk: Collapse shutdown types into a single dump reason
    pstore/ftrace: Provide ftrace log merging routine
    pstore/ram: Refactor ftrace buffer merging
    pstore/ram: Refactor DT size parsing
    ...
2020-05-30  printk: Collapse shutdown types into a single dump reason  [Kees Cook, 1 file, -3/+1]
To turn the KMSG_DUMP_* reasons into a more ordered list, collapse the redundant KMSG_DUMP_(RESTART|HALT|POWEROFF) reasons into KMSG_DUMP_SHUTDOWN. The current users already don't meaningfully distinguish between them, so there's no need to, as discussed here: https://lore.kernel.org/lkml/CA+CK2bAPv5u1ih5y9t5FUnTyximtFCtDYXJCpuyjOyHNOkRdqw@mail.gmail.com/ Link: https://lore.kernel.org/lkml/20200515184434.8470-2-keescook@chromium.org/ Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Kees Cook <keescook@chromium.org>
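After the collapse the reason list reads roughly as follows (a sketch of the post-patch enum in include/linux/kmsg_dump.h; member order assumed):

    enum kmsg_dump_reason {
            KMSG_DUMP_UNDEF,
            KMSG_DUMP_PANIC,
            KMSG_DUMP_OOPS,
            KMSG_DUMP_EMERG,
            KMSG_DUMP_SHUTDOWN,     /* replaces RESTART, HALT and POWEROFF */
    };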
2020-05-29  powerpc/64s: Disable sanitisers for C syscall/interrupt entry/exit code  [Daniel Axtens, 1 file, -0/+3]
syzkaller is picking up a bunch of crashes that look like this:

    Unrecoverable exception 380 at c00000000037ed60 (msr=8000000000001031)
    Oops: Unrecoverable exception, sig: 6 [#1]
    LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in:
    CPU: 0 PID: 874 Comm: syz-executor.0 Not tainted 5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e #0
    NIP: c00000000037ed60 LR: c00000000004bac8 CTR: c000000000030990
    REGS: c0000000555a7230 TRAP: 0380 Not tainted (5.7.0-rc7-syzkaller-00016-gb0c3ba31be3e)
    MSR: 8000000000001031 <SF,ME,IR,DR,LE> CR: 48222882 XER: 20000000
    CFAR: c00000000004bac4 IRQMASK: 0
    GPR00: c00000000004bb68 c0000000555a74c0 c0000000024b3500 0000000000000005
    GPR04: 0000000000000000 0000000000000000 c00000000004bb88 c008000000910000
    GPR08: 00000000000b0000 c00000000004bac8 0000000000016000 c000000002503500
    GPR12: c000000000030990 c000000003190000 00000000106a5898 00000000106a0000
    GPR16: 00000000106a5890 c000000007a92000 c000000008180e00 c000000007a8f700
    GPR20: c000000007a904b0 0000000010110000 c00000000259d318 5deadbeef0000100
    GPR24: 5deadbeef0000122 c000000078422700 c000000009ee88b8 c000000078422778
    GPR28: 0000000000000001 800000000280b033 0000000000000000 c0000000555a75a0
    NIP [c00000000037ed60] __sanitizer_cov_trace_pc+0x40/0x50
    LR [c00000000004bac8] interrupt_exit_kernel_prepare+0x118/0x310
    Call Trace:
      [c0000000555a74c0] [c00000000004bb68] interrupt_exit_kernel_prepare+0x1b8/0x310 (unreliable)
      [c0000000555a7530] [c00000000000f9a8] interrupt_return+0x118/0x1c0
    --- interrupt: 900 at __sanitizer_cov_trace_pc+0x0/0x50
    ...<random previous call chain>...

This is caused by __sanitizer_cov_trace_pc() causing an SLB fault after MSR[RI] has been cleared by __hard_EE_RI_disable(), which we can not recover from. Do not instrument the new syscall/interrupt entry/exit code with KCOV, GCOV or UBSAN. Reported-by: syzbot-ppc64 <ozlabsyz@au1.ibm.com> Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C") Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
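The fix uses the standard kbuild per-object opt-outs; a sketch, assuming the entry/exit code lives in syscall_64.o:

    # do not instrument the C syscall/interrupt entry/exit code
    GCOV_PROFILE_syscall_64.o := n
    KCOV_INSTRUMENT_syscall_64.o := n
    UBSAN_SANITIZE_syscall_64.o := n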
2020-05-28  powerpc/64s/kuap: Conditionally restore AMR in kuap_restore_amr asm  [Nicholas Piggin, 2 files, -6/+6]
Similar to the C code change, make the AMR restore conditional on whether the register has changed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-7-npiggin@gmail.com
2020-05-28  powerpc/64/kuap: Conditionally restore AMR in interrupt exit  [Nicholas Piggin, 1 file, -4/+10]
The AMR update is made conditional on AMR actually changing, which should be the less common case on most workloads (though kernel page faults on uaccess could be frequent, this doesn't significantly slow down that case). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429065654.1677541-4-npiggin@gmail.com
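Conceptually the C side looks like this (a sketch; helper and feature names approximate the kuap code of this era):

    static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
    {
            if (mmu_has_feature(MMU_FTR_RADIX_KUAP) && unlikely(regs->kuap != amr)) {
                    isync();
                    mtspr(SPRN_AMR, regs->kuap); /* skip the SPR write when unchanged */
            }
    }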
2020-05-28  powerpc/40x: Don't save CR in SPRN_SPRG_SCRATCH6  [Christophe Leroy, 1 file, -10/+5]
We have r12 available; use it to keep CR around, instead of saving CR in SPRN_SPRG_SCRATCH6. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/019f314a98c107c4ca46e46c1cf402e9a44114a7.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28  powerpc/40x: Avoid using r12 in TLB miss handlers  [Christophe Leroy, 1 file, -37/+33]
Let's reduce the number of registers used in TLB miss handlers. We have both r9 and r12 available for any temporary use; r9 is enough, so avoid using r12. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7f330e971952abb2645fb9ca4310c0f527e84dcb.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28  powerpc: Remove IBM405 Erratum #77  [Christophe Leroy, 2 files, -14/+0]
This erratum applies only to the IBM 405GP and STB03xxx, which are now gone. Remove the workaround for it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/44dbc08e9034681eb28324cbabc086e97044c36c.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28  powerpc/40x: Remove IBM405 Erratum #51  [Christophe Leroy, 1 file, -6/+0]
This erratum was for the IBM 403GCX, 405EP and STB03xxx, which are now gone. Remove the workaround for it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1b6c9916514ef3e084bba57925ad9eb444627566.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28  powerpc/40x: Remove support for IBM 405GP  [Christophe Leroy, 1 file, -13/+0]
All platforms selecting the obsolete processor are gone now. Remove support for it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/906c6a6df710f2826e332b8a0cd5d2859a913a1c.1590079969.git.christophe.leroy@csgroup.eu
2020-05-28  powerpc/40x: Remove STB03xxx  [Christophe Leroy, 1 file, -13/+0]
CONFIG_STB03xxx is not user selectable and is not selected by any config. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d7d73f9a8ee3a890566abace568101e9b4836016.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28  powerpc/40x: Remove support for IBM 403GCX  [Christophe Leroy, 4 files, -95/+0]
CONFIG_403GCX is not user selectable and is not selected by any platform. Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/635f8f5ce9d1f761b3bd8dc3e8ddad500cea26c4.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28  powerpc/40x: Rework 40x PTE access and TLB miss  [Christophe Leroy, 1 file, -148/+29]
Commit 1bc54c03117b ("powerpc: rework 4xx PTE access and TLB miss") reworked 44x PTE access to avoid atomic pte updates, and left 8xx, 40x and fsl booke with atomic pte updates. Commit 6cfd8990e27d ("powerpc: rework FSL Book-E PTE access and TLB miss") removed atomic pte updates on fsl booke. It went away on 8xx with commit ddfc20a3b9ae ("powerpc/8xx: Remove PTE_ATOMIC_UPDATES"). 40x is the last platform setting PTE_ATOMIC_UPDATES. Rework PTE access and TLB miss to remove PTE_ATOMIC_UPDATES for 40x:
    - Always handle DSI as a fault.
    - Bail out of TLB miss handler when CONFIG_SWAP is set and _PAGE_ACCESSED is not set.
    - Bail out of ITLB miss handler when _PAGE_EXEC is not set.
    - Only set WR bit when both _PAGE_RW and _PAGE_DIRTY are set.
    - Remove _PAGE_HWWRITE.
    - Don't require PTE_ATOMIC_UPDATES anymore.
Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/99a0fcd337ef67088140d1647d75fea026a70413.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28  powerpc: Remove Xilinx PPC405/PPC440 support  [Michal Simek, 1 file, -39/+0]
The latest release of the Xilinx design tools, ISE and EDK, was in October 2013, and the newer tools don't support any new PPC405/PPC440 designs. These platforms are no longer supported and tested. The PowerPC 405/440 port has been orphaned since 2013, by commit cdeb89943bfc ("MAINTAINERS: Fix incorrect status tag") and commit 19624236cce1 ("MAINTAINERS: Update Grant's email address and maintainership"), which is why it is time to remove support for these platforms. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8c593895e2cb57d232d85ce4d8c3a1aa7f0869cc.1590079968.git.christophe.leroy@csgroup.eu
2020-05-28  powerpc/64: Refactor interrupt exit irq disabling sequence  [Nicholas Piggin, 1 file, -30/+28]
The same complicated sequence for juggling EE, RI, soft mask, and irq tracing is repeated 3 times; tidy these up into one function. This differs quite a bit between sub-architectures, so this makes the ppc32 port cleaner as well. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200429062421.1675400-1-npiggin@gmail.com
2020-05-26  powerpc: Add ppc_inst_as_u64()  [Michael Ellerman, 1 file, -2/+1]
The code patching code wants to get the value of a struct ppc_inst as a u64 when the instruction is prefixed, so we can pass the u64 down to __put_user_asm() and write it with a single store. The optprobes code wants to load a struct ppc_inst as an immediate into a register so it is useful to have it as a u64 to use the existing helper function. Currently this is a bit awkward because the value differs based on the CPU endianness, so add a helper to do the conversion. This fixes the usage in arch_prepare_optimized_kprobe() which was previously incorrect on big endian. Fixes: 650b55b707fd ("powerpc: Add prefixed instructions to instruction data type") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Jordan Niethe <jniethe5@gmail.com> Link: https://lore.kernel.org/r/20200526072630.2487363-1-mpe@ellerman.id.au
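A sketch of the helper, built from the existing val/suffix accessors (endian handling as described; exact code assumed):

    static inline u64 ppc_inst_as_u64(struct ppc_inst x)
    {
    #ifdef CONFIG_CPU_LITTLE_ENDIAN
            return (u64)ppc_inst_suffix(x) << 32 | ppc_inst_val(x);
    #else
            return (u64)ppc_inst_val(x) << 32 | ppc_inst_suffix(x);
    #endif
    }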
2020-05-26  powerpc: Add ppc_inst_next()  [Michael Ellerman, 1 file, -1/+1]
In a few places we want to calculate the address of the next instruction. Previously that was simple: we just added 4 bytes, or if using a u32 * we incremented that pointer by 1. But prefixed instructions make it more complicated; we need to advance by either 4 or 8 bytes depending on the actual instruction. We also can't do pointer arithmetic using struct ppc_inst, because it is always 8 bytes in size on 64-bit, even though we might only need to advance by 4 bytes. So add a ppc_inst_next() helper which calculates the location of the next instruction, if the given instruction was located at the given address. Note the instruction doesn't need to actually be at the address in memory. Although it would seem natural for the value to be passed by value, that makes it too easy to write a loop that will read off the end of a page, eg:

    for (; src < end;
         src = ppc_inst_next(src, *src),
         dest = ppc_inst_next(dest, *dest))

As noticed by Christophe and Jordan, if end is the exact end of a page, and the next page is not mapped, this will fault, because *dest will read 8 bytes, 4 bytes into the next page. So the value is passed by reference, so the helper can be careful to use ppc_inst_read() on it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Jordan Niethe <jniethe5@gmail.com> Link: https://lore.kernel.org/r/20200522133318.1681406-1-mpe@ellerman.id.au
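Its shape, per the description above (a sketch; the value parameter is dereferenced via ppc_inst_read() so at most 4 bytes are read for a non-prefixed instruction):

    #define ppc_inst_next(location, value)                                  \
    ({                                                                      \
            struct ppc_inst _tmp;                                           \
            _tmp = ppc_inst_read((struct ppc_inst *)(value));               \
            (void *)((unsigned long)(location) + ppc_inst_len(_tmp));       \
    })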
2020-05-26  Merge branch 'fixes' into next  [Michael Ellerman, 9 files, -21/+34]
Merge our fixes branch from this cycle. It contains several important fixes we need in next for testing purposes, and also some that will conflict with upcoming changes.
2020-05-26  powerpc/8xx: Map linear memory with huge pages  [Christophe Leroy, 1 file, -2/+2]
Map the linear memory space with 512k and 8M pages whenever possible. Three mappings are performed:
    - one for kernel text
    - one for RO data
    - one for the rest
Separating the mappings is done to be able to update the protection later when using STRICT_KERNEL_RWX. The ITLB miss handler now needs to also handle huge TLBs, unless kernel text is pinned. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c44f0ab5510474f25123d904cd1f4e5c6aa3c1ac.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26  powerpc/8xx: Refactor kernel address boundary comparison  [Christophe Leroy, 1 file, -14/+8]
Now that dedicated linear and IMMR TLB handling is gone, the kernel boundary address comparison is similar in the ITLB miss handler and in the DTLB miss handler. Create a macro named compare_to_kernel_boundary. When TASK_SIZE is strictly below 0x80000000 and PAGE_OFFSET is above 0x80000000, it is enough to compare to 0x80000000, and this can be done with a single instruction. Using the not. instruction, we get to use the 'blt' conditional branch, just as with a regular comparison:

    0x00000000 <= addr <= 0x7fffffff  ==>  0xffffffff >= NOT(addr) >= 0x80000000

The above test corresponds to a 'blt'. Otherwise, do a regular comparison using two instructions. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6312575d06a8813105e6564a3b12e1d373aa1b2f.1589866984.git.christophe.leroy@csgroup.eu
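In assembly the trick is a single record-form instruction; a sketch (register numbers illustrative, not the exact macro):

    not.    r11, r10        /* r11 = ~addr, CR0 set from the result */
    blt     1f              /* ~addr negative <=> addr <= 0x7fffffff: user address */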
2020-05-26  powerpc/mm: Don't be too strict with _etext alignment on PPC32  [Christophe Leroy, 1 file, -2/+1]
Similar to PPC64, accept mapping RO data as ROX, as a trade-off between security and memory usage. Having RO data executable is not a high risk, as RO data can't be modified to forge an exploit. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8c4a0d89d944eed984dd941e509614031a5ace2b.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26  powerpc/8xx: Move DTLB perf handling closer.  [Christophe Leroy, 1 file, -12/+11]
Now that space has been freed next to the DTLB miss handler, its associated DTLB perf handling can be brought back into the same place. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/97f48cc1a2ea6b895bfac0752cbe59deaf2eecda.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26  powerpc/8xx: Remove now unused TLB miss functions  [Christophe Leroy, 1 file, -83/+0]
The code to set up linear and IMMR mappings via huge TLB entries is not called anymore. Remove it. Also remove the handling of the removed code's exit points in the perf driver. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/75750d25849cb8e73ca519866bb892d7eb9649c0.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26  powerpc/8xx: Drop special handling of Linear and IMMR mappings in I/D TLB handlers  [Christophe Leroy, 1 file, -27/+2]
Up to now, linear and IMMR mappings have been managed via huge TLB entries through specific code directly in the TLB miss handlers. This implies some patching of the TLB miss handlers at startup, and a lot of dedicated code. Remove all this specific dedicated code. For now we are back to normal handling via standard 4k pages. In the next patches, the linear memory mapping and the IMMR mapping will be managed through huge pages. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/221b7e3ead80a5969629938c023f8cfe45fdd2fb.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26  powerpc/8xx: Always pin TLBs at startup.  [Christophe Leroy, 1 file, -14/+17]
At startup, map 32 Mbytes of memory through 4 pages of 8M, and pin them unconditionally. They need to be pinned because KASAN uses page tables early, and the TLBs might be dynamically replaced otherwise. Remove the RSV4I flag after installing the mappings, unless CONFIG_PIN_TLB_XXXX is selected. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b27c5767d18053b59f7eefddc189fcc3acf7b9c2.1589866984.git.christophe.leroy@csgroup.eu
2020-05-26  powerpc/8xx: Don't set IMMR map anymore at boot  [Christophe Leroy, 1 file, -22/+17]
Only early debug requires IMMR to be mapped early. There is no need to set it up and pin it in assembly; map it through page tables at udbg init when necessary. If CONFIG_PIN_TLB_IMMR is selected, pin it once we don't need the 32 MB of pinned RAM anymore. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/13c1e8539fdf363d3146f4884e5c3c76c6c308b5.1589866984.git.christophe.leroy@csgroup.eu