summaryrefslogtreecommitdiff
path: root/arch/powerpc/mm
AgeCommit message (Collapse)AuthorFilesLines
2021-04-14powerpc/mm: Add cond_resched() while removing hpte mappingsVaibhav Jain1-1/+12
While removing large number of mappings from hash page tables for large memory systems as soft-lockup is reported because of the time spent inside htap_remove_mapping() like one below: watchdog: BUG: soft lockup - CPU#8 stuck for 23s! <snip> NIP plpar_hcall+0x38/0x58 LR pSeries_lpar_hpte_invalidate+0x68/0xb0 Call Trace: 0x1fffffffffff000 (unreliable) pSeries_lpar_hpte_removebolted+0x9c/0x230 hash__remove_section_mapping+0xec/0x1c0 remove_section_mapping+0x28/0x3c arch_remove_memory+0xfc/0x150 devm_memremap_pages_release+0x180/0x2f0 devm_action_release+0x30/0x50 release_nodes+0x28c/0x300 device_release_driver_internal+0x16c/0x280 unbind_store+0x124/0x170 drv_attr_store+0x44/0x60 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 __vfs_write+0x3c/0x70 vfs_write+0xd4/0x270 ksys_write+0xdc/0x130 system_call+0x5c/0x70 Fix this by adding a cond_resched() to the loop in htap_remove_mapping() that issues hcall to remove hpte mapping. The call to cond_resched() is issued every HZ jiffies which should prevent the soft-lockup from being reported. Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210404163148.321346-1-vaibhav@linux.ibm.com
2021-04-08powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPARMichael Ellerman1-1/+104
When we enabled STRICT_KERNEL_RWX we received some reports of boot failures when using the Hash MMU and running under phyp. The crashes are intermittent, and often exhibit as a completely unresponsive system, or possibly an oops. One example, which was caught in xmon: [ 14.068327][ T1] devtmpfs: mounted [ 14.069302][ T1] Freeing unused kernel memory: 5568K [ 14.142060][ T347] BUG: Unable to handle kernel instruction fetch [ 14.142063][ T1] Run /sbin/init as init process [ 14.142074][ T347] Faulting instruction address: 0xc000000000004400 cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0] pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80 lr: c0000000001862d4: update_rq_clock+0x44/0x110 sp: c00000000c747880 msr: 8000000040001031 current = 0xc00000000c60d380 paca = 0xc00000001ec9de80 irqmask: 0x03 irq_happened: 0x01 pid = 347, comm = kworker/2:1 ... enter ? for help [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable) [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0 [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0 [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390 [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610 [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480 [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950 [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120 [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640 [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0 [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c This shows that CPU 2, which was idle, woke up and then appears to randomly take an instruction fault on a completely valid area of kernel text. The cause turns out to be the call to hash__mark_rodata_ro(), late in boot. Due to the way we layout text and rodata, that function actually changes the permissions for all of text and rodata to read-only plus execute. To do the permission change we use a hypervisor call, H_PROTECT. On phyp that appears to be implemented by briefly removing the mapping of the kernel text, before putting it back with the updated permissions. If any other CPU is executing during that window, it will see spurious faults on the kernel text and/or data, leading to crashes. To fix it we use stop machine to collect all other CPUs, and then have them drop into real mode (MMU off), while we change the mapping. That way they are unaffected by the mapping temporarily disappearing. We don't see this bug on KVM because KVM always use VPM=1, where faults are directed to the hypervisor, and the fault will be serialised vs the h_protect() by HPTE_V_HVLOCK. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-5-mpe@ellerman.id.au
2021-04-08powerpc/mm/64s/hash: Factor out change_memory_range()Michael Ellerman1-8/+15
Pull the loop calling hpte_updateboltedpp() out of hash__change_memory_range() into a helper function. We need it to be a separate function for the next patch. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210331003845.216246-4-mpe@ellerman.id.au
2021-04-08powerpc/64s: Use htab_convert_pte_flags() in hash__mark_rodata_ro()Michael Ellerman1-2/+4
In hash__mark_rodata_ro() we pass the raw PP_RXXX value to hash__change_memory_range(). That has the effect of setting the key to zero, because PP_RXXX contains no key value. Fix it by using htab_convert_pte_flags(), which knows how to convert a pgprot into a pp value, including the key. Fixes: d94b827e89dc ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/20210331003845.216246-3-mpe@ellerman.id.au
2021-04-08powerpc/64s: Fix pte update for kernel memory on radixJordan Niethe1-2/+2
When adding a PTE a ptesync is needed to order the update of the PTE with subsequent accesses otherwise a spurious fault may be raised. radix__set_pte_at() does not do this for performance gains. For non-kernel memory this is not an issue as any faults of this kind are corrected by the page fault handler. For kernel memory these faults are not handled. The current solution is that there is a ptesync in flush_cache_vmap() which should be called when mapping from the vmalloc region. However, map_kernel_page() does not call flush_cache_vmap(). This is troublesome in particular for code patching with Strict RWX on radix. In do_patch_instruction() the page frame that contains the instruction to be patched is mapped and then immediately patched. With no ordering or synchronization between setting up the PTE and writing to the page it is possible for faults. As the code patching is done using __put_user_asm_goto() the resulting fault is obscured - but using a normal store instead it can be seen: BUG: Unable to handle kernel data access on write at 0xc008000008f24a3c Faulting instruction address: 0xc00000000008bd74 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: nop_module(PO+) [last unloaded: nop_module] CPU: 4 PID: 757 Comm: sh Tainted: P O 5.10.0-rc5-01361-ge3c1b78c8440-dirty #43 NIP: c00000000008bd74 LR: c00000000008bd50 CTR: c000000000025810 REGS: c000000016f634a0 TRAP: 0300 Tainted: P O (5.10.0-rc5-01361-ge3c1b78c8440-dirty) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44002884 XER: 00000000 CFAR: c00000000007c68c DAR: c008000008f24a3c DSISR: 42000000 IRQMASK: 1 This results in the kind of issue reported here: https://lore.kernel.org/linuxppc-dev/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/ Chris Riedl suggested a reliable way to reproduce the issue: $ mount -t debugfs none /sys/kernel/debug $ (while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done) & Turning ftrace on and off does a large amount of code patching which in usually less then 5min will crash giving a trace like: ftrace-powerpc: (____ptrval____): replaced (4b473b11) != old (60000000) ------------[ ftrace bug ]------------ ftrace failed to modify [<c000000000bf8e5c>] napi_busy_loop+0xc/0x390 actual: 11:3b:47:4b Setting ftrace call site to call ftrace function ftrace record flags: 80000001 (1) expected tramp: c00000000006c96c ------------[ cut here ]------------ WARNING: CPU: 4 PID: 809 at kernel/trace/ftrace.c:2065 ftrace_bug+0x28c/0x2e8 Modules linked in: nop_module(PO-) [last unloaded: nop_module] CPU: 4 PID: 809 Comm: sh Tainted: P O 5.10.0-rc5-01360-gf878ccaf250a #1 NIP: c00000000024f334 LR: c00000000024f330 CTR: c0000000001a5af0 REGS: c000000004c8b760 TRAP: 0700 Tainted: P O (5.10.0-rc5-01360-gf878ccaf250a) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28008848 XER: 20040000 CFAR: c0000000001a9c98 IRQMASK: 0 GPR00: c00000000024f330 c000000004c8b9f0 c000000002770600 0000000000000022 GPR04: 00000000ffff7fff c000000004c8b6d0 0000000000000027 c0000007fe9bcdd8 GPR08: 0000000000000023 ffffffffffffffd8 0000000000000027 c000000002613118 GPR12: 0000000000008000 c0000007fffdca00 0000000000000000 0000000000000000 GPR16: 0000000023ec37c5 0000000000000000 0000000000000000 0000000000000008 GPR20: c000000004c8bc90 c0000000027a2d20 c000000004c8bcd0 c000000002612fe8 GPR24: 0000000000000038 0000000000000030 0000000000000028 0000000000000020 GPR28: c000000000ff1b68 c000000000bf8e5c c00000000312f700 c000000000fbb9b0 NIP ftrace_bug+0x28c/0x2e8 LR ftrace_bug+0x288/0x2e8 Call Trace: ftrace_bug+0x288/0x2e8 (unreliable) ftrace_modify_all_code+0x168/0x210 arch_ftrace_update_code+0x18/0x30 ftrace_run_update_code+0x44/0xc0 ftrace_startup+0xf8/0x1c0 register_ftrace_function+0x4c/0xc0 function_trace_init+0x80/0xb0 tracing_set_tracer+0x2a4/0x4f0 tracing_set_trace_write+0xd4/0x130 vfs_write+0xf0/0x330 ksys_write+0x84/0x140 system_call_exception+0x14c/0x230 system_call_common+0xf0/0x27c To fix this when updating kernel memory PTEs using ptesync. Fixes: f1cb8f9beba8 ("powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Tidy up change log slightly] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210208032957.1232102-1-jniethe5@gmail.com
2021-04-08powerpc: Spelling/typo fixesBhaskar Chowdhury1-1/+1
Various spelling/typo fixes. Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2021-03-29powerpc/64s: Fix hash fault to use TRAP accessorNicholas Piggin1-2/+2
Hash faults use the trap vector to decide whether this is an instruction or data fault. This should use the TRAP accessor rather than open access regs->trap. This won't cause a problem at the moment because 64s only uses trap flags for system call interrupts (the norestart flag), but that could change if any other trap flags get used in future. Fixes: a4922f5442e7e ("powerpc/64s: move the hash fault handling logic to C") Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210316105205.407767-1-npiggin@gmail.com
2021-03-29powerpc/mm: Remove unneeded #ifdef CONFIG_PPC_MEM_KEYSChristophe Leroy1-6/+1
In fault.c, #ifdef CONFIG_PPC_MEM_KEYS is not needed because all functions are always defined, and arch_vma_access_permitted() always returns true when CONFIG_PPC_MEM_KEYS is not defined so access_pkey_error() will return false so bad_access_pkey() will never be called. Include linux/pkeys.h to get a definition of vma_pkeys() for bad_access_pkey(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8038392f38d81f2ad169347efac29146f553b238.1615819955.git.christophe.leroy@csgroup.eu
2021-03-29powerpc/64s: Fold update_current_thread_[i]amr() into their only callersMichael Ellerman1-15/+5
lkp reported warnings in some configuration due to update_current_thread_amr() being unused: arch/powerpc/mm/book3s64/pkeys.c:284:20: error: unused function 'update_current_thread_amr' static inline void update_current_thread_amr(u64 value) Which is because it's only use is inside an ifdef. We could move it inside the ifdef, but it's a single line function and only has one caller, so just fold it in. Similarly update_current_thread_iamr() is small and only called once, so fold it in also. Fixes: 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210314093320.132331-1-mpe@ellerman.id.au
2021-03-29powerpc/mm/book3s64: Fix a typo in mmu_context.cBhaskar Chowdhury1-1/+1
s/detalis/details/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210312112537.4585-1-unixbhaskar@gmail.com
2021-03-29powerpc/32s: Move KUEP locking/unlocking in CChristophe Leroy2-0/+41
This can be done in C, do it. Unrolling the loop gains approx. 15% performance. From now on, prepare_transfer_to_handler() is only for interrupts from kernel. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4eadd873927e9a73c3d1dfe2f9497353465514cf.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29powerpc/32: Call bad_page_fault() from do_page_fault()Christophe Leroy1-1/+1
Now that non volatile registers are saved at all time, no need to split bad_page_fault() out of do_page_fault(). Remove handle_page_fault() and use do_page_fault() directly. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/cfb95be8863204cc2bf45a22ea44dd1d0dc16b7f.1615552867.git.christophe.leroy@csgroup.eu
2021-03-29powerpc/32: Always enable data translation in exception prologChristophe Leroy1-14/+0
If the code can use a stack in vm area, it can also use a stack in linear space. Simplify code by removing old non VMAP stack code on PPC32. That means the data translation is now re-enabled early in exception prolog in all cases, not only when using VMAP stacks. While we are touching EXCEPTION_PROLOG macros, remove the unused for_rtas parameter in EXCEPTION_PROLOG_1. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/7cd6440c60a7e8f4f035b245c57720f51e225aae.1615552866.git.christophe.leroy@csgroup.eu
2021-03-24powerpc: Enable KFENCE for PPC32Christophe Leroy5-4/+17
Add architecture specific implementation details for KFENCE and enable KFENCE for the ppc32 architecture. In particular, this implements the required interface in <asm/kfence.h>. KFENCE requires that attributes for pages from its memory pool can individually be set. Therefore, force the Read/Write linear map to be mapped at page granularity. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8dfe1bd2abde26337c1d8c1ad0acfcc82185e0d5.1614868445.git.christophe.leroy@csgroup.eu
2021-03-24powerpc/mm: Move the linear_mapping_mutex to the ifdef where it is usedSebastian Andrzej Siewior1-1/+1
The mutex linear_mapping_mutex is defined at the of the file while its only two user are within the CONFIG_MEMORY_HOTPLUG block. A compile without CONFIG_MEMORY_HOTPLUG set fails on PREEMPT_RT because its mutex implementation is smart enough to realize that it is unused. Move the definition of linear_mapping_mutex to ifdef block where it is used. Fixes: 1f73ad3e8d755 ("powerpc/mm: print warning in arch_remove_linear_mapping()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210219165648.2505482-1-bigeasy@linutronix.de
2021-02-23Merge tag 'powerpc-5.12-1' of ↵Linus Torvalds14-246/+395
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A large series adding wrappers for our interrupt handlers, so that irq/nmi/user tracking can be isolated in the wrappers rather than spread in each handler. - Conversion of the 32-bit syscall handling into C. - A series from Nick to streamline our TLB flushing when using the Radix MMU. - Switch to using queued spinlocks by default for 64-bit server CPUs. - A rework of our PCI probing so that it happens later in boot, when more generic infrastructure is available. - Two small fixes to allow 32-bit little-endian processes to run on 64-bit kernels. - Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun. * tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits) powerpc/perf: Adds support for programming of Thresholding in P10 powerpc/pci: Remove unimplemented prototypes powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user() powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size() powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user() powerpc/64: Fix stack trace not displaying final frame powerpc/time: Remove get_tbl() powerpc/time: Avoid using get_tbl() spi: mpc52xx: Avoid using get_tbl() powerpc/syscall: Avoid storing 'current' in another pointer powerpc/32: Handle bookE debugging in C in syscall entry/exit powerpc/syscall: Do not check unsupported scv vector on PPC32 powerpc/32: Remove the counter in global_dbcr0 powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry powerpc/syscall: implement system call entry/exit logic in C for PPC32 powerpc/32: Always save non volatile GPRs at syscall entry powerpc/syscall: Change condition to check MSR_RI powerpc/syscall: Save r3 in regs->orig_r3 powerpc/syscall: Use is_compat_task() powerpc/syscall: Make interrupt.c buildable on PPC32 ...
2021-02-11powerpc/mm: Remove dcache flush from memory remove.Aneesh Kumar K.V1-22/+0
We added dcache flush on memory add/remove in commit fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug") to handle crashes on GPU hotplug. Instead of adding dcache flush in generic memory add/remove routine which is used even for regular memory, we should handle these devices specific flush in the device driver code. memtrace did handle this in the driver and that was removed by commit 7fd6641de28f ("powerpc/powernv/memtrace: Let the arch hotunplug code flush cache"). This patch reverts that commit. The dcache flush in memory add was removed by commit ea458effa88e ("powerpc: Don't flush caches when adding memory") which I don't think is correct. The reason why we require dcache flush in memtrace is to make sure we don't have a dirty cache when we remap a pfn to cache inhibited. We should do that when the memtrace module removes the memory and make the pfn available for HTM traces to map it as cache inhibited. The other device mentioned in commit fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug") is nvlink device with coherent memory. The support for that was removed in commit 7eb3cf761927 ("powerpc/powernv: remove unused NPU DMA code") and commit 25b2995a35b6 ("mm: remove MEMORY_DEVICE_PUBLIC support") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-3-aneesh.kumar@linux.ibm.com
2021-02-11powerpc/mm: Add PG_dcache_clean to indicate dcache clean stateAneesh Kumar K.V3-11/+11
This just add a better name for PG_arch_1. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-2-aneesh.kumar@linux.ibm.com
2021-02-11powerpc/mm: Enable compound page check for both THP and HugeTLBAneesh Kumar K.V2-24/+22
THP config results in compound pages. Make sure the kernel enables the PageCompound() check with CONFIG_HUGETLB_PAGE disabled and CONFIG_TRANSPARENT_HUGEPAGE enabled. This makes sure we correctly flush the icache with THP pages. flush_dcache_icache_page only matter for platforms that don't support COHERENT_ICACHE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210203045812.234439-1-aneesh.kumar@linux.ibm.com
2021-02-11powerpc/mm/64s: Fix no previous prototype warningMichael Ellerman3-2/+6
As reported by lkp: arch/powerpc/mm/book3s64/radix_tlb.c:646:6: warning: no previous prototype for function 'exit_lazy_flush_tlb' Fix it by moving the prototype into the existing header. Fixes: 032b7f08932c ("powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumask") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210210130804.3190952-2-mpe@ellerman.id.au
2021-02-11powerpc: remove interrupt handler functions from the noinstr sectionNicholas Piggin1-1/+0
The allyesconfig ppc64 kernel fails to link with relocations unable to fit after commit 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers"), which is due to the interrupt handler functions being put into the .noinstr.text section, which the linker script places on the opposite side of the main .text section from the interrupt entry asm code which calls the handlers. This results in a lot of linker stubs that overwhelm the 252-byte sized space we allow for them, or in the case of BE a .opd relocation link error for some reason. It's not required to put interrupt handlers in the .noinstr section, previously they used NOKPROBE_SYMBOL, so take them out and replace with a NOKPROBE_SYMBOL in the wrapper macro. Remove the explicit NOKPROBE_SYMBOL macros in the interrupt handler functions. This makes a number of interrupt handlers nokprobe that were not prior to the interrupt wrappers commit, but since that commit they were made nokprobe due to being in .noinstr.text, so this fix does not change that. The fixes tag is different to the commit that first exposes the problem because it is where the wrapper macros were introduced. Fixes: 8d41fc618ab8 ("powerpc: interrupt handler wrapper functions") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Slightly fix up comment wording] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210211063636.236420-1-npiggin@gmail.com
2021-02-08powerpc/32s: mfsrin()/mtsrin() become mfsr()/mtsr()Christophe Leroy2-2/+2
Function names should tell what the function does, not how. mfsrin() and mtsrin() are read/writing segment registers. They are called that way because they are using mfsrin and mtsrin instructions, but it doesn't matter for the caller. In preparation of following patch, change their name to mfsr() and mtsr() in order to make it obvious they manipulate segment registers without messing up with how they do it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f92d99f4349391b77766745900231aa880a0efb5.1612612022.git.christophe.leroy@csgroup.eu
2021-02-08powerpc/64s/radix: serialize_against_pte_lookup IPIs trim mm_cpumaskNicholas Piggin2-10/+23
serialize_against_pte_lookup() performs IPIs to all CPUs in mm_cpumask. Take this opportunity to try trim the CPU out of mm_cpumask. This can reduce the cost of future serialize_against_pte_lookup() and/or the cost of future TLB flushes. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-7-npiggin@gmail.com
2021-02-08powerpc/64s/radix: occasionally attempt to trim mm_cpumaskNicholas Piggin1-4/+56
A single-threaded process that is flushing its own address space is so far the only case where the mm_cpumask is attempted to be trimmed. This patch expands that to flush in other situations, multi-threaded processes and external sources. For now it's a relatively simple occasional trim attempt. The main aim is to add the mechanism, tweaking and tuning can come with more data. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-6-npiggin@gmail.com
2021-02-08powerpc/64s/radix: Allow mm_cpumask trimming from external sourcesNicholas Piggin1-10/+6
mm_cpumask trimming is currently restricted to be issued by the current thread of a single-threaded mm. This patch relaxes that and allows the mask to be trimmed from any context. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-5-npiggin@gmail.com
2021-02-08powerpc/64s/radix: Check for no TLB flush requiredNicholas Piggin1-13/+25
If there are no CPUs in mm_cpumask, no TLB flush is required at all. This patch adds a check for this case. Currently it's not tested for, in fact mm_is_thread_local() returns false if the current CPU is not in mm_cpumask, so it's treated as a global flush. This can come up in some cases like exec failure before the new mm has ever been switched to. This patch reduces TLBIE instructions required to build a kernel from about 120,000 to 45,000. Another situation it could help is page reclaim, KSM, THP, etc., (i.e., asynch operations external to the process) where the process is sleeping and has all TLBs flushed out of all CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-4-npiggin@gmail.com
2021-02-08powerpc/64s/radix: refactor TLB flush type selectionNicholas Piggin1-82/+94
The logic to decide what kind of TLB flush is required (local, global, or IPI) is spread multiple times over the several kinds of TLB flushes. Move it all into a single function which may issue IPIs if necessary, and also returns a flush type that is to be used. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-3-npiggin@gmail.com
2021-02-08powerpc/64s/radix: add warning and comments in mm_cpumask trimNicholas Piggin1-6/+21
Add a comment explaining part of the logic for mm_cpumask trimming, and add a (hopefully graceful) check and warning in case something gets it wrong. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201217134731.488135-2-npiggin@gmail.com
2021-02-08powerpc/64: context tracking move to interrupt wrappersNicholas Piggin2-11/+1
This moves exception_enter/exit calls to wrapper functions for synchronous interrupts. More interrupt handlers are covered by this than previously. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-33-npiggin@gmail.com
2021-02-08powerpc/64s/hash: improve context tracking of hash faultsNicholas Piggin2-14/+32
This moves the 64s/hash context tracking from hash_page_mm() to __do_hash_fault(), so it's no longer called by OCXL / SPU accelerators, which was certainly the wrong thing to be doing, because those callers are not low level interrupt handlers, so should have entered a kernel context tracking already. Then remain in kernel context for the duration of the fault, rather than enter/exit for the hash fault then enter/exit for the page fault, which is pointless. Even still, calling exception_enter/exit in __do_hash_fault seems questionable because that's touching per-cpu variables, tracing, etc., which might have been interrupted by this hash fault or themselves cause hash faults. But maybe I miss something because hash_page_mm very deliberately calls trace_hash_fault too, for example. So for now go with it, it's no worse than before, in this regard. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-32-npiggin@gmail.com
2021-02-08powerpc: add interrupt_cond_local_irq_enable helperNicholas Piggin1-3/+1
Simple helper for synchronous interrupt handlers (i.e., process-context) to enable interrupts if it was taken in an interrupts-enabled context. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-30-npiggin@gmail.com
2021-02-08powerpc: convert interrupt handlers to use wrappersNicholas Piggin3-8/+16
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com
2021-02-08powerpc/64s: slb comment updateNicholas Piggin1-13/+15
This makes a small improvement to the description of the SLB interrupt environment. Move the memory access restrictions into one paragraph, and the interrupt restrictions into the next rather than mix them. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-18-npiggin@gmail.com
2021-02-08powerpc/mm: Remove stale do_page_fault comment referring to SLB faultsNicholas Piggin1-7/+5
SLB faults no longer call do_page_fault, this was removed somewhere between 2.6.0 and 2.6.12. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-17-npiggin@gmail.com
2021-02-08powerpc/64s: split do_hash_faultNicholas Piggin1-23/+33
This is required for subsequent interrupt wrapper implementation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-16-npiggin@gmail.com
2021-02-08powerpc/64s: move bad_page_fault handling to CNicholas Piggin1-0/+4
This simplifies code, and it is also useful when introducing interrupt handler wrappers when introducing wrapper functionality that doesn't cope with asm entry code calling into more than one handler function. 32-bit and 64e still have some such cases, which limits some ways they can use interrupt wrappers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-15-npiggin@gmail.com
2021-02-08powerpc: rearrange do_page_fault error case to be inside exception_enterNicholas Piggin1-9/+14
This keeps the context tracking over the entire interrupt handler which helps later with moving context tracking into interrupt wrappers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-14-npiggin@gmail.com
2021-02-08powerpc/64s: add do_bad_page_fault_segv handlerNicholas Piggin1-0/+7
This function acts like an interrupt handler so it needs to follow the standard interrupt handler function signature which will be introduced in a future change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-13-npiggin@gmail.com
2021-02-08powerpc: bad_page_fault get registers from regsNicholas Piggin3-6/+6
Similar to the previous patch this makes interrupt handler function types more regular so they can be wrapped with the next patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-12-npiggin@gmail.com
2021-02-08powerpc: remove arguments from fault handler functionsNicholas Piggin3-10/+14
Make mm fault handlers all just take the pt_regs * argument and load DAR/DSISR from that. Make those that return a value return long. This is done to make the function signatures match other handlers, which will help with a future patch to add wrappers. Explicit arguments could be added for performance but that would require more wrapper macro variants. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-7-npiggin@gmail.com
2021-02-08powerpc/64s: move the hash fault handling logic to CNicholas Piggin1-28/+49
The fault handling still has some complex logic particularly around hash table handling, in asm. Implement most of this in C. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-6-npiggin@gmail.com
2021-02-06powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mmAneesh Kumar K.V1-0/+1
This fix the bad fault reported by KUAP when io_wqe_worker access userspace. Bug: Read fault blocked by KUAP! WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0 NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0 LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 .......... Call Trace: [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable) [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120 [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0 .......... NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0 LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0 interrupt: 300 [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280 [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780 [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120 [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs] [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460 [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200 [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250 [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800 [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0 [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0 [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c The kernel consider thread AMR value for kernel thread to be AMR_KUAP_BLOCKED. Hence access to userspace is denied. This of course not correct and we should allow userspace access after kthread_use_mm(). To be precise, kthread_use_mm() should inherit the AMR value of the operating address space. But, the AMR value is thread-specific and we inherit the address space and not thread access restrictions. Because of this ignore AMR value when accessing userspace via kernel thread. current_thread_amr/iamr() are updated, because we use them in the below stack. .... [ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3 .... NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90 LR [c0000000004b9278] gup_pte_range+0x188/0x420 --- interrupt: 700 [c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable) [c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20 [c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410 [c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0 [c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700 [c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0 [c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740 [c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0 [c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80 [c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs] [c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs] [c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0 [c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50 [c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0 [c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0 [c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0 [c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0 Fixes: 48a8ab4eeb82 ("powerpc/book3s64/pkeys: Don't update SPRN_AMR when in kernel mode.") Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210206025634.521979-1-aneesh.kumar@linux.ibm.com
2021-01-31powerpc/32s: Only build hash code when CONFIG_PPC_BOOK3S_604 is selectedChristophe Leroy1-1/+3
It is now possible to only build book3s/32 kernel for CPUs without hash table. Opt out hash related code when CONFIG_PPC_BOOK3S_604 is not selected. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62df436454ef06e104cc334a0859a2878d7888d5.1608274548.git.christophe.leroy@csgroup.eu
2021-01-31powerpc/mm/book3s64/iommu: fix some RCU-list locksQian Cai1-2/+8
It is safe to traverse mm->context.iommu_group_mem_list with either mem_list_mutex or the RCU read lock held. Silence a few RCU-list false positive warnings and fix a few missing RCU read locks. arch/powerpc/mm/book3s64/iommu_api.c:330 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by qemu-kvm/4305: #0: c000000bc3fe4d68 (&container->lock){+.+.}-{3:3}, at: tce_iommu_ioctl.part.9+0xc7c/0x1870 [vfio_iommu_spapr_tce] #1: c000000001501910 (mem_list_mutex){+.+.}-{3:3}, at: mm_iommu_get+0x50/0x190 ==== arch/powerpc/mm/book3s64/iommu_api.c:132 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by qemu-kvm/4305: #0: c000000bc3fe4d68 (&container->lock){+.+.}-{3:3}, at: tce_iommu_ioctl.part.9+0xc7c/0x1870 [vfio_iommu_spapr_tce] #1: c000000001501910 (mem_list_mutex){+.+.}-{3:3}, at: mm_iommu_do_alloc+0x120/0x5f0 ==== arch/powerpc/mm/book3s64/iommu_api.c:292 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by qemu-kvm/4312: #0: c000000ecafe23c8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm] #1: c000000045e6c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm] ==== arch/powerpc/mm/book3s64/iommu_api.c:424 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by qemu-kvm/4312: #0: c000000ecafe23c8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm] #1: c000000045e6c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm] Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200510051559.1959-1-cai@lca.pw
2021-01-30powerpc/mm/hugetlb: Make pseries_alloc_bootmem_huge_page() staticCédric Le Goater1-1/+1
pseries_alloc_bootmem_huge_page() is only used locally in alloc_bootmem_huge_page() and does not need to be external. It fixes this W=1 compile error : ../arch/powerpc/mm/hugetlbpage.c:220:12: error: no previous prototype for ‘pseries_alloc_bootmem_huge_page’ [-Werror=missing-prototypes] 220 | int __init pseries_alloc_bootmem_huge_page(struct hstate *hstate) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-16-clg@kaod.org
2021-01-30powerpc/mm: Move hpte_insert_repeating() prototypeCédric Le Goater1-4/+0
It fixes this W=1 compile error : ../arch/powerpc/mm/book3s64/hash_utils.c:1867:6: error: no previous prototype for ‘hpte_insert_repeating’ [-Werror=missing-prototypes] 1867 | long hpte_insert_repeating(unsigned long hash, unsigned long vpn, | ^~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-14-clg@kaod.org
2021-01-30powerpc/mm: Include __find_linux_pte() prototypeCédric Le Goater1-0/+1
It fixes this W=1 compile error : ../arch/powerpc/mm/pgtable.c:337:8: error: no previous prototype for ‘__find_linux_pte’ [-Werror=missing-prototypes] 337 | pte_t *__find_linux_pte(pgd_t *pgdir, unsigned long ea, | ^~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-2-clg@kaod.org
2020-12-18Merge tag 'powerpc-5.11-1' of ↵Linus Torvalds27-538/+422
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Switch to the generic C VDSO, as well as some cleanups of our VDSO setup/handling code. - Support for KUAP (Kernel User Access Prevention) on systems using the hashed page table MMU, using memory protection keys. - Better handling of PowerVM SMT8 systems where all threads of a core do not share an L2, allowing the scheduler to make better scheduling decisions. - Further improvements to our machine check handling. - Show registers when unwinding interrupt frames during stack traces. - Improvements to our pseries (PowerVM) partition migration code. - Several series from Christophe refactoring and cleaning up various parts of the 32-bit code. - Other smaller features, fixes & cleanups. Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu. * tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits) powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug powerpc: Add config fragment for disabling -Werror powerpc/configs: Add ppc64le_allnoconfig target powerpc/powernv: Rate limit opal-elog read failure message powerpc/pseries/memhotplug: Quieten some DLPAR operations powerpc/ps3: use dma_mapping_error() powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10 powerpc/perf: Fix Threshold Event Counter Multiplier width for P10 powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range() KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp KVM: PPC: fix comparison to bool warning KVM: PPC: Book3S: Assign boolean values to a bool variable powerpc: Inline setup_kup() powerpc/64s: Mark the kuap/kuep functions non __init KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering powerpc/xive: Improve error reporting of OPAL calls powerpc/xive: Simplify xive_do_source_eoi() powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG ...
2020-12-15powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range()Christophe Leroy1-4/+4
Commit 7bfe54b5f165 ("powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions") inadvertely removed the mask applied to start parameter in those two functions, leading to the following crash on power9. LTP: starting hugemmap05_1 (hugemmap05 -m) ------------[ cut here ]------------ kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:387! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV ... CPU: 99 PID: 308 Comm: ksoftirqd/99 Tainted: G O 5.10.0-rc7-next-20201211 #1 NIP: c00000000005dbec LR: c0000000003352f4 CTR: 0000000000000000 REGS: c00020000bb6f830 TRAP: 0700 Tainted: G O (5.10.0-rc7-next-20201211) MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002284 XER: 20040000 GPR00: c0000000003352f4 c00020000bb6fad0 c000000007f70b00 c0002000385b3ff0 GPR04: 0000000000000000 0000000000000003 c00020000bb6f8b4 0000000000000001 GPR08: 0000000000000001 0000000000000009 0000000000000008 0000000000000002 GPR12: 0000000024002488 c000201fff649c00 c000000007f2a20c 0000000000000000 GPR16: 0000000000000007 0000000000000000 c000000000194d10 c000000000194d10 GPR24: 0000000000000014 0000000000000015 c000201cc6e72398 c000000007fac4b4 GPR28: c000000007f2bf80 c000000007fac2f8 0000000000000008 c000200033870000 NIP [c00000000005dbec] __tlb_remove_table+0x1dc/0x1e0 pgtable_free at arch/powerpc/mm/book3s64/pgtable.c:387 (inlined by) __tlb_remove_table at arch/powerpc/mm/book3s64/pgtable.c:405 LR [c0000000003352f4] tlb_remove_table_rcu+0x54/0xa0 Call Trace: __tlb_remove_table+0x13c/0x1e0 (unreliable) tlb_remove_table_rcu+0x54/0xa0 __tlb_remove_table_free at mm/mmu_gather.c:101 (inlined by) tlb_remove_table_rcu at mm/mmu_gather.c:156 rcu_core+0x35c/0xbb0 rcu_do_batch at kernel/rcu/tree.c:2502 (inlined by) rcu_core at kernel/rcu/tree.c:2737 __do_softirq+0x480/0x704 run_ksoftirqd+0x74/0xd0 run_ksoftirqd at kernel/softirq.c:651 (inlined by) run_ksoftirqd at kernel/softirq.c:642 smpboot_thread_fn+0x278/0x320 kthread+0x1c4/0x1d0 ret_from_kernel_thread+0x5c/0x80 Properly apply the masks before calling pmd_free_tlb() and pud_free_tlb() respectively. Fixes: 7bfe54b5f165 ("powerpc/mm: Refactor the floor/ceiling check in hugetlb range freeing functions") Reported-by: Qian Cai <qcai@redhat.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/56feccd7b6fcd98e353361a233fa7bb8e67c3164.1607780469.git.christophe.leroy@csgroup.eu
2020-12-15Merge tag 'core-mm-2020-12-14' of ↵Linus Torvalds3-75/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kmap updates from Thomas Gleixner: "The new preemtible kmap_local() implementation: - Consolidate all kmap_atomic() internals into a generic implementation which builds the base for the kmap_local() API and make the kmap_atomic() interface wrappers which handle the disabling/enabling of preemption and pagefaults. - Switch the storage from per-CPU to per task and provide scheduler support for clearing mapping when scheduling out and restoring them when scheduling back in. - Merge the migrate_disable/enable() code, which is also part of the scheduler pull request. This was required to make the kmap_local() interface available which does not disable preemption when a mapping is established. It has to disable migration instead to guarantee that the virtual address of the mapped slot is the same across preemption. - Provide better debug facilities: guard pages and enforced utilization of the mapping mechanics on 64bit systems when the architecture allows it. - Provide the new kmap_local() API which can now be used to cleanup the kmap_atomic() usage sites all over the place. Most of the usage sites do not require the implicit disabling of preemption and pagefaults so the penalty on 64bit and 32bit non-highmem systems is removed and quite some of the code can be simplified. A wholesale conversion is not possible because some usage depends on the implicit side effects and some need to be cleaned up because they work around these side effects. The migrate disable side effect is only effective on highmem systems and when enforced debugging is enabled. On 64bit and 32bit non-highmem systems the overhead is completely avoided" * tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) ARM: highmem: Fix cache_is_vivt() reference x86/crashdump/32: Simplify copy_oldmem_page() io-mapping: Provide iomap_local variant mm/highmem: Provide kmap_local* sched: highmem: Store local kmaps in task struct x86: Support kmap_local() forced debugging mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL microblaze/mm/highmem: Add dropped #ifdef back xtensa/mm/highmem: Make generic kmap_atomic() work correctly mm/highmem: Take kmap_high_get() properly into account highmem: High implementation details and document API Documentation/io-mapping: Remove outdated blurb io-mapping: Cleanup atomic iomap mm/highmem: Remove the old kmap_atomic cruft highmem: Get rid of kmap_types.h xtensa/mm/highmem: Switch to generic kmap atomic sparc/mm/highmem: Switch to generic kmap atomic powerpc/mm/highmem: Switch to generic kmap atomic nds32/mm/highmem: Switch to generic kmap atomic ...