summaryrefslogtreecommitdiff
path: root/arch/powerpc/include
AgeCommit message (Collapse)AuthorFilesLines
2021-08-13powerpc/pseries: Add a helper for form1 cpu distanceAneesh Kumar K.V1-2/+2
This helper is only used with the dispatch trace log collection. A later patch will add Form2 affinity support and this change helps in keeping that simpler. Also add a comment explaining we don't expect the code to be called with FORM0 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132223.225214-5-aneesh.kumar@linux.ibm.com
2021-08-13powerpc/pseries: Consolidate different NUMA distance update code pathsAneesh Kumar K.V1-0/+2
The associativity details of the newly added resourced are collected from the hypervisor via "ibm,configure-connector" rtas call. Update the numa distance details of the newly added numa node after the above call. Instead of updating NUMA distance every time we lookup a node id from the associativity property, add helpers that can be used during boot which does this only once. Also remove the distance update from node id lookup helpers. Currently, we duplicate parsing code for ibm,associativity and ibm,associativity-lookup-arrays in the kernel. The associativity array provided by these device tree properties are very similar and hence can use a helper to parse the node id and numa distance details. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132223.225214-4-aneesh.kumar@linux.ibm.com
2021-08-13powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITYAneesh Kumar K.V2-3/+3
Also make related code cleanup that will allow adding FORM2_AFFINITY in later patches. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132223.225214-3-aneesh.kumar@linux.ibm.com
2021-08-13powerpc: rename powerpc_debugfs_root to arch_debugfs_dirAneesh Kumar K.V1-13/+0
No functional change in this patch. arch_debugfs_dir is the generic kernel name declared in linux/debugfs.h for arch-specific debugfs directory. Architectures like x86/s390 already use the name. Rename powerpc specific powerpc_debugfs_root to arch_debugfs_dir. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com
2021-08-12powerpc/interrupt: Fix OOPS by not calling do_IRQ() from timer_interrupt()Christophe Leroy2-1/+4
An interrupt handler shall not be called from another interrupt handler otherwise this leads to problems like the following: Kernel attempted to write user page (afd4fa84) - exploit attempt? (uid: 1000) ------------[ cut here ]------------ Bug: Write fault blocked by KUAP! WARNING: CPU: 0 PID: 1617 at arch/powerpc/mm/fault.c:230 do_page_fault+0x484/0x720 Modules linked in: CPU: 0 PID: 1617 Comm: sshd Tainted: G W 5.13.0-pmac-00010-g8393422eb77 #7 NIP: c001b77c LR: c001b77c CTR: 00000000 REGS: cb9e5bc0 TRAP: 0700 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 00021032 <ME,IR,DR,RI> CR: 24942424 XER: 00000000 GPR00: c001b77c cb9e5c80 c1582c00 00000021 3ffffbff 085b0000 00000027 c8eb644c GPR08: 00000023 00000000 00000000 00000000 24942424 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90 GPR24: 00000040 afd4fa96 00000040 02000000 c1fda6c0 afd4fa84 00000300 cb9e5cc0 NIP [c001b77c] do_page_fault+0x484/0x720 LR [c001b77c] do_page_fault+0x484/0x720 Call Trace: [cb9e5c80] [c001b77c] do_page_fault+0x484/0x720 (unreliable) [cb9e5cb0] [c000424c] DataAccess_virt+0xd4/0xe4 --- interrupt: 300 at __copy_tofrom_user+0x110/0x20c NIP: c001f9b4 LR: c03250a0 CTR: 00000004 REGS: cb9e5cc0 TRAP: 0300 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 48028468 XER: 20000000 DAR: afd4fa84 DSISR: 0a000000 GPR00: 20726f6f cb9e5d80 c1582c00 00000004 cb9e5e3a 00000016 afd4fa80 00000000 GPR08: 3835202d 72777872 2d78722d 00000004 28028464 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 c07640c4 cb9e5e98 cb9e5e90 GPR24: 00000040 afd4fa96 00000040 cb9e5e0c 00000daa a0000000 cb9e5e98 afd4fa56 NIP [c001f9b4] __copy_tofrom_user+0x110/0x20c LR [c03250a0] _copy_to_iter+0x144/0x990 --- interrupt: 300 [cb9e5d80] [c03e89c0] n_tty_read+0xa4/0x598 (unreliable) [cb9e5df0] [c03e2a0c] tty_read+0xdc/0x2b4 [cb9e5e80] [c0156bf8] vfs_read+0x274/0x340 [cb9e5f00] [c01571ac] ksys_read+0x70/0x118 [cb9e5f30] [c0016048] ret_from_syscall+0x0/0x28 --- interrupt: c00 at 0xa7855c88 NIP: a7855c88 LR: a7855c5c CTR: 00000000 REGS: cb9e5f40 TRAP: 0c00 Tainted: G W (5.13.0-pmac-00010-g8393422eb77) MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 2402446c XER: 00000000 GPR00: 00000003 afd4ec70 a72137d0 0000000b afd4ecac 00004000 0065a990 00000800 GPR08: 00000000 a7947930 00000000 00000004 c15831b0 0063f8c8 00000000 000186a0 GPR16: afd52dd4 afd52dd0 afd52dcc afd52dc8 0065a990 0065a9e0 00000001 0065fac0 GPR24: 00000000 00000089 00664050 00000000 00668e30 a720c8dc a7943ff4 0065f9b0 NIP [a7855c88] 0xa7855c88 LR [a7855c5c] 0xa7855c5c --- interrupt: c00 Instruction dump: 3884aa88 38630178 48076861 807f0080 48042e45 2f830000 419e0148 3c80c079 3c60c076 38841be4 386301c0 4801f705 <0fe00000> 3860000b 4bfffe30 3c80c06b ---[ end trace fd69b91a8046c2e5 ]--- Here the problem is that by re-enterring an exception handler, kuap_save_and_lock() is called a second time with this time KUAP access locked, leading to regs->kuap being overwritten hence KUAP not being unlocked at exception exit as expected. Do not call do_IRQ() from timer_interrupt() directly. Instead, redefine do_IRQ() as a standard function named __do_IRQ(), and call it from both do_IRQ() and time_interrupt() handlers. Fixes: 3a96570ffceb ("powerpc: convert interrupt handlers to use wrappers") Cc: stable@vger.kernel.org # v5.12+ Reported-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c17d234f4927d39a1d7100864a8e1145323d33a0.1628611927.git.christophe.leroy@csgroup.eu
2021-08-10KVM: PPC: Book3S HV: XIVE: Add support for automatic save-restoreCédric Le Goater2-0/+4
On P10, the feature doing an automatic "save & restore" of a VCPU interrupt context is set by default in OPAL. When a VP context is pulled out, the state of the interrupt registers are saved by the XIVE interrupt controller under the internal NVP structure representing the VP. This saves a costly store/load in guest entries and exits. If OPAL advertises the "save & restore" feature in the device tree, it should also have set the 'H' bit in the CAM line. Check that when vCPUs are connected to their ICP in KVM before going any further. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210720134209.256133-3-clg@kaod.org
2021-08-10powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi()Cédric Le Goater1-1/+1
pnv_opal_pci_msi_eoi() is called from KVM to EOI passthrough interrupts when in real mode. Adding MSI domain broke the hack using the 'ioda.irq_chip' field to deduce the owning PHB. Fix that by using the IRQ chip data in the MSI domain. The 'ioda.irq_chip' field is now unused and could be removed from the pnv_phb struct. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-30-clg@kaod.org
2021-08-10powerpc/xics: Rename the map handler in a check handlerCédric Le Goater1-1/+2
This moves the IRQ initialization done under the different ICS backends in the common part of XICS. The 'map' handler becomes a simple 'check' on the HW IRQ at the FW level. As we don't need an ICS anymore in xics_migrate_irqs_away(), the XICS domain does not set a chip data for the IRQ. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-18-clg@kaod.org
2021-08-10KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routinesCédric Le Goater1-2/+2
The routine kvmppc_set_passthru_irq() calls kvmppc_xive_set_mapped() and kvmppc_xive_clr_mapped() with an IRQ descriptor. Use directly the host IRQ number to remove a useless conversion. Add some debug. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-15-clg@kaod.org
2021-08-10powerpc/pseries/pci: Add a msi_free() handler to clear XIVE dataCédric Le Goater1-0/+1
The MSI domain clears the IRQ with msi_domain_free(), which calls irq_domain_free_irqs_top(), which clears the handler data. This is a problem for the XIVE controller since we need to unmap MMIO pages and free a specific XIVE structure. The 'msi_free()' handler is called before irq_domain_free_irqs_top() when the handler data is still available. Use that to clear the XIVE controller data. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-10-clg@kaod.org
2021-08-10powerpc/pseries/pci: Add MSI domainsCédric Le Goater1-0/+5
Two IRQ domains are added on top of default machine IRQ domain. First, the top level "pSeries-PCI-MSI" domain deals with the MSI specificities. In this domain, the HW IRQ numbers are generated by the PCI MSI layer, they compose a unique ID for an MSI source with the PCI device identifier and the MSI vector number. These numbers can be quite large on a pSeries machine running under the IBM Hypervisor and /sys/kernel/irq/ and /proc/interrupts will require small fixes to show them correctly. Second domain is the in-the-middle "pSeries-MSI" domain which acts as a proxy between the PCI MSI subsystem and the machine IRQ subsystem. It usually allocate the MSI vector numbers but, on pSeries machines, this is done by the RTAS FW and RTAS returns IRQ numbers in the IRQ number space of the machine. This is why the in-the-middle "pSeries-MSI" domain has the same HW IRQ numbers as its parent domain. Only the XIVE (P9/P10) parent domain is supported for now. We still need to add support for IRQ domain hierarchy under XICS. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-6-clg@kaod.org
2021-08-10powerpc: Always inline radix_enabled() to fix build failureJordan Niethe1-1/+1
This is the same as commit acdad8fb4a15 ("powerpc: Force inlining of mmu_has_feature to fix build failure") but for radix_enabled(). The config in the linked bugzilla causes the following build failure: LD .tmp_vmlinux.kallsyms1 powerpc64-linux-ld: arch/powerpc/mm/pgtable.o: in function `.__ptep_set_access_flags': pgtable.c:(.text+0x17c): undefined reference to `.radix__ptep_set_access_flags' powerpc64-linux-ld: arch/powerpc/mm/pageattr.o: in function `.change_page_attr': pageattr.c:(.text+0xc0): undefined reference to `.radix__flush_tlb_kernel_range' etc. This is due to radix_enabled() not being inlined. See extract from building with -Winline: In file included from arch/powerpc/include/asm/lppaca.h:46, from arch/powerpc/include/asm/paca.h:17, from arch/powerpc/include/asm/current.h:13, from include/linux/thread_info.h:23, from include/asm-generic/preempt.h:5, from ./arch/powerpc/include/generated/asm/preempt.h:1, from include/linux/preempt.h:78, from include/linux/spinlock.h:51, from include/linux/mmzone.h:8, from include/linux/gfp.h:6, from arch/powerpc/mm/pgtable.c:21: arch/powerpc/include/asm/book3s/64/pgtable.h: In function '__ptep_set_access_flags': arch/powerpc/include/asm/mmu.h:327:20: error: inlining failed in call to 'radix_enabled': call is unlikely and code size would grow [-Werror=inline] The code relies on constant folding of MMU_FTRS_POSSIBLE at buildtime and elimination of non possible parts of code at compile time. For this to work radix_enabled() must be inlined so make it __always_inline. Reported-by: Erhard F. <erhard_f@mailbox.org> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> [mpe: Trimmed error messages in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213803 Link: https://lore.kernel.org/r/20210804013724.514468-1-jniethe5@gmail.com
2021-08-10pseries/drmem: update LMBs after LPMLaurent Dufour1-0/+1
After a LPM, the device tree node ibm,dynamic-reconfiguration-memory may be updated by the hypervisor in the case the NUMA topology of the LPAR's memory is updated. This is handled by the kernel, but the memory's node is not updated because there is no way to move a memory block between nodes from the Linux kernel point of view. If later a memory block is added or removed, drmem_update_dt() is called and it is overwriting the DT node ibm,dynamic-reconfiguration-memory to match the added or removed LMB. But the LMB's associativity node has not been updated after the DT node update and thus the node is overwritten by the Linux's topology instead of the hypervisor one. Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is updated to force an update of the LMB's associativity. However, ignore the call to that hook when the update has been triggered by drmem_update_dt(). Because, in that case, the LMB tree has been used to set the DT property and thus it doesn't need to be updated back. Since drmem_update_dt() is called under the protection of the device_hotplug_lock and the hook is called in the same context, use a simple boolean variable to detect that call. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210517090606.56930-1-ldufour@linux.ibm.com
2021-08-10powerpc/non-smp: Unconditionaly call smp_mb() on switch_mmChristophe Leroy1-1/+2
Commit 3ccfebedd8cf ("powerpc, membarrier: Skip memory barrier in switch_mm()") added some logic to skip the smp_mb() in switch_mm_irqs_off() before the call to switch_mmu_context(). However, on non SMP smp_mb() is just a compiler barrier and doing it unconditionaly is simpler than the logic used to check whether the barrier is needed or not. After the patch: 00000000 <switch_mm_irqs_off>: ... c: 7c 04 18 40 cmplw r4,r3 10: 81 24 00 24 lwz r9,36(r4) 14: 91 25 04 c8 stw r9,1224(r5) 18: 4d 82 00 20 beqlr 1c: 48 00 00 00 b 1c <switch_mm_irqs_off+0x1c> 1c: R_PPC_REL24 switch_mmu_context Before the patch: 00000000 <switch_mm_irqs_off>: ... c: 7c 04 18 40 cmplw r4,r3 10: 81 24 00 24 lwz r9,36(r4) 14: 91 25 04 c8 stw r9,1224(r5) 18: 4d 82 00 20 beqlr 1c: 81 24 00 28 lwz r9,40(r4) 20: 71 29 00 0a andi. r9,r9,10 24: 40 82 00 34 bne 58 <switch_mm_irqs_off+0x58> 28: 48 00 00 00 b 28 <switch_mm_irqs_off+0x28> 28: R_PPC_REL24 switch_mmu_context ... 58: 2c 03 00 00 cmpwi r3,0 5c: 41 82 ff cc beq 28 <switch_mm_irqs_off+0x28> 60: 48 00 00 00 b 60 <switch_mm_irqs_off+0x60> 60: R_PPC_REL24 switch_mmu_context Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/e9d501da0c59f60ca767b1b3ea4603fce6d02b9e.1625486440.git.christophe.leroy@csgroup.eu
2021-08-10powerpc: Remove in_kernel_text()Christophe Leroy1-8/+0
Last user of in_kernel_text() stopped using in with commit 549e8152de80 ("powerpc: Make the 64-bit kernel as a position-independent executable"). Generic function is_kernel_text() does the same. So remote it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2a3a5b6f8cc0ef4e854d7b764f66aa8d2ee270d2.1624813698.git.christophe.leroy@csgroup.eu
2021-08-07powerpc/32: Fix critical and debug interrupts on BOOKEChristophe Leroy1-0/+16
32 bits BOOKE have special interrupts for debug and other critical events. When handling those interrupts, dedicated registers are saved in the stack frame in addition to the standard registers, leading to a shift of the pt_regs struct. Since commit db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry"), the pt_regs struct is expected to be at the same place all the time. Instead of handling a special struct in addition to pt_regs, just add those special registers to struct pt_regs. Fixes: db297c3b07af ("powerpc/32: Don't save thread.regs on interrupt entry") Cc: stable@vger.kernel.org Reported-by: Radu Rendec <radu.rendec@gmail.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/028d5483b4851b01ea4334d0751e7f260419092b.1625637264.git.christophe.leroy@csgroup.eu
2021-08-04powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblingsParth Shah1-0/+3
On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus in thread-group share both L2 and L3 caches. Hence, use cache_property = 2 itself to find both the L2 and L3 cache siblings. Hence, create a new thread_group_l3_cache_map to keep list of L3 siblings, but fill the mask using same property "2" array. Signed-off-by: Parth Shah <parth@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210728175607.591679-4-parth@linux.ibm.com
2021-08-04powerpc/cacheinfo: Lookup cache by dt node and thread-group idGautham R. Shenoy1-0/+3
Currently the cacheinfo code on powerpc indexes the "cache" objects (modelling the L1/L2/L3 caches) where the key is device-tree node corresponding to that cache. On some of the POWER server platforms thread-groups within the core share different sets of caches (Eg: On SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and threads 1,3,5,7 of the same core share another L1 cache). On such platforms, there is a single device-tree node corresponding to that cache and the cache-configuration within the threads of the core is indicated via "ibm,thread-groups" device-tree property. Since the current code is not aware of the "ibm,thread-groups" property, on the aforementoined systems, cacheinfo code still treats all the threads in the core to be sharing the cache because of the single device-tree node (In the earlier example, the cacheinfo code would says CPUs 0-7 share L1 cache). In this patch, we make the powerpc cacheinfo code aware of the "ibm,thread-groups" property. We indexe the "cache" objects by the key-pair (device-tree node, thread-group id). For any CPUX, for a given level of cache, the thread-group id is defined to be the first CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels of cache which are not represented in "ibm,thread-groups" property, the thread-group id is -1. [parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map" and "thread_group_l2_cache_map" to get rid of the compile error.] Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Parth Shah <parth@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210728175607.591679-2-parth@linux.ibm.com
2021-07-23compat: make linux/compat.h available everywhereArnd Bergmann1-11/+0
Parts of linux/compat.h are under an #ifdef, but we end up using more of those over time, moving things around bit by bit. To get it over with once and for all, make all of this file uncondititonal now so it can be accessed everywhere. There are only a few types left that are in asm/compat.h but not yet in the asm-generic version, so add those in the process. This requires providing a few more types in asm-generic/compat.h that were not already there. The only tricky one is compat_sigset_t, which needs a little help on 32-bit architectures and for x86. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08powerpc/book3s64/mm: update flush_tlb_range to flush page walk cacheAneesh Kumar K.V1-0/+2
flush_tlb_range is special in that we don't specify the page size used for the translation. Hence when flushing TLB we flush the translation cache for all possible page sizes. The kernel also uses the same interface when moving page tables around. Such a move requires us to flush the page walk cache. Instead of adding another interface to force page walk cache flush, update flush_tlb_range to flush page walk cache if the range flushed is more than the PMD range. A page table move will always involve an invalidate range more than PMD_SIZE. Running microbenchmark with mprotect and parallel memory access didn't show any observable performance impact. Link: https://lkml.kernel.org/r/20210616045735.374532-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08mm/mremap: allow arch runtime overrideAneesh Kumar K.V1-0/+6
Patch series "Speedup mremap on ppc64", v8. This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires the platform to support updating higher-level page tables without updating page table entries. This also needs to invalidate the Page Walk Cache on architecture supporting the same. This patch (of 3): Architectures like ppc64 support faster mremap only with radix translation. Hence allow a runtime check w.r.t support for fast mremap. Link: https://lkml.kernel.org/r/20210616045735.374532-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20210616045735.374532-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *Aneesh Kumar K.V2-2/+9
No functional change in this patch. [aneesh.kumar@linux.ibm.com: m68k build error reported by kernel robot] Link: https://lkml.kernel.org/r/87tulxnb2v.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20210615110859.320299-2-aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *Aneesh Kumar K.V2-2/+10
No functional change in this patch. [aneesh.kumar@linux.ibm.com: fix] Link: https://lkml.kernel.org/r/87wnqtnb60.fsf@linux.ibm.com [sfr@canb.auug.org.au: another fix] Link: https://lkml.kernel.org/r/20210619134410.89559-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20210615110859.320299-1-aneesh.kumar@linux.ibm.com Link: https://lore.kernel.org/linuxppc-dev/CAHk-=wi+J+iodze9FtjM3Zi4j4OeS+qqbKxME9QN4roxPEXH9Q@mail.gmail.com/ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-02Merge tag 'powerpc-5.14-1' of ↵Linus Torvalds40-335/+736
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei. * tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits) powerpc: Only build restart_table.c for 64s powerpc/64s: move ret_from_fork etc above __end_soft_masked powerpc/64s/interrupt: clean up interrupt return labels powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols powerpc/64: enable MSR[EE] in irq replay pt_regs powerpc/64s/interrupt: preserve regs->softe for NMI interrupts powerpc/64s: add a table of implicit soft-masked addresses powerpc/64e: remove implicit soft-masking and interrupt exit restart logic powerpc/64e: fix CONFIG_RELOCATABLE build warnings powerpc/64s: fix hash page fault interrupt handler powerpc/4xx: Fix setup_kuep() on SMP powerpc/32s: Fix setup_{kuap/kuep}() on SMP powerpc/interrupt: Use names in check_return_regs_valid() powerpc/interrupt: Also use exit_must_hard_disable() on PPC32 powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE powerpc/ptrace: Refactor regs_set_return_{msr/ip} powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip} powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi() powerpc/pseries/vas: Include irqdomain.h powerpc: mark local variables around longjmp as volatile ...
2021-07-02Merge tag 'asm-generic-unaligned-5.14' of ↵Linus Torvalds1-22/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm/unaligned.h unification from Arnd Bergmann: "Unify asm/unaligned.h around struct helper The get_unaligned()/put_unaligned() helpers are traditionally architecture specific, with the two main variants being the "access-ok.h" version that assumes unaligned pointer accesses always work on a particular architecture, and the "le-struct.h" version that casts the data to a byte aligned type before dereferencing, for architectures that cannot always do unaligned accesses in hardware. Based on the discussion linked below, it appears that the access-ok version is not realiable on any architecture, but the struct version probably has no downsides. This series changes the code to use the same implementation on all architectures, addressing the few exceptions separately" Link: https://lore.kernel.org/lkml/75d07691-1e4f-741f-9852-38c0b4f520bc@synopsys.com/ Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 Link: https://lore.kernel.org/lkml/20210507220813.365382-14-arnd@kernel.org/ Link: git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic.git unaligned-rework-v2 Link: https://lore.kernel.org/lkml/CAHk-=whGObOKruA_bU3aPGZfoDqZM1_9wBkwREp0H0FgR-90uQ@mail.gmail.com/ * tag 'asm-generic-unaligned-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: simplify asm/unaligned.h asm-generic: uaccess: 1-byte access is always aligned netpoll: avoid put_unaligned() on single character mwifiex: re-fix for unaligned accesses apparmor: use get_unaligned() only for multi-byte words partitions: msdos: fix one-byte get_unaligned() asm-generic: unaligned always use struct helpers asm-generic: unaligned: remove byteshift helpers powerpc: use linux/unaligned/le_struct.h on LE power7 m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS sh: remove unaligned access for sh4a openrisc: always use unaligned-struct header asm-generic: use asm-generic/unaligned.h for most architectures
2021-07-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds7-12/+51
Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...
2021-07-01mm/thp: define default pmd_pgtable()Anshuman Khandual2-5/+6
Currently most platforms define pmd_pgtable() as pmd_page() duplicating the same code all over. Instead just define a default value i.e pmd_page() for pmd_pgtable() and let platforms override when required via <asm/pgtable.h>. All the existing platform that override pmd_pgtable() have been moved into their respective <asm/pgtable.h> header in order to precede before the new generic definition. This makes it much cleaner with reduced code. Link: https://lkml.kernel.org/r/1623646133-20306-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01mm: define default value for FIRST_USER_ADDRESSAnshuman Khandual3-4/+0
Currently most platforms define FIRST_USER_ADDRESS as 0UL duplication the same code all over. Instead just define a generic default value (i.e 0UL) for FIRST_USER_ADDRESS and let the platforms override when required. This makes it much cleaner with reduced code. The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h> when the given platform overrides its value via <asm/pgtable.h>. Link: https://lkml.kernel.org/r/1620615725-24623-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Guo Ren <guoren@kernel.org> [csky] Acked-by: Stafford Horne <shorne@gmail.com> [openrisc] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [RISC-V] Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Brian Cain <bcain@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01powerpc/8xx: add support for huge pages on VMAP and VMALLOCChristophe Leroy1-0/+43
powerpc 8xx has 4 page sizes: - 4k - 16k - 512k - 8M At the time being, vmalloc and vmap only support huge pages which are leaf at PMD level. Here the PMD level is 4M, it doesn't correspond to any supported page size. For now, implement use of 16k and 512k pages which is done at PTE level. Support of 8M pages will be implemented later, it requires vmalloc to support hugepd tables. Link: https://lkml.kernel.org/r/8b972f1c03fb6bd59953035f0a3e4d26659de4f8.1620795204.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-01mm/hugetlb: change parameters of arch_make_huge_pte()Christophe Leroy1-3/+2
Patch series "Subject: [PATCH v2 0/5] Implement huge VMAP and VMALLOC on powerpc 8xx", v2. This series implements huge VMAP and VMALLOC on powerpc 8xx. Powerpc 8xx has 4 page sizes: - 4k - 16k - 512k - 8M At the time being, vmalloc and vmap only support huge pages which are leaf at PMD level. Here the PMD level is 4M, it doesn't correspond to any supported page size. For now, implement use of 16k and 512k pages which is done at PTE level. Support of 8M pages will be implemented later, it requires use of hugepd tables. To allow this, the architecture provides two functions: - arch_vmap_pte_range_map_size() which tells vmap_pte_range() what page size to use. A stub returning PAGE_SIZE is provided when the architecture doesn't provide this function. - arch_vmap_pte_supported_shift() which tells __vmalloc_node_range() what page shift to use for a given area size. A stub returning PAGE_SHIFT is provided when the architecture doesn't provide this function. This patch (of 5): At the time being, arch_make_huge_pte() has the following prototype: pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma, struct page *page, int writable); vma is used to get the pages shift or size. vma is also used on Sparc to get vm_flags. page is not used. writable is not used. In order to use this function without a vma, replace vma by shift and flags. Also remove the used parameters. Link: https://lkml.kernel.org/r/cover.1620795204.git.christophe.leroy@csgroup.eu Link: https://lkml.kernel.org/r/f4633ac6a7da2f22f31a04a89e0a7026bb78b15b.1620795204.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Uladzislau Rezki <uladzislau.rezki@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30powerpc/64: enable MSR[EE] in irq replay pt_regsNicholas Piggin1-0/+4
Similar to commit 2b48e96be2f9f ("powerpc/64: fix irq replay pt_regs->softe value"), enable MSR_EE in pt_regs->msr. This makes the regs look more normal. It also allows some extra debug checks to be added to interrupt handler entry. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-7-npiggin@gmail.com
2021-06-30powerpc/64s/interrupt: preserve regs->softe for NMI interruptsNicholas Piggin1-0/+3
If an NMI interrupt hits in an implicit soft-masked region, regs->softe is modified to reflect that. This may not be necessary for correctness at the moment, but it is less surprising and it's unhelpful when debugging or adding checks. Make sure this is changed back to how it was found before returning. Fixes: 4ec5feec1ad0 ("powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-6-npiggin@gmail.com
2021-06-30powerpc/64s: add a table of implicit soft-masked addressesNicholas Piggin2-1/+9
Commit 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") ends up catching too much code, including ret_from_fork, and parts of interrupt and syscall return that do not expect to be interrupts to be soft-masked. If an interrupt gets marked pending, and then the code proceeds out of the implicit soft-masked region it will fail to deal with the pending interrupt. Fix this by adding a new table of addresses which explicitly marks the regions of code that are soft masked. This table is only checked for interrupts that below __end_soft_masked, so most kernel interrupts will not have the overhead of the table search. Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-5-npiggin@gmail.com
2021-06-30powerpc/64e: remove implicit soft-masking and interrupt exit restart logicNicholas Piggin1-9/+24
The implicit soft-masking to speed up interrupt return was going to be used by 64e as well, but it has not been extensively tested on that platform and is not considered ready. It was intended to be disabled before merge. Disable it for now. Most of the restart code is common with 64s, so with more correctness and performance testing this could be re-enabled again by adding the extra soft-mask checks to interrupt handlers and flipping exit_must_hard_disable(). Fixes: 9d1988ca87dd ("powerpc/64: treat low kernel text as irqs soft-masked") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210630074621.2109197-4-npiggin@gmail.com
2021-06-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+2
Merge misc updates from Andrew Morton: "191 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab, slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap, mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits) mm,hwpoison: make get_hwpoison_page() call get_any_page() mm,hwpoison: send SIGBUS with error virutal address mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes mm/page_alloc: allow high-order pages to be stored on the per-cpu lists mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA docs: remove description of DISCONTIGMEM arch, mm: remove stale mentions of DISCONIGMEM mm: remove CONFIG_DISCONTIGMEM m68k: remove support for DISCONTIGMEM arc: remove support for DISCONTIGMEM arc: update comment about HIGHMEM implementation alpha: remove DISCONTIGMEM and NUMA mm/page_alloc: move free_the_page mm/page_alloc: fix counting of managed_pages mm/page_alloc: improve memmap_pages dbg msg mm: drop SECTION_SHIFT in code comments mm/page_alloc: introduce vm.percpu_pagelist_high_fraction mm/page_alloc: limit the number of pages on PCP lists when reclaim is active mm/page_alloc: scale the number of pages that are batch freed ...
2021-06-29Merge tag 'irq-core-2021-06-29' of ↵Linus Torvalds1-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core changes: - Cleanup and simplification of common code to invoke the low level interrupt flow handlers when this invocation requires irqdomain resolution. Add the necessary core infrastructure. - Provide a proper interface for modular PMU drivers to set the interrupt affinity. - Add a request flag which allows to exclude interrupts from spurious interrupt detection. Useful especially for IPI handlers which always return IRQ_HANDLED which turns the spurious interrupt detection into a pointless waste of CPU cycles. Driver changes: - Bulk convert interrupt chip drivers to the new irqdomain low level flow handler invocation mechanism. - Add device tree bindings for the Renesas R-Car M3-W+ SoC - Enable modular build of the Qualcomm PDC driver - The usual small fixes and improvements" * tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties irqchip: gic-pm: Remove redundant error log of clock bulk irqchip/sun4i: Remove unnecessary oom message irqchip/irq-imx-gpcv2: Remove unnecessary oom message irqchip/imgpdc: Remove unnecessary oom message irqchip/gic-v3-its: Remove unnecessary oom message irqchip/gic-v2m: Remove unnecessary oom message irqchip/exynos-combiner: Remove unnecessary oom message irqchip: Bulk conversion to generic_handle_domain_irq() genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ() genirq: Add generic_handle_domain_irq() helper irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq() irqdesc: Fix __handle_domain_irq() comment genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co irqdomain: Introduce irq_resolve_mapping() irqdomain: Protect the linear revmap with RCU irqdomain: Cache irq_data instead of a virq number in the revmap irqdomain: Use struct_size() helper when allocating irqdomain irqdomain: Make normal and nomap irqdomains exclusive powerpc: Move the use of irq_domain_add_nomap() behind a config option ...
2021-06-29mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMAMike Rapoport1-2/+2
After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA configuration options are equivalent. Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead. Done with $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \ $(git grep -wl CONFIG_NEED_MULTIPLE_NODES) $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \ $(git grep -wl NEED_MULTIPLE_NODES) with manual tweaks afterwards. [rppt@linux.ibm.com: fix arm boot crash] Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds13-29/+107
Pull kvm updates from Paolo Bonzini: "This covers all architectures (except MIPS) so I don't expect any other feature pull requests this merge window. ARM: - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes PPC: - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes S390: - new HW facilities for guests - make inline assembly more robust with KASAN and co x86: - Allow userspace to handle emulation errors (unknown instructions) - Lazy allocation of the rmap (host physical -> guest physical address) - Support for virtualizing TSC scaling on VMX machines - Optimizations to avoid shattering huge pages at the beginning of live migration - Support for initializing the PDPTRs without loading them from memory - Many TLB flushing cleanups - Refuse to load if two-stage paging is available but NX is not (this has been a requirement in practice for over a year) - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from CR0/CR4/EFER, using the MMU mode everywhere once it is computed from the CPU registers - Use PM notifier to notify the guest about host suspend or hibernate - Support for passing arguments to Hyper-V hypercalls using XMM registers - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on AMD processors - Hide Hyper-V hypercalls that are not included in the guest CPUID - Fixes for live migration of virtual machines that use the Hyper-V "enlightened VMCS" optimization of nested virtualization - Bugfixes (not many) Generic: - Support for retrieving statistics without debugfs - Cleanups for the KVM selftests API" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits) KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled kvm: x86: disable the narrow guest module parameter on unload selftests: kvm: Allows userspace to handle emulation errors. kvm: x86: Allow userspace to handle emulation errors KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic KVM: x86: Enhance comments for MMU roles and nested transition trickiness KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU KVM: x86/mmu: Use MMU's role to determine PTTYPE KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers KVM: x86/mmu: Add a helper to calculate root from role_regs KVM: x86/mmu: Add helper to update paging metadata KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper KVM: x86/mmu: Get nested MMU's root level from the MMU's role ...
2021-06-28Merge tag 'locking-core-2021-06-28' of ↵Linus Torvalds3-83/+89
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Core locking & atomics: - Convert all architectures to ARCH_ATOMIC: move every architecture to ARCH_ATOMIC, then get rid of ARCH_ATOMIC and all the transitory facilities and #ifdefs. Much reduction in complexity from that series: 63 files changed, 756 insertions(+), 4094 deletions(-) - Self-test enhancements - Futexes: - Add the new FUTEX_LOCK_PI2 ABI, which is a variant that doesn't set FLAGS_CLOCKRT (.e. uses CLOCK_MONOTONIC). [ The temptation to repurpose FUTEX_LOCK_PI's implicit setting of FLAGS_CLOCKRT & invert the flag's meaning to avoid having to introduce a new variant was resisted successfully. ] - Enhance futex self-tests - Lockdep: - Fix dependency path printouts - Optimize trace saving - Broaden & fix wait-context checks - Misc cleanups and fixes. * tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) locking/lockdep: Correct the description error for check_redundant() futex: Provide FUTEX_LOCK_PI2 to support clock selection futex: Prepare futex_lock_pi() for runtime clock selection lockdep/selftest: Remove wait-type RCU_CALLBACK tests lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING lockdep: Fix wait-type for empty stack locking/selftests: Add a selftest for check_irq_usage() lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage() locking/lockdep: Remove the unnecessary trace saving locking/lockdep: Fix the dep path printing for backwards BFS selftests: futex: Add futex compare requeue test selftests: futex: Add futex wait test seqlock: Remove trailing semicolon in macros locking/lockdep: Reduce LOCKDEP dependency list locking/lockdep,doc: Improve readability of the block matrix locking/atomics: atomic-instrumented: simplify ifdeffery locking/atomic: delete !ARCH_ATOMIC remnants locking/atomic: xtensa: move to ARCH_ATOMIC locking/atomic: sparc: move to ARCH_ATOMIC locking/atomic: sh: move to ARCH_ATOMIC ...
2021-06-25Merge tag 'kvmarm-5.14' of ↵Paolo Bonzini2-35/+52
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for v5.14. - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes
2021-06-25powerpc/ptrace: Refactor regs_set_return_{msr/ip}Christophe Leroy1-8/+2
regs_set_return_msr() and regs_set_return_ip() have a copy of the code of set_return_regs_changed(). Call the later instead. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/baf64a91557d3811c155616a6aa23ed7b3b21da4.1624619582.git.christophe.leroy@csgroup.eu
2021-06-25powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip}Christophe Leroy1-5/+5
regs_set_return_msr() and regs_set_return_ip() have a copy of the code of set_return_regs_changed(). Move up set_return_regs_changed() so it can be reused by regs_set_return_{msr/ip} Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/49f4fb051a3e1cb69f7305d5b6768aec14727c32.1624619582.git.christophe.leroy@csgroup.eu
2021-06-25powerpc: Fix is_kvm_guest() / kvm_para_available()Michael Ellerman1-2/+2
Commit a21d1becaa3f ("powerpc: Reintroduce is_kvm_guest() as a fast-path check") added is_kvm_guest() and changed kvm_para_available() to use it. is_kvm_guest() checks a static key, kvm_guest, and that static key is set in check_kvm_guest(). The problem is check_kvm_guest() is only called on pseries, and even then only in some configurations. That means is_kvm_guest() always returns false on all non-pseries and some pseries depending on configuration. That's a bug. For PR KVM guests this is noticable because they no longer do live patching of themselves, which can be detected by the omission of a message in dmesg such as: KVM: Live patching for a fast VM worked To fix it make check_kvm_guest() an initcall, to ensure it's always called at boot. It needs to be core so that it runs before kvm_guest_init() which is postcore. To be an initcall it needs to return int, where 0 means success, so update that. We still call it manually in pSeries_smp_probe(), because that runs before init calls are run. Fixes: a21d1becaa3f ("powerpc: Reintroduce is_kvm_guest() as a fast-path check") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210623130514.2543232-1-mpe@ellerman.id.au
2021-06-25powerpc/kprobes: Roll IS_RFI() macro into IS_RFID()Naveen N. Rao1-4/+3
In kprobes and xmon, we should exclude both 32-bit and 64-bit variants of mtmsr and rfi instructions from being stepped. Have IS_RFID() also detect a rfi instruction similar to IS_MTMSRD(). Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/eee32e1b75dae85d471c89b4c0a123ad4b0aabf8.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
2021-06-25powerpc/papr_scm: Add support for reporting dirty-shutdown-countVaibhav Jain1-0/+6
Persistent memory devices like NVDIMMs can loose cached writes in case something prevents flush on power-fail. Such situations are termed as dirty shutdown and are exposed to applications as last-shutdown-state (LSS) flag and a dirty-shutdown-counter(DSC) as described at [1]. The latter being useful in conditions where multiple applications want to detect a dirty shutdown event without racing with one another. PAPR-NVDIMMs have so far only exposed LSS style flags to indicate a dirty-shutdown-state. This patch further adds support for DSC via the "ibm,persistence-failed-count" device tree property of an NVDIMM. This property is a monotonic increasing 64-bit counter thats an indication of number of times an NVDIMM has encountered a dirty-shutdown event causing persistence loss. Since this value is not expected to change after system-boot hence papr_scm reads & caches its value during NVDIMM probe and exposes it as a PAPR sysfs attributed named 'dirty_shutdown' to match the name of similarly named NFIT sysfs attribute. Also this value is available to libnvdimm via PAPR_PDSM_HEALTH payload. 'struct nd_papr_pdsm_health' has been extended to add a new member called 'dimm_dsc' presence of which is indicated by the newly introduced PDSM_DIMM_DSC_VALID flag. References: [1] https://pmem.io/documents/Dirty_Shutdown_Handling-V1.0.pdf Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210624080621.252038-1-vaibhav@linux.ibm.com
2021-06-24KVM: stats: Separate generic stats from architecture specific onesJing Zhang1-7/+2
Generic KVM stats are those collected in architecture independent code or those supported by all architectures; put all generic statistics in a separate structure. This ensures that they are defined the same way in the statistics API which is being added, removing duplication among different architectures in the declaration of the descriptors. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-2-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-24powerpc: Remove klimitChristophe Leroy1-1/+0
klimit is a global variable initialised at build time with the value of _end. This variable is never modified, so _end symbol can be used directly. Remove klimit. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9fa9ba6807c17f93f35a582c199c646c4a8bfd9c.1622800638.git.christophe.leroy@csgroup.eu
2021-06-24powerpc/64: use interrupt restart table to speed up return from interruptNicholas Piggin5-2/+22
Use the restart table facility to return from interrupt or system calls without disabling MSR[EE] or MSR[RI]. Interrupt return asm is put into the low soft-masked region, to prevent interrupts being processed here, although they are still taken as masked interrupts which causes SRRs to be clobbered, and a pending soft-masked interrupt to require replaying. The return code uses restart table regions to redirct to a fixup handler rather than continue with the exit, if such an interrupt happens. In this case the interrupt return is redirected to a fixup handler which reloads r1 for the interrupt stack and reloads registers and sets state up to replay the soft-masked interrupt and try the exit again. Some types of security exit fallback flushes and barriers are currently unable to cope with reentrant interrupts, e.g., because they store some state in the scratch SPR which would be clobbered even by masked interrupts. For now the interrupts-enabled exits are disabled when these flushes are used. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Guard unused exit_must_hard_disable() as reported by lkp] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-13-npiggin@gmail.com
2021-06-24powerpc/64: treat low kernel text as irqs soft-maskedNicholas Piggin1-3/+8
Treat code below __end_soft_masked as soft-masked for the purpose of alternate return. 64s already mostly does this for scv entry. This will be used to exit from interrupts without disabling MSR[EE]. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-12-npiggin@gmail.com
2021-06-24powerpc/64: allow alternate return locations for soft-masked interruptsNicholas Piggin2-0/+21
The exception table fixup adjusts a failed page fault's interrupt return location if it was taken at an address specified in the exception table, to a corresponding fixup handler address. Introduce a variation of that idea which adds a fixup table for NMIs and soft-masked asynchronous interrupts. This will be used to protect certain critical sections that are sensitive to being clobbered by interrupts coming in (due to using the same SPRs and/or irq soft-mask state). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-10-npiggin@gmail.com