summaryrefslogtreecommitdiff
path: root/arch/arm64/mm
AgeCommit message (Collapse)AuthorFilesLines
2018-09-05arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid()Greg Hackmann1-1/+5
commit 5ad356eabc47d26a92140a0c4b20eba471c10de3 upstream. ARM64's pfn_valid() shifts away the upper PAGE_SHIFT bits of the input before seeing if the PFN is valid. This leads to false positives when some of the upper bits are set, but the lower bits match a valid PFN. For example, the following userspace code looks up a bogus entry in /proc/kpageflags: int pagemap = open("/proc/self/pagemap", O_RDONLY); int pageflags = open("/proc/kpageflags", O_RDONLY); uint64_t pfn, val; lseek64(pagemap, [...], SEEK_SET); read(pagemap, &pfn, sizeof(pfn)); if (pfn & (1UL << 63)) { /* valid PFN */ pfn &= ((1UL << 55) - 1); /* clear flag bits */ pfn |= (1UL << 55); lseek64(pageflags, pfn * sizeof(uint64_t), SEEK_SET); read(pageflags, &val, sizeof(val)); } On ARM64 this causes the userspace process to crash with SIGSEGV rather than reading (1 << KPF_NOPAGE). kpageflags_read() treats the offset as valid, and stable_page_flags() will try to access an address between the user and kernel address ranges. Fixes: c1cc1552616d ("arm64: MMU initialisation") Cc: stable@vger.kernel.org Signed-off-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-17ioremap: Update pgtable free interfaces with addrChintan Pandya1-2/+2
commit 785a19f9d1dd8a4ab2d0633be4656653bd3de1fc upstream. The following kernel panic was observed on ARM64 platform due to a stale TLB entry. 1. ioremap with 4K size, a valid pte page table is set. 2. iounmap it, its pte entry is set to 0. 3. ioremap the same address with 2M size, update its pmd entry with a new value. 4. CPU may hit an exception because the old pmd entry is still in TLB, which leads to a kernel panic. Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page table") has addressed this panic by falling to pte mappings in the above case on ARM64. To support pmd mappings in all cases, TLB purge needs to be performed in this case on ARM64. Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page() so that TLB purge can be added later in seprate patches. [toshi.kani@hpe.com: merge changes, rewrite patch description] Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Chintan Pandya <cpandya@codeaurora.org> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: mhocko@suse.com Cc: akpm@linux-foundation.org Cc: hpa@zytor.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: Will Deacon <will.deacon@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setupsJohannes Weiner1-1/+3
commit 7b0eb6b41a08fa1fa0d04b1c53becd62b5fbfaee upstream. Arnd reports the following arm64 randconfig build error with the PSI patches that add another page flag: /git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init': /git/arm-soc/include/linux/compiler.h:357:38: error: call to '__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT) The additional page flag causes other information stored in page->flags to get bumped into their own struct page member: #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT #else #define LAST_CPUPID_WIDTH 0 #endif #if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0 #define LAST_CPUPID_NOT_IN_PAGE_FLAGS #endif which in turn causes the struct page size to exceed the size set in STRUCT_PAGE_MAX_SHIFT. This value is an an estimate used to size the VMEMMAP page array according to address space and struct page size. However, the check is performed - and triggers here - on a !VMEMMAP config, which consumes an additional 22 page bits for the sparse section id. When VMEMMAP is enabled, those bits are returned, cpupid doesn't need its own member, and the page passes the VMEMMAP check. Restrict that check to the situation it was meant to check: that we are sizing the VMEMMAP page array correctly. Says Arnd: Further experiments show that the build error already existed before, but was only triggered with larger values of CONFIG_NR_CPU and/or CONFIG_NODES_SHIFT that might be used in actual configurations but not in randconfig builds. With longer CPU and node masks, I could recreate the problem with kernels as old as linux-4.7 when arm64 NUMA support got added. Reported-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.") Fixes: 3e1907d5bf5a ("arm64: mm: move vmemmap region right below the linear region") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03arm64: mm: Ensure writes to swapper are ordered wrt subsequent cache maintenanceWill Deacon1-2/+3
commit 71c8fc0c96abf8e53e74ed4d891d671e585f9076 upstream. When rewriting swapper using nG mappings, we must performance cache maintenance around each page table access in order to avoid coherency problems with the host's cacheable alias under KVM. To ensure correct ordering of the maintenance with respect to Device memory accesses made with the Stage-1 MMU disabled, DMBs need to be added between the maintenance and the corresponding memory access. This patch adds a missing DMB between writing a new page table entry and performing a clean+invalidate on the same line. Fixes: f992b4dfd58b ("arm64: kpti: Add ->enable callback to remap swapper using nG mappings") Cc: <stable@vger.kernel.org> # 4.16.x- Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-16arm64: Add work around for Arm Cortex-A55 Erratum 1024718Suzuki K Poulose1-0/+5
commit ece1397cbc89c51914fae1aec729539cfd8bd62b upstream. Some variants of the Arm Cortex-55 cores (r0p0, r0p1, r1p0) suffer from an erratum 1024718, which causes incorrect updates when DBM/AP bits in a page table entry is modified without a break-before-make sequence. The work around is to skip enabling the hardware DBM feature on the affected cores. The hardware Access Flag management features is not affected. There are some other cores suffering from this errata, which could be added to the midr_list to trigger the work around. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: ckadabi@codeaurora.org Reviewed-by: Dave Martin <dave.martin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-20arm64: entry: Apply BP hardening for suspicious interrupts from EL0Mark Rutland1-0/+6
From: Will Deacon <will.deacon@arm.com> commit 30d88c0e3ace625a92eead9ca0ad94093a8f59fe upstream. It is possible to take an IRQ from EL0 following a branch to a kernel address in such a way that the IRQ is prioritised over the instruction abort. Whilst an attacker would need to get the stars to align here, it might be sufficient with enough calibration so perform BP hardening in the rare case that we see a kernel address in the ELR when handling an IRQ from EL0. Reported-by: Dan Hettena <dhettena@nvidia.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-20arm64: entry: Apply BP hardening for high-priority synchronous exceptionsMark Rutland1-0/+9
From: Will Deacon <will.deacon@arm.com> commit 5dfc6ed27710c42cbc15db5c0d4475699991da0a upstream. Software-step and PC alignment fault exceptions have higher priority than instruction abort exceptions, so apply the BP hardening hooks there too if the user PC appears to reside in kernel space. Reported-by: Dan Hettena <dhettena@nvidia.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-20arm64: Move BP hardening to check_and_switch_contextMark Rutland1-2/+3
From: Marc Zyngier <marc.zyngier@arm.com> commit a8e4c0a919ae310944ed2c9ace11cf3ccd8a609b upstream. We call arm64_apply_bp_hardening() from post_ttbr_update_workaround, which has the unexpected consequence of being triggered on every exception return to userspace when ARM64_SW_TTBR0_PAN is selected, even if no context switch actually occured. This is a bit suboptimal, and it would be more logical to only invalidate the branch predictor when we actually switch to a different mm. In order to solve this, move the call to arm64_apply_bp_hardening() into check_and_switch_context(), where we're guaranteed to pick a different mm context. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-20arm64: Add skeleton to harden the branch predictor against aliasing attacksMark Rutland2-0/+19
From: Will Deacon <will.deacon@arm.com> commit 0f15adbb2861ce6f75ccfc5a92b19eae0ef327d0 upstream. Aliasing attacks against CPU branch predictors can allow an attacker to redirect speculative control flow on some CPUs and potentially divulge information from one context to another. This patch adds initial skeleton code behind a new Kconfig option to enable implementation-specific mitigations against these attacks for CPUs that are affected. Co-developed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [v4.9: copy bp hardening cb via text mapping] Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-20arm64: Move post_ttbr_update_workaround to C codeMark Rutland2-2/+10
From: Marc Zyngier <marc.zyngier@arm.com> commit 95e3de3590e3f2358bb13f013911bc1bfa5d3f53 upstream. We will soon need to invoke a CPU-specific function pointer after changing page tables, so move post_ttbr_update_workaround out into C code to make this possible. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-20arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macroMark Rutland1-5/+1
From: Catalin Marinas <catalin.marinas@arm.com> commit f33bcf03e6079668da6bf4eec4a7dcf9289131d0 upstream. This patch takes the errata workaround code out of cpu_do_switch_mm into a dedicated post_ttbr0_update_workaround macro which will be reused in a subsequent patch. Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-20arm64: Make USER_DS an inclusive limitMark Rutland1-1/+1
From: Robin Murphy <robin.murphy@arm.com> commit 51369e398d0d33e8f524314e672b07e8cf870e79 upstream. Currently, USER_DS represents an exclusive limit while KERNEL_DS is inclusive. In order to do some clever trickery for speculation-safe masking, we need them both to behave equivalently - there aren't enough bits to make KERNEL_DS exclusive, so we have precisely one option. This also happens to correct a longstanding false negative for a range ending on the very top byte of kernel memory. Mark Rutland points out that we've actually got the semantics of addresses vs. segments muddled up in most of the places we need to amend, so shuffle the {USER,KERNEL}_DS definitions around such that we can correct those properly instead of just pasting "-1"s everywhere. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> [v4.9: avoid dependence on TTBR0 SW PAN and THREAD_INFO_IN_TASK_STRUCT] Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13arm64: kernel: restrict /dev/mem read() calls to linear regionArd Biesheuvel1-6/+13
[ Upstream commit 1151f838cb626005f4d69bf675dacaaa5ea909d6 ] When running lscpu on an AArch64 system that has SMBIOS version 2.0 tables, it will segfault in the following way: Unable to handle kernel paging request at virtual address ffff8000bfff0000 pgd = ffff8000f9615000 [ffff8000bfff0000] *pgd=0000000000000000 Internal error: Oops: 96000007 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1284 Comm: lscpu Not tainted 4.11.0-rc3+ #103 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 task: ffff8000fa78e800 task.stack: ffff8000f9780000 PC is at __arch_copy_to_user+0x90/0x220 LR is at read_mem+0xcc/0x140 This is caused by the fact that lspci issues a read() on /dev/mem at the offset where it expects to find the SMBIOS structure array. However, this region is classified as EFI_RUNTIME_SERVICE_DATA (as per the UEFI spec), and so it is omitted from the linear mapping. So let's restrict /dev/mem read/write access to those areas that are covered by the linear region. Reported-by: Alexander Graf <agraf@suse.de> Fixes: 4dffbfc48d65 ("arm64/efi: mark UEFI reserved regions as MEMBLOCK_NOMAP") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08arm64: idmap: Use "awx" flags for .idmap.text .pushsection directivesWill Deacon1-4/+4
commit 439e70e27a51 upstream. The identity map is mapped as both writeable and executable by the SWAPPER_MM_MMUFLAGS and this is relied upon by the kpti code to manage a synchronisation flag. Update the .pushsection flags to reflect the actual mapping attributes. Reported-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Alex Shi <alex.shi@linaro.org> [v4.9 backport] Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08arm64: kpti: Add ->enable callback to remap swapper using nG mappingsWill Deacon1-7/+194
commit f992b4dfd58b upstream. Defaulting to global mappings for kernel space is generally good for performance and appears to be necessary for Cavium ThunderX. If we subsequently decide that we need to enable kpti, then we need to rewrite our existing page table entries to be non-global. This is fiddly, and made worse by the possible use of contiguous mappings, which require a strict break-before-make sequence. Since the enable callback runs on each online CPU from stop_machine context, we can have all CPUs enter the idmap, where secondaries can wait for the primary CPU to rewrite swapper with its MMU off. It's all fairly horrible, but at least it only runs once. Nicolas Dechesne <nicolas.dechesne@linaro.org> found a bug on this commit which cause boot failure on db410c etc board. Ard Biesheuvel found it writting wrong contenct to ttbr1_el1 in __idmap_cpu_set_reserved_ttbr1 macro and fixed it by give it the right content. Tested-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Alex: avoid dependency on 52-bit PA patches and TTBR/MMU erratum patches] Signed-off-by: Alex Shi <alex.shi@linaro.org> [v4.9 backport] Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08arm64: kaslr: Put kernel vectors address in separate data pageWill Deacon1-1/+9
commit 6c27c4082f4f upstream. The literal pool entry for identifying the vectors base is the only piece of information in the trampoline page that identifies the true location of the kernel. This patch moves it into a page-aligned region of the .rodata section and maps this adjacent to the trampoline text via an additional fixmap entry, which protects against any accidental leakage of the trampoline contents. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> [Alex: avoid ARM64_WORKAROUND_QCOM_FALKOR_E1003 dependency] Signed-off-by: Alex Shi <alex.shi@linaro.org> [v4.9 backport] Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08arm64: mm: Map entry trampoline into trampoline and kernel page tablesWill Deacon1-0/+23
commit 51a0048beb44 upstream. The exception entry trampoline needs to be mapped at the same virtual address in both the trampoline page table (which maps nothing else) and also the kernel page table, so that we can swizzle TTBR1_EL1 on exceptions from and return to EL0. This patch maps the trampoline at a fixed virtual address in the fixmap area of the kernel virtual address space, which allows the kernel proper to be randomized with respect to the trampoline when KASLR is enabled. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alex Shi <alex.shi@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08arm64: mm: Allocate ASIDs in pairsWill Deacon1-8/+17
commit 0c8ea531b774 upstream. In preparation for separate kernel/user ASIDs, allocate them in pairs for each mm_struct. The bottom bit distinguishes the two: if it is set, then the ASID will map only userspace. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alex Shi <alex.shi@linaro.org> [v4.9 backport] Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08arm64: mm: Move ASID from TTBR0 to TTBR1Will Deacon1-3/+6
commit 7655abb95386 upstream. In preparation for mapping kernelspace and userspace with different ASIDs, move the ASID to TTBR1 and update switch_mm to context-switch TTBR0 via an invalid mapping (the zero page). Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Tested-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alex Shi <alex.shi@linaro.org> [v4.9 backport] Signed-off-by: Mark Rutland <mark.rutland@arm.com> [v4.9 backport] Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28mm/vmalloc: add interfaces to free unmapped page tableToshi Kani1-0/+10
commit b6bdb7517c3d3f41f20e5c2948d6bc3f8897394e upstream. On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may create pud/pmd mappings. A kernel panic was observed on arm64 systems with Cortex-A75 in the following steps as described by Hanjun Guo. 1. ioremap a 4K size, valid page table will build, 2. iounmap it, pte0 will set to 0; 3. ioremap the same address with 2M size, pgd/pmd is unchanged, then set the a new value for pmd; 4. pte0 is leaked; 5. CPU may meet exception because the old pmd is still in TLB, which will lead to kernel panic. This panic is not reproducible on x86. INVLPG, called from iounmap, purges all levels of entries associated with purged address on x86. x86 still has memory leak. The patch changes the ioremap path to free unmapped page table(s) since doing so in the unmap path has the following issues: - The iounmap() path is shared with vunmap(). Since vmap() only supports pte mappings, making vunmap() to free a pte page is an overhead for regular vmap users as they do not need a pte page freed up. - Checking if all entries in a pte page are cleared in the unmap path is racy, and serializing this check is expensive. - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges. Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB purge. Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which clear a given pud/pmd entry and free up a page for the lower level entries. This patch implements their stub functions on x86 and arm64, which work as workaround. [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub] Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings") Reported-by: Lei Li <lious.lilei@hisilicon.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Wang Xuefeng <wxf.wang@hisilicon.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25arm64: fix warning about swapper_pg_dir overflowArnd Bergmann1-1/+1
commit 12f043ff2b28fa64c9123b454cbe30a8a9e1967e upstream. With 4 levels of 16KB pages, we get this warning about the fact that we are copying a whole page into an array that is declared as having only two pointers for the top level of the page table: arch/arm64/mm/mmu.c: In function 'paging_init': arch/arm64/mm/mmu.c:528:2: error: 'memcpy' writing 16384 bytes into a region of size 16 overflows the destination [-Werror=stringop-overflow=] This is harmless since we actually reserve a whole page in the definition of the array that comes from, and just the extern declaration is short. The pgdir is initialized to zero either way, so copying the actual entries here seems like the best solution. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Will Deacon <will.deacon@arm.com> [slightly adapted to apply on 4.9] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25arm64: Initialise high_memory global variable earlierSteve Capper1-1/+1
commit f24e5834a2c3f6c5f814a417f858226f0a010ade upstream. The high_memory global variable is used by cma_declare_contiguous(.) before it is defined. We don't notice this as we compute __pa(high_memory - 1), and it looks like we're processing a VA from the direct linear map. This problem becomes apparent when we flip the kernel virtual address space and the linear map is moved to the bottom of the kernel VA space. This patch moves the initialisation of high_memory before it used. Fixes: f7426b983a6a ("mm: cma: adjust address limit to avoid hitting low/high memory boundary") Signed-off-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15arm64: dma-mapping: Only swizzle DMA ops for IOMMU_DOMAIN_DMAWill Deacon1-5/+12
[ Upstream commit 4a8d8a14c0d08c2437cb80c05e88f6cc1ca3fb2c ] The arm64 DMA-mapping implementation sets the DMA ops to the IOMMU DMA ops if we detect that an IOMMU is present for the master and the DMA ranges are valid. In the case when the IOMMU domain for the device is not of type IOMMU_DOMAIN_DMA, then we have no business swizzling the ops, since we're not in control of the underlying address space. This patch leaves the DMA ops alone for masters attached to non-DMA IOMMU domains. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05arm64: fault: Route pte translation faults via do_translation_faultWill Deacon1-1/+1
commit 760bfb47c36a07741a089bf6a28e854ffbee7dc9 upstream. We currently route pte translation faults via do_page_fault, which elides the address check against TASK_SIZE before invoking the mm fault handling code. However, this can cause issues with the path walking code in conjunction with our word-at-a-time implementation because load_unaligned_zeropad can end up faulting in kernel space if it reads across a page boundary and runs into a page fault (e.g. by attempting to read from a guard region). In the case of such a fault, load_unaligned_zeropad has registered a fixup to shift the valid data and pad with zeroes, however the abort is reported as a level 3 translation fault and we dispatch it straight to do_page_fault, despite it being a kernel address. This results in calling a sleeping function from atomic context: BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313 in_atomic(): 0, irqs_disabled(): 0, pid: 10290 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [...] [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144 [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c [<ffffff8e016977f0>] do_page_fault+0x140/0x330 [<ffffff8e01681328>] do_mem_abort+0x54/0xb0 Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0) [...] [<ffffff8e016844fc>] el1_da+0x18/0x78 [<ffffff8e017f399c>] path_parentat+0x44/0x88 [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8 [<ffffff8e017f5044>] filename_create+0x4c/0x128 [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8 [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28 Code: 36380080 d5384100 f9400800 9402566d (d4210000) ---[ end trace 2d01889f2bca9b9f ]--- Fix this by dispatching all translation faults to do_translation_faults, which avoids invoking the page fault logic for faults on kernel addresses. Reported-by: Ankit Jain <ankijain@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-02arm64: mm: abort uaccess retries upon fatal signalMark Rutland1-1/+4
commit 289d07a2dc6c6b6f3e4b8a62669320d99dbe6c3d upstream. When there's a fatal signal pending, arm64's do_page_fault() implementation returns 0. The intent is that we'll return to the faulting userspace instruction, delivering the signal on the way. However, if we take a fatal signal during fixing up a uaccess, this results in a return to the faulting kernel instruction, which will be instantly retried, resulting in the same fault being taken forever. As the task never reaches userspace, the signal is not delivered, and the task is left unkillable. While the task is stuck in this state, it can inhibit the forward progress of the system. To avoid this, we must ensure that when a fatal signal is pending, we apply any necessary fixup for a faulting kernel instruction. Thus we will return to an error path, and it is up to that code to make forward progress towards delivering the fatal signal. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Laura Abbott <labbott@redhat.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-07arm64: mm: fix show_pte KERN_CONT falloutMark Rutland1-4/+4
[ Upstream commit 6ef4fb387d50fa8f3bffdffc868b57e981cdd709 ] Recent changes made KERN_CONT mandatory for continued lines. In the absence of KERN_CONT, a newline may be implicit inserted by the core printk code. In show_pte, we (erroneously) use printk without KERN_CONT for continued prints, resulting in output being split across a number of lines, and not matching the intended output, e.g. [ff000000000000] *pgd=00000009f511b003 , *pud=00000009f4a80003 , *pmd=0000000000000000 Fix this by using pr_cont() for all the continuations. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-12arm64: mm: unaligned access by user-land should be received as SIGBUSVictor Kamensky1-18/+24
commit 09a6adf53d42ca3088fa3fb41f40b768efc711ed upstream. After 52d7523 (arm64: mm: allow the kernel to handle alignment faults on user accesses) commit user-land accesses that produce unaligned exceptions like in case of aarch32 ldm/stm/ldrd/strd instructions operating on unaligned memory received by user-land as SIGSEGV. It is wrong, it should be reported as SIGBUS as it was before 52d7523 commit. Changed do_bad_area function to take signal and code parameters out of esr value using fault_info table, so in case of do_alignment_fault fault user-land will receive SIGBUS. Wrapped access to fault_info table into esr_to_fault_info function. Fixes: 52d7523 (arm64: mm: allow the kernel to handle alignment faults on user accesses) Signed-off-by: Victor Kamensky <kamensky@cisco.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12arm64: dma-mapping: Fix dma_mapping_error() when bypassing SWIOTLBRobin Murphy1-1/+8
commit adbe7e26f4257f72817495b9bce114284060b0d7 upstream. When bypassing SWIOTLB on small-memory systems, we need to avoid calling into swiotlb_dma_mapping_error() in exactly the same way as we avoid swiotlb_dma_supported(), because the former also relies on SWIOTLB state being initialised. Under the assumptions for which we skip SWIOTLB, dma_map_{single,page}() will only ever return the DMA-offset-adjusted physical address of the page passed in, thus we can report success unconditionally. Fixes: b67a8b29df7e ("arm64: mm: only initialize swiotlb when necessary") CC: Jisheng Zhang <jszhang@marvell.com> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26swiotlb: Convert swiotlb_force from int to enumGeert Uytterhoeven2-2/+4
commit ae7871be189cb41184f1e05742b4a99e2c59774d upstream. Convert the flag swiotlb_force from an int to an enum, to prepare for the advent of more possible values. Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26arm64: Fix swiotlb fallback allocationAlexander Graf1-0/+2
commit 524dabe1c68e0bca25ce7b108099e5d89472a101 upstream. Commit b67a8b29df introduced logic to skip swiotlb allocation when all memory is DMA accessible anyway. While this is a great idea, __dma_alloc still calls swiotlb code unconditionally to allocate memory when there is no CMA memory available. The swiotlb code is called to ensure that we at least try get_free_pages(). Without initialization, swiotlb allocation code tries to access io_tlb_list which is NULL. That results in a stack trace like this: Unable to handle kernel NULL pointer dereference at virtual address 00000000 [...] [<ffff00000845b908>] swiotlb_tbl_map_single+0xd0/0x2b0 [<ffff00000845be94>] swiotlb_alloc_coherent+0x10c/0x198 [<ffff000008099dc0>] __dma_alloc+0x68/0x1a8 [<ffff000000a1b410>] drm_gem_cma_create+0x98/0x108 [drm] [<ffff000000abcaac>] drm_fbdev_cma_create_with_funcs+0xbc/0x368 [drm_kms_helper] [<ffff000000abcd84>] drm_fbdev_cma_create+0x2c/0x40 [drm_kms_helper] [<ffff000000abc040>] drm_fb_helper_initial_config+0x238/0x410 [drm_kms_helper] [<ffff000000abce88>] drm_fbdev_cma_init_with_funcs+0x98/0x160 [drm_kms_helper] [<ffff000000abcf90>] drm_fbdev_cma_init+0x40/0x58 [drm_kms_helper] [<ffff000000b47980>] vc4_kms_load+0x90/0xf0 [vc4] [<ffff000000b46a94>] vc4_drm_bind+0xec/0x168 [vc4] [...] Thankfully swiotlb code just learned how to not do allocations with the FORCE_NO option. This patch configures the swiotlb code to use that if we decide not to initialize the swiotlb framework. Fixes: b67a8b29df ("arm64: mm: only initialize swiotlb when necessary") Signed-off-by: Alexander Graf <agraf@suse.de> CC: Jisheng Zhang <jszhang@marvell.com> CC: Geert Uytterhoeven <geert+renesas@glider.be> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flagsHuang Shijie1-1/+1
commit 69d012345a1a32d3f03957f14d972efccc106a98 upstream. In current code, the @changed always returns the last one's status for the huge page with the contiguous bit set. This is really not what we want. Even one of the PTEs is changed, we should tell it to the caller. This patch fixes this issue. Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Signed-off-by: Huang Shijie <shijie.huang@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19arm64: hugetlb: remove the wrong pmd check in find_num_contig()Huang Shijie1-12/+0
commit 20156ce2365d61beaa6f5a78a7a789044e0e7acc upstream. The find_num_contig() will return 1 when the pmd is not present. It will cause a kernel dead loop in the following scenaro: 1.) pmd entry is not present. 2.) the page fault occurs: ... hugetlb_fault() --> hugetlb_no_page() --> set_huge_pte_at() 3.) set_huge_pte_at() will only set the first PMD entry, since the find_num_contig just return 1 in this case. So the PMD entries are all empty except the first one. 4.) when kernel accesses the address mapped by the second PMD entry, a new page fault occurs: ... hugetlb_fault() --> huge_ptep_set_access_flags() The second PMD entry is still empty now. 5.) When the kernel returns, the access will cause a page fault again. The kernel will run like the "4)" above. We will see a dead loop since here. The dead loop is caught in the 32M hugetlb page (2M PMD + Contiguous bit). This patch removes wrong pmd check, and fixes this dead loop. This patch also removes the redundant checks for PGD/PUD in the find_num_contig(). Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Huang Shijie <shijie.huang@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19arm64: hugetlb: fix the wrong address for several functionsHuang Shijie1-4/+4
commit 0c2f0afe3582c58efeef93bc57bc07d502132618 upstream. The libhugetlbfs meets several failures since the following functions do not use the correct address: huge_ptep_get_and_clear() huge_ptep_set_access_flags() huge_ptep_set_wrprotect() huge_ptep_clear_flush() This patch fixes the wrong address for them. Signed-off-by: Huang Shijie <shijie.huang@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-26arm64/numa: fix incorrect log for memory-less nodeHanjun Guo1-2/+5
When booting on NUMA system with memory-less node (no memory dimm on this memory controller), the print for setup_node_data() is incorrect: NUMA: Initmem setup node 2 [mem 0x00000000-0xffffffffffffffff] It can be fixed by printing [mem 0x00000000-0x00000000] when end_pfn is 0, but print <memory-less node> will be more useful. Fixes: 1a2db300348b ("arm64, numa: Add NUMA support for arm64 platforms.") Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-10-26arm64/numa: fix pcpu_cpu_distance() to get correct CPU proximityYisheng Xie1-1/+1
The pcpu_build_alloc_info() function group CPUs according to their proximity, by call callback function @cpu_distance_fn from different ARCHs. For arm64 the callback of @cpu_distance_fn is pcpu_cpu_distance(from, to) -> node_distance(from, to) The @from and @to for function node_distance() should be nid. However, pcpu_cpu_distance() in arch/arm64/mm/numa.c just past the cpu id for @from and @to, and didn't convert to numa node id. For this incorrect cpu proximity get from ARCH, it may cause each CPU in one group and make group_cnt out of bound: setup_per_cpu_areas() pcpu_embed_first_chunk() pcpu_build_alloc_info() in pcpu_build_alloc_info, since cpu_distance_fn will return REMOTE_DISTANCE if we pass cpu ids (0,1,2...), so cpu_distance_fn(cpu, tcpu) > LOCAL_DISTANCE will wrongly be ture. This may results in triggering the BUG_ON(unit != nr_units) later: [ 0.000000] kernel BUG at mm/percpu.c:1916! [ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc1-00003-g14155ca-dirty #26 [ 0.000000] Hardware name: Hisilicon Hi1616 Evaluation Board (DT) [ 0.000000] task: ffff000008d6e900 task.stack: ffff000008d60000 [ 0.000000] PC is at pcpu_embed_first_chunk+0x420/0x704 [ 0.000000] LR is at pcpu_embed_first_chunk+0x3bc/0x704 [ 0.000000] pc : [<ffff000008c754f4>] lr : [<ffff000008c75490>] pstate: 800000c5 [ 0.000000] sp : ffff000008d63eb0 [ 0.000000] x29: ffff000008d63eb0 [ 0.000000] x28: 0000000000000000 [ 0.000000] x27: 0000000000000040 [ 0.000000] x26: ffff8413fbfcef00 [ 0.000000] x25: 0000000000000042 [ 0.000000] x24: 0000000000000042 [ 0.000000] x23: 0000000000001000 [ 0.000000] x22: 0000000000000046 [ 0.000000] x21: 0000000000000001 [ 0.000000] x20: ffff000008cb3bc8 [ 0.000000] x19: ffff8413fbfcf570 [ 0.000000] x18: 0000000000000000 [ 0.000000] x17: ffff000008e49ae0 [ 0.000000] x16: 0000000000000003 [ 0.000000] x15: 000000000000001e [ 0.000000] x14: 0000000000000004 [ 0.000000] x13: 0000000000000000 [ 0.000000] x12: 000000000000006f [ 0.000000] x11: 00000413fbffff00 [ 0.000000] x10: 0000000000000004 [ 0.000000] x9 : 0000000000000000 [ 0.000000] x8 : 0000000000000001 [ 0.000000] x7 : ffff8413fbfcf63c [ 0.000000] x6 : ffff000008d65d28 [ 0.000000] x5 : ffff000008d65e50 [ 0.000000] x4 : 0000000000000000 [ 0.000000] x3 : ffff000008cb3cc8 [ 0.000000] x2 : 0000000000000040 [ 0.000000] x1 : 0000000000000040 [ 0.000000] x0 : 0000000000000000 [...] [ 0.000000] Call trace: [ 0.000000] Exception stack(0xffff000008d63ce0 to 0xffff000008d63e10) [ 0.000000] 3ce0: ffff8413fbfcf570 0001000000000000 ffff000008d63eb0 ffff000008c754f4 [ 0.000000] 3d00: ffff000008d63d50 ffff0000081af210 00000413fbfff010 0000000000001000 [ 0.000000] 3d20: ffff000008d63d50 ffff0000081af220 00000413fbfff010 0000000000001000 [ 0.000000] 3d40: 00000413fbfcef00 0000000000000004 ffff000008d63db0 ffff0000081af390 [ 0.000000] 3d60: 00000413fbfcef00 0000000000001000 0000000000000000 0000000000001000 [ 0.000000] 3d80: 0000000000000000 0000000000000040 0000000000000040 ffff000008cb3cc8 [ 0.000000] 3da0: 0000000000000000 ffff000008d65e50 ffff000008d65d28 ffff8413fbfcf63c [ 0.000000] 3dc0: 0000000000000001 0000000000000000 0000000000000004 00000413fbffff00 [ 0.000000] 3de0: 000000000000006f 0000000000000000 0000000000000004 000000000000001e [ 0.000000] 3e00: 0000000000000003 ffff000008e49ae0 [ 0.000000] [<ffff000008c754f4>] pcpu_embed_first_chunk+0x420/0x704 [ 0.000000] [<ffff000008c6658c>] setup_per_cpu_areas+0x38/0xc8 [ 0.000000] [<ffff000008c608d8>] start_kernel+0x10c/0x390 [ 0.000000] [<ffff000008c601d8>] __primary_switched+0x5c/0x64 [ 0.000000] Code: b8018660 17ffffd7 6b16037f 54000080 (d4210000) [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! Fix by getting cpu's node id with early_cpu_to_node() then pass it to node_distance() as the original intention. Fixes: 7af3a0a99252 ("arm64/numa: support HAVE_SETUP_PER_CPU_AREA") Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-10-20arm64: remove pr_cont abuse from mem_initMark Rutland1-13/+13
All the lines printed by mem_init are independent, with each ending with a newline. While they logically form a large block, none are actually continuations of previous lines. The kernel-side printk code and the userspace demsg tool differ in their handling of KERN_CONT following a newline, and while this isn't always a problem kernel-side, it does cause difficulty for userspace. Using pr_cont causes the userspace tool to not print line prefix (e.g. timestamps) even when following a newline, mis-aligning the output and making it harder to read, e.g. [ 0.000000] Virtual kernel memory layout: [ 0.000000] modules : 0xffff000000000000 - 0xffff000008000000 ( 128 MB) vmalloc : 0xffff000008000000 - 0xffff7dffbfff0000 (129022 GB) .text : 0xffff000008080000 - 0xffff0000088b0000 ( 8384 KB) .rodata : 0xffff0000088b0000 - 0xffff000008c50000 ( 3712 KB) .init : 0xffff000008c50000 - 0xffff000008d50000 ( 1024 KB) .data : 0xffff000008d50000 - 0xffff000008e25200 ( 853 KB) .bss : 0xffff000008e25200 - 0xffff000008e6bec0 ( 284 KB) fixed : 0xffff7dfffe7fd000 - 0xffff7dfffec00000 ( 4108 KB) PCI I/O : 0xffff7dfffee00000 - 0xffff7dffffe00000 ( 16 MB) vmemmap : 0xffff7e0000000000 - 0xffff800000000000 ( 2048 GB maximum) 0xffff7e0000000000 - 0xffff7e0026000000 ( 608 MB actual) memory : 0xffff800000000000 - 0xffff800980000000 ( 38912 MB) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1 Fix this by using pr_notice consistently for all lines, which both the kernel and userspace are happy with. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-10-20arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() callJames Morse1-0/+9
Commit 338d4f49d6f7 ("arm64: kernel: Add support for Privileged Access Never") enabled PAN by enabling the 'SPAN' feature-bit in SCTLR_EL1. This means the PSTATE.PAN bit won't be set until the next return to the kernel from userspace. On a preemptible kernel we may schedule work that accesses userspace on a CPU before it has done this. Now that cpufeature enable() calls are scheduled via stop_machine(), we can set PSTATE.PAN from the cpu_enable_pan() call. Add WARN_ON_ONCE(in_interrupt()) to check the PSTATE value we updated is not immediately discarded. Reported-by: Tony Thompson <anthony.thompson@arm.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> [will: fixed typo in comment] Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-10-20arm64: cpufeature: Schedule enable() calls instead of calling them via IPIJames Morse1-2/+4
The enable() call for a cpufeature/errata is called using on_each_cpu(). This issues a cross-call IPI to get the work done. Implicitly, this stashes the running PSTATE in SPSR when the CPU receives the IPI, and restores it when we return. This means an enable() call can never modify PSTATE. To allow PAN to do this, change the on_each_cpu() call to use stop_machine(). This schedules the work on each CPU which allows us to modify PSTATE. This involves changing the protype of all the enable() functions. enable_cpu_capabilities() is called during boot and enables the feature on all online CPUs. This path now uses stop_machine(). CPU features for hotplug'd CPUs are enabled by verify_local_cpu_features() which only acts on the local CPU, and can already modify the running PSTATE as it is called from secondary_start_kernel(). Reported-by: Tony Thompson <anthony.thompson@arm.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-10-11Merge tag 'iommu-updates-v4.9' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: - support for interrupt virtualization in the AMD IOMMU driver. These patches were shared with the KVM tree and are already merged through that tree. - generic DT-binding support for the ARM-SMMU driver. With this the driver now makes use of the generic DMA-API code. This also required some changes outside of the IOMMU code, but these are acked by the respective maintainers. - more cleanups and fixes all over the place. * tag 'iommu-updates-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (40 commits) iommu/amd: No need to wait iommu completion if no dte irq entry change iommu/amd: Free domain id when free a domain of struct dma_ops_domain iommu/amd: Use standard bitmap operation to set bitmap iommu/amd: Clean up the cmpxchg64 invocation iommu/io-pgtable-arm: Check for v7s-incapable systems iommu/dma: Avoid PCI host bridge windows iommu/dma: Add support for mapping MSIs iommu/arm-smmu: Set domain geometry iommu/arm-smmu: Wire up generic configuration support Docs: dt: document ARM SMMU generic binding usage iommu/arm-smmu: Convert to iommu_fwspec iommu/arm-smmu: Intelligent SMR allocation iommu/arm-smmu: Add a stream map entry iterator iommu/arm-smmu: Streamline SMMU data lookups iommu/arm-smmu: Refactor mmu-masters handling iommu/arm-smmu: Keep track of S2CR state iommu/arm-smmu: Consolidate stream map entry state iommu/arm-smmu: Handle stream IDs more dynamically iommu/arm-smmu: Set PRIVCFG in stage 1 STEs iommu/arm-smmu: Support non-PCI devices with SMMUv3 ...
2016-10-03Merge tag 'arm64-upstream' of ↵Linus Torvalds12-116/+189
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "It's a bit all over the place this time with no "killer feature" to speak of. Support for mismatched cache line sizes should help people seeing whacky JIT failures on some SoCs, and the big.LITTLE perf updates have been a long time coming, but a lot of the changes here are cleanups. We stray outside arch/arm64 in a few areas: the arch/arm/ arch_timer workaround is acked by Russell, the DT/OF bits are acked by Rob, the arch_timer clocksource changes acked by Marc, CPU hotplug by tglx and jump_label by Peter (all CC'd). Summary: - Support for execute-only page permissions - Support for hibernate and DEBUG_PAGEALLOC - Support for heterogeneous systems with mismatches cache line sizes - Errata workarounds (A53 843419 update and QorIQ A-008585 timer bug) - arm64 PMU perf updates, including cpumasks for heterogeneous systems - Set UTS_MACHINE for building rpm packages - Yet another head.S tidy-up - Some cleanups and refactoring, particularly in the NUMA code - Lots of random, non-critical fixes across the board" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (100 commits) arm64: tlbflush.h: add __tlbi() macro arm64: Kconfig: remove SMP dependence for NUMA arm64: Kconfig: select OF/ACPI_NUMA under NUMA config arm64: fix dump_backtrace/unwind_frame with NULL tsk arm/arm64: arch_timer: Use archdata to indicate vdso suitability arm64: arch_timer: Work around QorIQ Erratum A-008585 arm64: arch_timer: Add device tree binding for A-008585 erratum arm64: Correctly bounds check virt_addr_valid arm64: migrate exception table users off module.h and onto extable.h arm64: pmu: Hoist pmu platform device name arm64: pmu: Probe default hw/cache counters arm64: pmu: add fallback probe table MAINTAINERS: Update ARM PMU PROFILING AND DEBUGGING entry arm64: Improve kprobes test for atomic sequence arm64/kvm: use alternative auto-nop arm64: use alternative auto-nop arm64: alternative: add auto-nop infrastructure arm64: lse: convert lse alternatives NOP padding to use __nops arm64: barriers: introduce nops and __nops macros for NOP sequences arm64: sysreg: replace open-coded mrs_s/msr_s with {read,write}_sysreg_s ...
2016-09-20Merge branch 'for-joerg/arm-smmu/updates' of ↵Joerg Roedel1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu
2016-09-20arm64: migrate exception table users off module.h and onto extable.hPaul Gortmaker2-2/+2
These files were only including module.h for exception table related functions. We've now separated that content out into its own file "extable.h" so now move over to that and avoid all the extra header content in module.h that we don't really need to compile these files. Cc: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-16iommu/dma: Avoid PCI host bridge windowsRobin Murphy1-1/+1
With our DMA ops enabled for PCI devices, we should avoid allocating IOVAs which a host bridge might misinterpret as peer-to-peer DMA and lead to faults, corruption or other badness. To be safe, punch out holes for all of the relevant host bridge's windows when initialising a DMA domain for a PCI device. CC: Marek Szyprowski <m.szyprowski@samsung.com> CC: Inki Dae <inki.dae@samsung.com> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-12arm64: use alternative auto-nopMark Rutland1-7/+2
Make use of the new alternative_if and alternative_else_nop_endif and get rid of our homebew NOP sleds, making the code simpler to read. Note that for cpu_do_switch_mm the ret has been moved out of the alternative sequence, and in the default case there will be three additional NOPs executed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-09arm64/numa: remove the limitation that cpu0 must bind to node0Zhen Lei1-6/+10
1. Remove the old binding code. 2. Read the nid of cpu0 from dts. 3. Fallback the nid of cpu0 to 0 when numa=off is set in bootargs. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-09arm64/numa: remove some useless codeZhen Lei1-4/+0
When the deleted code is executed, only the bit of cpu0 was set on cpu_possible_mask. So that, only set_cpu_numa_node(0, NUMA_NO_NODE); will be executed. And map_cpu_to_node(0, 0) will soon be called. So these code can be safely removed. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-09arm64/numa: support HAVE_SETUP_PER_CPU_AREAZhen Lei1-0/+52
To make each percpu area allocated from its local numa node. Without this patch, all percpu areas will be allocated from the node which cpu0 belongs to. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-09arm64: numa: Use pr_fmt()Kefeng Wang1-19/+18
Use pr_fmt to prefix kernel output, and remove duplicated msg of NUMA turned off. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-09arm64/numa: avoid inconsistent information to be printedZhen Lei1-3/+3
numa_init may return error because of numa configuration error. So "No NUMA configuration found" is inaccurate. In fact, specific configuration error information should be immediately printed by the testing branch. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-09-06arm64: mm: drop fixup_init() and mm.hKefeng Wang5-21/+7
There is only fixup_init() in mm.h , and it is only called in free_initmem(), so move the codes from fixup_init() into free_initmem(), then drop fixup_init() and mm.h. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>