path: root/arch/x86/mm/dump_pagetables.c
Age  Commit message  Author  Files  Lines
2019-06-05  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441  Thomas Gleixner  1  -5/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation version 2 of the license extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 315 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Armijn Hemel <armijn@tjaldur.nl> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-07  Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  Linus Torvalds  1  -2/+2
Pull x86 asm updates from Ingo Molnar: "This includes the following changes:
 - cpu_has() cleanups
 - sync_bitops.h modernization to the rmwcc.h facility, similarly to bitops.h
 - continued LTO annotations/fixes
 - misc cleanups and smaller fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/um/vdso: Drop unnecessary cc-ldoption
  x86/vdso: Rename variable to fix -Wshadow warning
  x86/cpu/amd: Exclude 32bit only assembler from 64bit build
  x86/asm: Mark all top level asm statements as .text
  x86/build/vdso: Add FORCE to the build rule of %.so
  x86/asm: Modernize sync_bitops.h
  x86/mm: Convert some slow-path static_cpu_has() callers to boot_cpu_has()
  x86: Convert some slow-path static_cpu_has() callers to boot_cpu_has()
  x86/asm: Clarify static_cpu_has()'s intended use
  x86/uaccess: Fix implicit cast of __user pointer
  x86/cpufeature: Remove __pure attribute to _static_cpu_has()
2019-04-16  x86/mm: Prevent bogus warnings with "noexec=off"  Thomas Gleixner  1  -1/+2
Xose Vazquez Perez reported boot warnings when NX is disabled on the kernel command line. __early_set_fixmap() triggers this warning:
  attempted to set unsupported pgprot: 8000000000000163 bits: 8000000000000000 supported: 7fffffffffffffff
  WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgtable.h:537 __early_set_fixmap+0xa2/0xff
because it uses __default_kernel_pte_mask to mask out unsupported bits. Use __supported_pte_mask instead. Disabling NX on the command line also triggers the NX warning in the page table mapping check:
  WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:262 note_page+0x2ae/0x650
  ....
Make the warning depend on NX set in __supported_pte_mask. Reported-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Tested-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1904151037530.1729@nanos.tec.linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
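A self-contained sketch of the masking idea described above: with "noexec=off" the NX bit drops out of the supported mask, so masking the requested pgprot against the supported mask (rather than the default kernel mask) avoids installing, and warning about, an unsupported bit. The standalone setting and constants are illustrative, not the kernel's code:

    #include <stdio.h>
    #include <stdint.h>

    #define _PAGE_NX (1ULL << 63)

    int main(void)
    {
        uint64_t supported_pte_mask = ~0ULL;     /* all pte bits valid by default */
        int nx_disabled = 1;                     /* "noexec=off" on the command line */

        if (nx_disabled)
            supported_pte_mask &= ~_PAGE_NX;     /* NX is no longer a supported bit */

        uint64_t pgprot = 0x8000000000000163ULL; /* value from the warning above */
        uint64_t masked = pgprot & supported_pte_mask;

        printf("requested %#llx -> installed %#llx\n",
               (unsigned long long)pgprot, (unsigned long long)masked);
        return 0;
    }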
2019-04-08  x86/mm: Convert some slow-path static_cpu_has() callers to boot_cpu_has()  Borislav Petkov  1  -2/+2
Using static_cpu_has() is pointless on those paths, convert them to the boot_cpu_has() variant. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190330112022.28888-5-bp@alien8.de
2019-02-14  x86/mm/dump_pagetables: Remove the unused prev_pud variable  Shaokun Zhang  1  -2/+0
The 'prev_pud' local variable became unused after commit 04b67022fb6d ("x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y"). Remove it. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1550136829-49088-1-git-send-email-zhangshaokun@hisilicon.com
2018-12-28  kasan: rename kasan_zero_page to kasan_early_shadow_page  Andrey Konovalov  1  -5/+6
With tag based KASAN mode the early shadow value is 0xff and not 0x00, so this patch renames kasan_zero_(page|pte|pmd|pud|p4d) to kasan_early_shadow_(page|pte|pmd|pud|p4d) to avoid confusion. Link: http://lkml.kernel.org/r/3fed313280ebf4f88645f5b89ccbc066d320e177.1544099024.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-11  x86/dump_pagetables: Fix LDT remap address marker  Kirill A. Shutemov  1  -5/+2
The LDT remap placement has been changed. It's now placed before the direct mapping in the kernel virtual address space for both paging modes. Change address markers order accordingly. Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: bhe@redhat.com Cc: hans.van.kranenburg@mendix.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-3-kirill.shutemov@linux.intel.com
2018-12-11  x86/mm: Fix guard hole handling  Kirill A. Shutemov  1  -4/+4
There is a guard hole at the beginning of the kernel address space, also used by hypervisors. It occupies 16 PGD entries. This reserved range is not defined explicitly; it is calculated relative to other entities: the direct mapping and user space ranges. The calculation got broken by recent changes of the kernel memory layout: the LDT remap range is now mapped before the direct mapping and makes the calculation invalid. The breakage leads to a crash on Xen dom0 boot [1]. Define the reserved range explicitly. It's part of the kernel ABI (hypervisors expect it to be stable) and must not depend on changes in the rest of the kernel memory layout. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html Fixes: d52888aa2753 ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: bhe@redhat.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.com
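The explicit range can be derived from PGD geometry alone. A small standalone calculation (4-level constants, LP64 assumed; names are illustrative) reproduces the 0xffff800000000000 - 0xffff87ffffffffff window that later entries in this log quote:

    #include <stdio.h>

    int main(void)
    {
        /* 4-level paging: each PGD entry covers 2^39 bytes (512 GiB). */
        const unsigned long pgd_shift = 39;
        const unsigned long guard_hole_pgd_entry = -256UL; /* first entry of the top half */

        unsigned long base = guard_hole_pgd_entry << pgd_shift; /* 0xffff800000000000 */
        unsigned long size = 16UL << pgd_shift;                 /* 16 PGD entries */

        printf("guard hole: %#lx - %#lx\n", base, base + size - 1);
        return 0;
    }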
2018-10-10  x86/mm: Do not warn about PCI BIOS W+X mappings  Thomas Gleixner  1  -8/+27
PCI BIOS requires the BIOS area 0x0A0000-0x0FFFFF to be mapped W+X for various legacy reasons. When CONFIG_DEBUG_WX is enabled, this triggers the WX warning, but this is misleading because the mapping is required and is not a result of an accidental oversight. Prevent the full warning when PCI BIOS is enabled and the detected WX mapping is in the BIOS area. Just emit a pr_warn() which denotes the fact. This is partially duplicating the info which the PCI BIOS code emits when it maps the area as executable, but that info is not in the context of the WX checking output. Remove the extra %p printout in the WARN_ONCE() while at it. %pS is enough. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@suse.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: Kees Cook <keescook@chromium.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1810082151160.2455@nanos.tec.linutronix.de
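A standalone sketch of the downgrade logic: inside the legacy BIOS window the finding becomes a pr_warn()-style note instead of a full warning. The physical-address comparison and function are illustrative; the kernel compares the corresponding virtual addresses inside note_page():

    #include <stdio.h>

    #define BIOS_BEGIN 0x000a0000UL
    #define BIOS_END   0x00100000UL

    /* Report a W+X finding, downgrading it inside the PCI BIOS area. */
    static void note_wx(unsigned long paddr, int pcibios_enabled)
    {
        if (pcibios_enabled && paddr >= BIOS_BEGIN && paddr < BIOS_END) {
            printf("x86/mm: W+X mapping at %#lx is in the PCI BIOS area, expected\n", paddr);
            return;
        }
        printf("WARNING: x86/mm: Found insecure W+X mapping at %#lx\n", paddr);
    }

    int main(void)
    {
        note_wx(0x000a0000UL, 1); /* legacy BIOS window: plain note */
        note_wx(0x01234000UL, 1); /* anywhere else: full warning */
        return 0;
    }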
2018-08-10  x86/mm/pti: Move user W+X check into pti_finalize()  Joerg Roedel  1  -3/+3
The user page-table gets the updated kernel mappings in pti_finalize(), which runs after the RO+X permissions got applied to the kernel page-table in mark_readonly(). But with CONFIG_DEBUG_WX enabled, the user page-table is already checked in mark_readonly() for insecure mappings. This causes false-positive warnings, because the user page-table did not get the updated mappings yet. Move the W+X check for the user page-table into pti_finalize() after it updated all required mappings. [ tglx: Folded !NX supported fix ] Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1533727000-9172-1-git-send-email-joro@8bytes.org
2018-07-20  x86/ldt: Reserve address-space range on 32 bit for the LDT  Joerg Roedel  1  -0/+9
Reserve 2MB/4MB of address-space for mapping the LDT to user-space on 32 bit PTI kernels. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-34-git-send-email-joro@8bytes.org
2018-07-20  x86/mm/dump_pagetables: Define INIT_PGD  Joerg Roedel  1  -6/+6
Define INIT_PGD to point to the correct initial page-table for 32 and 64 bit and use it where needed. This fixes the build on 32 bit with CONFIG_PAGE_TABLE_ISOLATION enabled. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Pavel Machek <pavel@ucw.cz> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: linux-mm@kvack.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Waiman Long <llong@redhat.com> Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca> Cc: joro@8bytes.org Link: https://lkml.kernel.org/r/1531906876-13451-32-git-send-email-joro@8bytes.org
2018-05-19  x86/mm: Stop pretending pgtable_l5_enabled is a variable  Kirill A. Shutemov  1  -3/+3
pgtable_l5_enabled is defined using cpu_feature_enabled() but we refer to it as a variable. This is misleading. Make pgtable_l5_enabled() a function. We cannot literally define it as a function due to circular dependencies between header files. A function-like macro is close enough. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180518103528.59260-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-17  x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y  Joerg Roedel  1  -5/+6
The walk_pte_level() function just uses __va to get the virtual address of the PTE page, but that breaks when the PTE page is not in the direct mapping with HIGHPTE=y. The result is an unhandled kernel paging request at some random address when accessing the current_kernel or current_user file. Use the correct API to access PTE pages. Fixes: fe770bf0310d ('x86: clean up the page table dumper and add 32-bit support') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: jgross@suse.com Cc: JBeulich@suse.com Cc: hpa@zytor.com Cc: aryabinin@virtuozzo.com Cc: kirill.shutemov@linux.intel.com Link: https://lkml.kernel.org/r/1523971636-4137-1-git-send-email-joro@8bytes.org
2018-02-26  x86/mm: Consider effective protection attributes in W+X check  Jan Beulich  1  -36/+58
Using just the leaf page table entry flags would cause a false warning in case _PAGE_RW is clear or _PAGE_NX is set in a higher level entry. Hand through both the current entry's flags as well as the accumulated effective value (the latter as pgprotval_t instead of pgprot_t, as it's not an actual entry's value). This in particular eliminates the false W+X warning when running under Xen, as commit: 2cc42bac1c ("x86-64/Xen: eliminate W+X mappings") had to make the necessary adjustment in L2 rather than L1 (the reason is explained there). I.e. _PAGE_RW is clear there in L1, but _PAGE_NX is set in L2. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5A8FDE8902000078001AABBB@prv-mh.provo.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
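The accumulation rule is easy to state in code: write permission must be granted at every level (AND), while NX at any level poisons the whole range (OR). A standalone sketch with bit positions from the x86 PTE format and the Xen case from the message as input (the kernel's combiner lives in dump_pagetables.c; this model is simplified):

    #include <stdio.h>
    #include <stdint.h>

    #define _PAGE_RW   (1ULL << 1)
    #define _PAGE_USER (1ULL << 2)
    #define _PAGE_NX   (1ULL << 63)

    /* Combine a higher-level entry's flags with the current entry's flags. */
    static uint64_t effective(uint64_t higher, uint64_t entry)
    {
        return ((higher & entry) & (_PAGE_USER | _PAGE_RW)) |
               ((higher | entry) & _PAGE_NX);
    }

    int main(void)
    {
        uint64_t l2 = _PAGE_RW | _PAGE_NX; /* Xen adjustment: NX set in L2 */
        uint64_t l1 = _PAGE_RW;            /* leaf: writable, NX clear */
        uint64_t eff = effective(l2, l1);

        int leaf_wx = (l1 & _PAGE_RW) && !(l1 & _PAGE_NX);
        int eff_wx  = (eff & _PAGE_RW) && !(eff & _PAGE_NX);

        printf("leaf-only: %s, effective: %s\n",
               leaf_wx ? "W+X (false warning)" : "ok",
               eff_wx ? "W+X" : "ok");
        return 0;
    }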
2018-02-16  x86/mm: Replace compile-time checks for 5-level paging with runtime checks  Kirill A. Shutemov  1  -3/+1
This patch converts the rest of the CONFIG_X86_5LEVEL checks to runtime checks for p4d folding. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214182542.69302-9-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-14  x86/mm: Make __VIRTUAL_MASK_SHIFT dynamic  Kirill A. Shutemov  1  -2/+10
For boot-time switching between paging modes, we need to be able to adjust virtual mask shifts. The change doesn't affect the kernel image size much:
     text    data     bss      dec    hex filename
  8628892 4734340 1368064 14731296 e0c820 vmlinux.before
  8628966 4734340 1368064 14731370 e0c86a vmlinux.after
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214111656.88514-9-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-14  x86/mm: Make PGDIR_SHIFT and PTRS_PER_P4D variable  Kirill A. Shutemov  1  -7/+5
For boot-time switching between 4- and 5-level paging we need to be able to fold the p4d page table level at runtime. It requires variable PGDIR_SHIFT and PTRS_PER_P4D. The change doesn't affect the kernel image size much:
     text    data     bss      dec    hex filename
  8628091 4734304 1368064 14730459 e0c4db vmlinux.before
  8628393 4734340 1368064 14730797 e0c62d vmlinux.after
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214111656.88514-7-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
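The two constants take their 4- or 5-level values from the boot-time decision; a minimal standalone model of the switch (the values 39/48 and 1/512 are the standard x86 ones; the kernel derives the flag from the LA57 CPU feature):

    #include <stdio.h>

    int main(void)
    {
        int l5_enabled = 0;                           /* boot-time paging-mode decision */

        unsigned pgdir_shift  = l5_enabled ? 48 : 39; /* was a compile-time constant */
        unsigned ptrs_per_p4d = l5_enabled ? 512 : 1; /* folded p4d has one entry */

        printf("%d-level: PGDIR_SHIFT=%u PTRS_PER_P4D=%u\n",
               l5_enabled ? 5 : 4, pgdir_shift, ptrs_per_p4d);
        return 0;
    }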
2018-02-14  x86/mm: Make LDT_BASE_ADDR dynamic  Kirill A. Shutemov  1  -1/+4
LDT_BASE_ADDR has a different value in 4- and 5-level paging configurations. We need to make it dynamic in preparation for boot-time switching between paging modes. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180214111656.88514-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-05  x86/mm: Map cpu_entry_area at the same place on 4/5 level  Thomas Gleixner  1  -1/+1
There is no reason for 4 and 5 level pagetables to have a different layout. It just makes determining vaddr_end for KASLR harder than necessary. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable <stable@vger.kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos
2017-12-23  x86/mm/dump_pagetables: Allow dumping current pagetables  Thomas Gleixner  1  -1/+5
Add two debugfs files which allow dumping the page tables of the current task. current_kernel dumps the regular page table. This is the page table which is normally shared between kernel and user space; if kernel page table isolation is enabled, it is the kernel space mapping. With kernel page table isolation enabled, the second file, current_user, dumps the user space page table. These files make it possible to verify the resulting page tables for page table isolation, but even in the normal case it's useful to be able to inspect the user space page tables of current for debugging purposes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
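Usage would look like the following; the page_tables directory name is an assumption here (adjust to wherever the files land on the running kernel), and current_user only exists with page table isolation enabled:

    # mount -t debugfs none /sys/kernel/debug    # if not already mounted
    # cat /sys/kernel/debug/page_tables/current_kernel
    # cat /sys/kernel/debug/page_tables/current_user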
2017-12-23  x86/mm/dump_pagetables: Check user space page table for WX pages  Thomas Gleixner  1  -5/+25
ptdump_walk_pgd_level_checkwx() checks the kernel page table for WX pages, but does not check the PAGE_TABLE_ISOLATION user space page table. Restructure the code so that dmesg output is selected by an explicit argument and not implicitly by checking the pgd argument for !NULL. Add the check for the user space page table. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-23  x86/pti: Put the LDT in its own PGD if PTI is on  Andy Lutomirski  1  -0/+9
With PTI enabled, the LDT must be mapped in the usermode tables somewhere. The LDT is per process, i.e. per mm. An earlier approach mapped the LDT on context switch into a fixmap area, but that's a big overhead and exhausted the fixmap space when NR_CPUS got big. Take advantage of the fact that there is an address space hole which provides a completely unused pgd. Use this pgd to manage per-mm LDT mappings. This has a down side: the LDT isn't (currently) randomized, and an attack that can write the LDT is instant root due to call gates (thanks, AMD, for leaving call gates in AMD64 but designing them wrong so they're only useful for exploits). This can be mitigated by making the LDT read-only or randomizing the mapping, either of which is straightforward on top of this patch. This will significantly slow down LDT users, but that shouldn't matter for important workloads -- the LDT is only used by DOSEMU(2), Wine, and very old libc implementations. [ tglx: Cleaned it up. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-22  x86/cpu_entry_area: Move it out of the fixmap  Thomas Gleixner  1  -1/+5
Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big and 0-day already hit a case where the fixmap PTEs were cleared by cleanup_highmap(). Aside from that, the fixmap API is a pain as it's all backwards. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-12-22  x86/mm/dump_pagetables: Make the address hints correct and readable  Thomas Gleixner  1  -37/+53
The address hints are a trainwreck. The array entry numbers have to be kept magically in sync with the actual hints, which is doomed as some of the array members are initialized at runtime via the entry numbers. Designated initializers have been around since before this code was implemented.... Use the entry numbers to populate the address hints array and add the missing bits and pieces. Split 32 and 64 bit for readability's sake. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
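A standalone sketch of the designated-initializer approach (marker names and addresses are illustrative): the enum is the single source of truth for positions, so runtime updates via the entry numbers can no longer drift out of sync with the array:

    #include <stdio.h>

    enum address_markers_idx { USER_SPACE_NR, KERNEL_SPACE_NR, LOW_KERNEL_NR, NR_MARKERS };

    struct addr_marker {
        unsigned long start_address;
        const char *name;
    };

    static struct addr_marker address_markers[NR_MARKERS] = {
        [USER_SPACE_NR]   = { 0,                    "User Space" },
        [KERNEL_SPACE_NR] = { 0xffff800000000000UL, "Kernel Space" },
        [LOW_KERNEL_NR]   = { 0,                    "Low Kernel Mapping" }, /* runtime */
    };

    int main(void)
    {
        /* Runtime initialization via the enum index, not a magic number. */
        address_markers[LOW_KERNEL_NR].start_address = 0xffff888000000000UL;

        for (int i = 0; i < NR_MARKERS; i++)
            printf("%#lx %s\n", address_markers[i].start_address,
                   address_markers[i].name);
        return 0;
    }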
2017-12-22  x86/mm/dump_pagetables: Check PAGE_PRESENT for real  Thomas Gleixner  1  -1/+1
The check for a present page in printk_prot():
   if (!pgprot_val(prot)) {
        /* Not present */
is bogus. If a PTE is set to PAGE_NONE then the pgprot_val is not zero and the entry is decoded in bogus ways, e.g. as RX GLB. That is confusing when analyzing mapping correctness. Check for the present bit to make an informed decision. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
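The difference in a standalone sketch: a PAGE_NONE-style protection value is non-zero, so the old test decodes it as if it were mapped, while testing the present bit itself (bit 0 on x86) gives the right answer. The PAGE_NONE bit layout below is illustrative:

    #include <stdio.h>
    #include <stdint.h>

    #define _PAGE_PRESENT  (1ULL << 0)
    #define _PAGE_PROTNONE (1ULL << 8)  /* illustrative PAGE_NONE marker bit */

    int main(void)
    {
        uint64_t prot = _PAGE_PROTNONE; /* PAGE_NONE: bits set, but not present */

        /* Old check: any non-zero pgprot was taken to mean present. */
        printf("old: %s\n", prot ? "decoded as mapped (wrong)" : "not present");

        /* Fixed check: look at the present bit itself. */
        printf("new: %s\n", (prot & _PAGE_PRESENT) ? "decoded" : "not present");
        return 0;
    }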
2017-07-25  x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y  Andrey Ryabinin  1  -23/+41
KASAN fills kernel page tables with repeated values to map several TBs of the virtual memory to the single kasan_zero_page: kasan_zero_p4d -> kasan_zero_pud -> kasan_zero_pmd -> kasan_zero_pte -> kasan_zero_page Walking the whole KASAN shadow range takes a lot of time, especially with 5-level page tables. Since we already know that all kasan page tables eventually point to the kasan_zero_page we could call note_page() right away and avoid walking lower levels of the page tables. This will not affect the output of the kernel_page_tables file, but let us avoid spending time in page table walkers:
Before:
  $ time cat /sys/kernel/debug/kernel_page_tables > /dev/null
  real 0m55.855s
  user 0m0.000s
  sys  0m55.840s
After:
  $ time cat /sys/kernel/debug/kernel_page_tables > /dev/null
  real 0m0.054s
  user 0m0.000s
  sys  0m0.054s
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170724152558.24689-1-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
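The short-circuit amounts to a pointer comparison against the shared KASAN tables. A standalone sketch of the idea (structure and names are illustrative, not the kernel's walker):

    #include <stdio.h>

    static char kasan_zero_pud[4096]; /* stand-in for the shared shadow pud page */

    /* Walk one p4d entry; when it points at the shared KASAN table, the
     * whole subtree is reported at once instead of being descended into. */
    static void walk_p4d_entry(void *target)
    {
        if (target == kasan_zero_pud) {
            printf("note_page(): whole range maps kasan_zero_page, skip descent\n");
            return;
        }
        printf("walking pud/pmd/pte levels normally\n");
    }

    int main(void)
    {
        walk_p4d_entry(kasan_zero_pud); /* KASAN shadow entry: short-circuits */
        return 0;
    }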
2017-07-21  x86/mm/dump_pagetables: Fix printout of p4d level  Kirill A. Shutemov  1  -9/+9
Modify printk_prot() and its callers to print out the additional page table level correctly. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170716225954.74185-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-21  x86/mm/dump_pagetables: Generalize address normalization  Kirill A. Shutemov  1  -5/+6
Modify normalize_addr() to handle different virtual address space sizes. This is preparation for enabling 5-level paging. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170716225954.74185-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-13  x86/boot/64: Rename init_level4_pgt and early_level4_pgt  Kirill A. Shutemov  1  -1/+1
With CONFIG_X86_5LEVEL=y, level 4 is no longer the top level of the page tables. Let's give these variables more generic names: init_top_pgt and early_top_pgt. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-13  x86/mm: Fix dump pagetables for 4 levels of page tables  Juergen Gross  1  -1/+1
Commit fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level paging") introduced an error for dumping with only 4 levels by setting PGD_LEVEL_MULT to a wrong value. This leads to e.g. addresses printed as "(null)" for ranges: x86/mm: Found insecure W+X mapping at address (null)/(null) Make PGD_LEVEL_MULT a multiple of PTRS_PER_P4D instead of PTRS_PER_PUD. Fixes: fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level paging") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170412143634.6846-1-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
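The arithmetic is easy to check standalone: with the p4d level folded (PTRS_PER_P4D == 1), scaling by PTRS_PER_P4D keeps the pgd step at 512 GiB, while hard-coding 512 (PTRS_PER_PUD) overshot the 4-level address width. The level multipliers below follow the usual x86 values (LP64 assumed):

    #include <stdio.h>

    int main(void)
    {
        unsigned long PTE_LEVEL_MULT = 4096UL;               /* PAGE_SIZE */
        unsigned long PMD_LEVEL_MULT = 512 * PTE_LEVEL_MULT; /* 2 MiB */
        unsigned long PUD_LEVEL_MULT = 512 * PMD_LEVEL_MULT; /* 1 GiB */
        unsigned long P4D_LEVEL_MULT = 512 * PUD_LEVEL_MULT; /* 512 GiB */
        unsigned long PTRS_PER_P4D   = 1;                    /* folded on 4-level */

        /* Fixed: scale by PTRS_PER_P4D, not by a hard-coded 512. */
        unsigned long PGD_LEVEL_MULT = PTRS_PER_P4D * P4D_LEVEL_MULT;

        printf("pgd step: %#lx (512 GiB)\n", PGD_LEVEL_MULT);
        return 0;
    }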
2017-03-30  x86/dump_pagetables: Add support for 5-level paging  Kirill A. Shutemov  1  -14/+45
Simple extension to support one more page table level. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170328104806.41711-1-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18  x86/mm: Correct fixmap header usage on adaptable MODULES_END  Thomas Garnier  1  -1/+0
This patch removes fixmap header usage on non-x86 code that was introduced by the adaptable MODULE_END change. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170317175034.4701-1-thgarnie@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-16  x86/mm: Adapt MODULES_END based on fixmap section size  Thomas Garnier  1  -0/+1
This patch aligns MODULES_END to the beginning of the fixmap section. It optimizes the space available for both sections. The address is pre-computed based on the number of pages required by the fixmap section. It will allow GDT remapping in the fixmap section. The current MODULES_END static address does not provide enough space for the kernel to support a large number of processors. Signed-off-by: Thomas Garnier <thgarnie@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Luis R . Rodriguez <mcgrof@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: kasan-dev@googlegroups.com Cc: kernel-hardening@lists.openwall.com Cc: kvm@vger.kernel.org Cc: lguest@lists.ozlabs.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: zijun_hu <zijun_hu@htc.com> Link: http://lkml.kernel.org/r/20170314170508.100882-1-thgarnie@google.com [ Small build fix. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-16  x86/mm/ptdump: Add address marker for KASAN shadow region  Andrey Ryabinin  1  -0/+9
Annotate the KASAN shadow with address markers in page table dump output:
  $ cat /sys/kernel/debug/kernel_page_tables
  ...
  ---[ Vmemmap ]---
  0xffffea0000000000-0xffffea0003000000      48M  RW PSE GLB NX pmd
  0xffffea0003000000-0xffffea0004000000      16M               pmd
  0xffffea0004000000-0xffffea0005000000      16M  RW PSE GLB NX pmd
  0xffffea0005000000-0xffffea0040000000     944M               pmd
  0xffffea0040000000-0xffffea8000000000     511G               pud
  0xffffea8000000000-0xffffec0000000000    1536G               pgd
  ---[ KASAN shadow ]---
  0xffffec0000000000-0xffffed0000000000       1T  ro GLB NX    pte
  0xffffed0000000000-0xffffed0018000000     384M  RW PSE GLB NX pmd
  0xffffed0018000000-0xffffed0020000000     128M               pmd
  0xffffed0020000000-0xffffed0028200000     130M  RW PSE GLB NX pmd
  0xffffed0028200000-0xffffed0040000000     382M               pmd
  0xffffed0040000000-0xffffed8000000000     511G               pud
  0xffffed8000000000-0xfffff50000000000    7680G               pgd
  0xfffff50000000000-0xfffffbfff0000000 7339776M  ro GLB NX    pte
  0xfffffbfff0000000-0xfffffbfff0200000       2M               pmd
  0xfffffbfff0200000-0xfffffbfff0a00000       8M  RW PSE GLB NX pmd
  0xfffffbfff0a00000-0xfffffbffffe00000     244M               pmd
  0xfffffbffffe00000-0xfffffc0000000000       2M  ro GLB NX    pte
  ---[ KASAN shadow end ]---
  0xfffffc0000000000-0xffffff0000000000       3T               pgd
  ---[ ESPfix Area ]---
  ...
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kasan-dev@googlegroups.com Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170214100839.17186-2-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-16  x86/mm/ptdump: Optimize check for W+X mappings for CONFIG_KASAN=y  Andrey Ryabinin  1  -1/+15
Enabling both DEBUG_WX=y and KASAN=y options significantly increases boot time (dozens of seconds at least). KASAN fills kernel page tables with repeated values to map several TBs of the virtual memory to the single kasan_zero_page: kasan_zero_pud -> kasan_zero_pmd -> kasan_zero_pte -> kasan_zero_page So the page table walker used to find W+X mappings checks the same kasan_zero_p?d page table entries a lot more than once. With this patch the pud walker will skip a pud if it has the same value as the previous one. Skipping is done only when we search for W+X mappings, so this optimization won't affect the page table dump via debugfs. This dropped the time spent in the W+X check from ~30 sec to a reasonable 0.1 sec:
Before:
  [    4.579991] Freeing unused kernel memory: 1000K
  [   35.257523] x86/mm: Checked W+X mappings: passed, no W+X pages found.
After:
  [    5.138756] Freeing unused kernel memory: 1000K
  [    5.266496] x86/mm: Checked W+X mappings: passed, no W+X pages found.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kasan-dev@googlegroups.com Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170214100839.17186-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10  x86/mm/ptdump: Fix soft lockup in page table walker  Andrey Ryabinin  1  -0/+2
CONFIG_KASAN=y needs a lot of virtual memory mapped for its shadow. In that case ptdump_walk_pgd_level_core() takes a lot of time to walk across all the page tables, and doing this without rescheduling causes soft lockups:
  NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/0:1]
  ...
  Call Trace:
   ptdump_walk_pgd_level_core+0x40c/0x550
   ptdump_walk_pgd_level_checkwx+0x17/0x20
   mark_rodata_ro+0x13b/0x150
   kernel_init+0x2f/0x120
   ret_from_fork+0x2c/0x40
I guess that this issue might arise even without KASAN on huge machines with several terabytes of RAM. Stick cond_resched() in the pgd loop to fix this. Reported-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170210095405.31802-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
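A userspace analogy of the fix, with POSIX sched_yield() standing in for cond_resched() (which has no standalone equivalent): yield once per pgd entry so a long walk never monopolizes the CPU:

    #include <sched.h>
    #include <stdio.h>

    #define PTRS_PER_PGD 512

    int main(void)
    {
        for (int i = 0; i < PTRS_PER_PGD; i++) {
            /* ... walk everything below this pgd entry ... */
            sched_yield(); /* kernel fix: cond_resched() at this point */
        }
        printf("walk finished without tripping the soft-lockup watchdog\n");
        return 0;
    }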
2016-08-01  Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  Linus Torvalds  1  -5/+1
Pull x86 header cleanups from Ingo Molnar: "This tree is a cleanup of the x86 tree reducing spurious uses of module.h - which should improve build performance a bit"
* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
  x86/apic: Remove duplicated include from probe_64.c
  x86/ce4100: Remove duplicated include from ce4100.c
  x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
  x86/platform: Delete extraneous MODULE_* tags from ts5500
  x86: Audit and remove any remaining unnecessary uses of module.h
  x86/kvm: Audit and remove any unnecessary uses of module.h
  x86/xen: Audit and remove any unnecessary uses of module.h
  x86/platform: Audit and remove any unnecessary uses of module.h
  x86/lib: Audit and remove any unnecessary uses of module.h
  x86/kernel: Audit and remove any unnecessary uses of module.h
  x86/mm: Audit and remove any unnecessary uses of module.h
  x86: Don't use module.h just for AUTHOR / LICENSE tags
2016-07-14  x86: Don't use module.h just for AUTHOR / LICENSE tags  Paul Gortmaker  1  -5/+1
The Kconfig options controlling compilation of these files are:
  arch/x86/Kconfig.debug:config DEBUG_RODATA_TEST
  arch/x86/Kconfig.debug:  bool "Testcase for the marking rodata read-only"
  arch/x86/Kconfig.debug:config X86_PTDUMP_CORE
  arch/x86/Kconfig.debug:  def_bool n
...meaning that it currently is not being built as a module by anyone. Let's remove the couple of traces of modular infrastructure use, so that when reading the driver there is no doubt it is builtin-only. We delete the MODULE_LICENSE tag etc. since all that information is already contained at the top of the file in the comments. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160714001901.31603-2-paul.gortmaker@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-08  x86/mm: Implement ASLR for kernel memory regions  Thomas Garnier  1  -4/+12
Randomizes the virtual address space of kernel memory regions for x86_64. This first patch adds the infrastructure and does not randomize any region. The following patches will randomize the physical memory mapping, vmalloc and vmemmap regions. This security feature mitigates exploits relying on predictable kernel addresses. These addresses can be used to disclose the kernel modules base addresses or corrupt specific structures to elevate privileges, bypassing the current implementation of KASLR. This feature can be enabled with the CONFIG_RANDOMIZE_MEMORY option. The order of each memory region is not changed. The feature looks at the available space for the regions based on different configuration options and randomizes the base and space between each. The size of the physical memory mapping is the available physical memory. No performance impact was detected while testing the feature. Entropy is generated using the KASLR early boot functions now shared in the lib directory (originally written by Kees Cook). Randomization is done on the PGD & PUD page table levels to increase possible addresses. The physical memory mapping code was adapted to support PUD-level virtual addresses. This implementation in the best configuration provides 30,000 possible virtual addresses on average for each memory region. An additional low memory page is used to ensure each CPU can start with a PGD-aligned virtual address (for realmode). x86/dump_pagetables was updated to correctly display each region. Updated documentation on x86_64 memory layout accordingly. Performance data, after all patches in the series:
Kernbench shows almost no difference (+/- less than 1%):
Before:
  Average Optimal load -j 12 Run (std deviation):
  Elapsed Time 102.63 (1.2695)
  User Time 1034.89 (1.18115)
  System Time 87.056 (0.456416)
  Percent CPU 1092.9 (13.892)
  Context Switches 199805 (3455.33)
  Sleeps 97907.8 (900.636)
After:
  Average Optimal load -j 12 Run (std deviation):
  Elapsed Time 102.489 (1.10636)
  User Time 1034.86 (1.36053)
  System Time 87.764 (0.49345)
  Percent CPU 1095 (12.7715)
  Context Switches 199036 (4298.1)
  Sleeps 97681.6 (1031.11)
Hackbench shows 0% difference on average (hackbench 90 repeated 10 times):
  attempt,before,after
  1,0.076,0.069
  2,0.072,0.069
  3,0.066,0.066
  4,0.066,0.068
  5,0.066,0.067
  6,0.066,0.069
  7,0.067,0.066
  8,0.063,0.067
  9,0.067,0.065
  10,0.068,0.071
  average,0.0677,0.0677
Signed-off-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Alexander Popov <alpopov@ptsecurity.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Baoquan He <bhe@redhat.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lv Zheng <lv.zheng@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: kernel-hardening@lists.openwall.com Cc: linux-doc@vger.kernel.org Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-20  x86/mm/ptdump: Remove paravirt_enabled()  Borislav Petkov  1  -6/+5
is_hypervisor_range() can simply check if the PGD index is within ffff800000000000 - ffff87ffffffffff which is the range reserved for a hypervisor. That range is practically an ABI, see Documentation/x86/x86_64/mm.txt. Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Under Xen, as PV guest Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1455825641-19585-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
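The check is pure index arithmetic. A standalone sketch (the 256/272 bounds follow from the 0xffff800000000000 base and the 16 reserved entries quoted above; the kernel derives them from pgd_index(__PAGE_OFFSET) instead of hard-coding them):

    #include <stdio.h>

    /* PGD entries 256..271 cover 0xffff800000000000 - 0xffff87ffffffffff. */
    static int is_hypervisor_range(int idx)
    {
        return idx >= 256 && idx < 272; /* 16 reserved pgd slots */
    }

    int main(void)
    {
        unsigned long addr = 0xffff800000000000UL;
        int idx = (addr >> 39) & 0x1ff; /* pgd index, 4-level paging */

        printf("pgd index %d: hypervisor range? %s\n",
               idx, is_hypervisor_range(idx) ? "yes, skip" : "no, walk");
        return 0;
    }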
2016-01-12  Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  Linus Torvalds  1  -32/+2
Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were:
 - make the debugfs 'kernel_page_tables' file read-only, as it only has read ops. (Borislav Petkov)
 - micro-optimize clflush_cache_range() (Chris Wilson)
 - swiotlb enhancements, which fix certain KVM emulated devices (Igor Mammedov)
 - fix an LDT related debug message (Jan Beulich)
 - modularize CONFIG_X86_PTDUMP (Kees Cook)
 - tone down an overly alarming warning (Laura Abbott)
 - Mark variable __initdata (Rasmus Villemoes)
 - PAT additions (Toshi Kani)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Micro-optimise clflush_cache_range()
  x86/mm/pat: Change free_memtype() to support shrinking case
  x86/mm/pat: Add untrack_pfn_moved for mremap
  x86/mm: Drop WARN from multi-BAR check
  x86/LDT: Print the real LDT base address
  x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN
  x86/mm: Introduce max_possible_pfn
  x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only
  x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata
  x86/mm: Turn CONFIG_X86_PTDUMP into a module
2015-12-15  Fix user-visible spelling error  Linus Torvalds  1  -1/+1
Pavel Machek reports a warning about W+X pages found in the "Persisent" kmap area. After grepping for it (using the correct spelling), and not finding it, I noticed how the debug printk was just misspelled. Fix it. The actual mapping bug that Pavel reported is still open. It's apparently a separate issue from the known EFI page tables, looks like it's related to the HIGHMEM mappings. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-23  x86/mm: Turn CONFIG_X86_PTDUMP into a module  Kees Cook  1  -32/+2
Being able to examine page tables is handy, so make this a module that can be loaded as needed. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20151120010755.GA9060@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-07  x86/mm: Skip the hypervisor range when walking PGD  Boris Ostrovsky  1  -1/+16
The range between 0xffff800000000000 and 0xffff87ffffffffff is reserved for the hypervisor and therefore we should not try to follow the PGD indexes corresponding to those addresses. While this has always been a problem, with the new W+X warning mechanism ptdump_walk_pgd_level_core() can now be called during boot, causing a PV Xen guest to crash. [ tglx: Replaced the macro with a readable inline ] Fixes: e1a58320a38d "x86/mm: Warn on W^X mappings" Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel@lists.xen.org Link: http://lkml.kernel.org/r/1446749795-27764-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-06  x86/mm: Warn on W^X mappings  Stephen Smalley  1  -1/+41
Warn on any residual W+X mappings after setting NX if DEBUG_WX is enabled. Introduce a separate X86_PTDUMP_CORE config that enables the code for dumping the page tables without enabling the debugfs interface, so that DEBUG_WX can be enabled without exposing the debugfs interface. Switch EFI_PGT_DUMP to using X86_PTDUMP_CORE so that it also does not require enabling the debugfs interface. On success it prints this to the kernel log:
  x86/mm: Checked W+X mappings: passed, no W+X pages found.
On failure it prints a warning and a count of the failed pages:
  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
  x86/mm: Found insecure W+X mapping at address ffffffff81755000/__stop___ex_table+0xfa8/0xabfa8
  [...]
  Call Trace:
   [<ffffffff81380a5f>] dump_stack+0x44/0x55
   [<ffffffff8109d3f2>] warn_slowpath_common+0x82/0xc0
   [<ffffffff8109d48c>] warn_slowpath_fmt+0x5c/0x80
   [<ffffffff8106cfc9>] ? note_page+0x5c9/0x7b0
   [<ffffffff8106d010>] note_page+0x610/0x7b0
   [<ffffffff8106d409>] ptdump_walk_pgd_level_core+0x259/0x3c0
   [<ffffffff8106d5a7>] ptdump_walk_pgd_level_checkwx+0x17/0x20
   [<ffffffff81063905>] mark_rodata_ro+0xf5/0x100
   [<ffffffff817415a0>] ? rest_init+0x80/0x80
   [<ffffffff817415bd>] kernel_init+0x1d/0xe0
   [<ffffffff8174cd1f>] ret_from_fork+0x3f/0x70
   [<ffffffff817415a0>] ? rest_init+0x80/0x80
  ---[ end trace a1f23a1e42a2ac76 ]---
  x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1444064120-11450-1-git-send-email-sds@tycho.nsa.gov [ Improved the Kconfig help text and made the new option default-y if CONFIG_DEBUG_RODATA=y, because it already found buggy mappings, so we really want people to have this on by default. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-22  x86/mm: Fix page table dump to show PAT bit  Toshi Kani  1  -18/+21
/sys/kernel/debug/kernel_page_tables does not show the PAT bit for PUD/PMD mappings. This is because walk_pud_level(), walk_pmd_level() and note_page() mask the flags with PTE_FLAGS_MASK, which does not cover their PAT bit, _PAGE_PAT_LARGE. Fix it by replacing the use of PTE_FLAGS_MASK with p?d_flags(), which masks the flags properly. Also change to show the PAT bit as "PAT" to be consistent with other bits. Reported-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Robert Elliot <elliott@hpe.com> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1442514264-12475-7-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
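The bug is visible with plain masks: the 4k flags mask excludes bit 12, which is exactly where the PAT bit sits in 2M/1G entries. A standalone sketch with mask values that model the x86 layout (illustrative, not the kernel's definitions):

    #include <stdio.h>
    #include <stdint.h>

    #define _PAGE_PAT_LARGE (1ULL << 12)            /* PAT bit in 2M/1G entries */
    #define PTE_FLAGS_MASK  0xfff0000000000fffULL   /* 4k entry: bits 0-11, 52-63 */
    #define PMD_FLAGS_MASK  0xfff00000001fffffULL   /* 2M entry: bits 0-20, 52-63 */

    int main(void)
    {
        uint64_t pmd_entry = _PAGE_PAT_LARGE | 0x63; /* large page with PAT set */

        /* Old: masking with the 4k flags mask drops bit 12, hiding PAT. */
        printf("PTE_FLAGS_MASK: PAT %s\n",
               (pmd_entry & PTE_FLAGS_MASK & _PAGE_PAT_LARGE) ? "shown" : "lost");

        /* Fixed: pmd_flags()-style masking keeps the level's own layout. */
        printf("pmd_flags():    PAT %s\n",
               (pmd_entry & PMD_FLAGS_MASK & _PAGE_PAT_LARGE) ? "shown" : "lost");
        return 0;
    }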
2014-12-11  Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  Linus Torvalds  1  -13/+11
Pull x86 mm tree changes from Ingo Molnar: "The biggest change is full PAT support from Jürgen Gross:
  The x86 architecture offers via the PAT (Page Attribute Table) a way to specify different caching modes in page table entries. The PAT MSR contains 8 entries each specifying one of 6 possible cache modes. A pte references one of those entries via 3 bits: _PAGE_PAT, _PAGE_PWT and _PAGE_PCD.
  The Linux kernel currently supports only 4 different cache modes. The PAT MSR is set up in a way that the setting of _PAGE_PAT in a pte doesn't matter: the top 4 entries in the PAT MSR are the same as the 4 lower entries. This results in the kernel not supporting e.g. write-through mode. Especially this cache mode would speed up drivers of video cards which now have to use uncached accesses.
  OTOH some old processors (Pentium) don't support PAT correctly and the Xen hypervisor has been using a different PAT MSR configuration for some time now and can't change that as this setting is part of the ABI.
  This patch set abstracts the cache mode from the pte and introduces tables to translate between cache mode and pte bits (the default cache mode "write back" is hard-wired to PAT entry 0). The tables are statically initialized with values being compatible to old processors and current usage. As soon as the PAT MSR is changed (or - in case of Xen - is read at boot time) the tables are changed accordingly. Requests of mappings with special cache modes are always possible now; in case they are not supported there will be a fallback to a compatible but slower mode.
  Summing it up, this patch set adds the following features:
   - capability to support WT and WP cache modes on processors with full PAT support
   - processors with no or incorrect PAT support are still working as today, even if WT or WP cache modes are selected by drivers for some pages
   - reduction of Xen special handling regarding cache mode
Another change is a boot speedup on ridiculously large RAM systems, plus other smaller fixes"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  x86: mm: Move PAT only functions to mm/pat.c
  xen: Support Xen pv-domains using PAT
  x86: Enable PAT to use cache mode translation tables
  x86: Respect PAT bit when copying pte values between large and normal pages
  x86: Support PAT bit in pagetable dump for lower levels
  x86: Clean up pgtable_types.h
  x86: Use new cache mode type in memtype related functions
  x86: Use new cache mode type in mm/ioremap.c
  x86: Use new cache mode type in setting page attributes
  x86: Remove looking for setting of _PAGE_PAT_LARGE in pageattr.c
  x86: Use new cache mode type in track_pfn_remap() and track_pfn_insert()
  x86: Use new cache mode type in mm/iomap_32.c
  x86: Use new cache mode type in asm/pgtable.h
  x86: Use new cache mode type in arch/x86/mm/init_64.c
  x86: Use new cache mode type in arch/x86/pci
  x86: Use new cache mode type in drivers/video/fbdev/vermilion
  x86: Use new cache mode type in drivers/video/fbdev/gbefb.c
  x86: Use new cache mode type in include/asm/fb.h
  x86: Make page cache mode a real type
  x86: mm: Use 2GB memory block size on large-memory x86-64 systems
  ...
2014-11-16  x86: Support PAT bit in pagetable dump for lower levels  Juergen Gross  1  -13/+11
Dumping page table protection bits is not correct for entries on levels 2 and 3 regarding the PAT bit, which is at a different position than on level 4. Based-on-patch-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: ville.syrjala@linux.intel.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-16-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-12  x86, ptdump: Add section for EFI runtime services  Mathias Krause  1  -0/+3
In commit 3891a04aafd6 ("x86-64, espfix: Don't leak bits 31:16 of %esp returning..") the "ESPFix Area" was added to the page table dump special sections. That area, though, has a limited amount of entries printed. The EFI runtime services are, unfortunately, located in-between the espfix area and the high kernel memory mapping. Due to the enforced limitation for the espfix area, the EFI mappings won't be printed in the page table dump. To make the EFI runtime service mappings visible again, provide them a dedicated entry. Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>