summaryrefslogtreecommitdiff
path: root/arch/x86/mm
AgeCommit message (Collapse)AuthorFilesLines
2016-02-24x86/mm/pat: Avoid truncation when converting cpa->numpages to addressMatt Fleming1-2/+2
commit 742563777e8da62197d6cb4b99f4027f59454735 upstream. There are a couple of nasty truncation bugs lurking in the pageattr code that can be triggered when mapping EFI regions, e.g. when we pass a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting left by PAGE_SHIFT will truncate the resultant address to 32-bits. Viorel-Cătălin managed to trigger this bug on his Dell machine that provides a ~5GB EFI region which requires 1236992 pages to be mapped. When calling populate_pud() the end of the region gets calculated incorrectly in the following buggy expression, end = start + (cpa->numpages << PAGE_SHIFT); And only 188416 pages are mapped. Next, populate_pud() gets invoked for a second time because of the loop in __change_page_attr_set_clr(), only this time no pages get mapped because shifting the remaining number of pages (1048576) by PAGE_SHIFT is zero. At which point the loop in __change_page_attr_set_clr() spins forever because we fail to map progress. Hitting this bug depends very much on the virtual address we pick to map the large region at and how many pages we map on the initial run through the loop. This explains why this issue was only recently hit with the introduction of commit a5caa209ba9c ("x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down") It's interesting to note that safe uses of cpa->numpages do exist in the pageattr code. If instead of shifting ->numpages we multiply by PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and so the result is unsigned long. To avoid surprises when users try to convert very large cpa->numpages values to addresses, change the data type from 'int' to 'unsigned long', thereby making it suitable for shifting by PAGE_SHIFT without any type casting. The alternative would be to make liberal use of casting, but that is far more likely to cause problems in the future when someone adds more code and fails to cast properly; this bug was difficult enough to track down in the first place. Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131 Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-11-18x86/mm/hotplug: Modify PGD entry when removing memoryYasuaki Ishimatsu2-9/+20
commit 9661d5bcd058fe15b4138a00d96bd36516134543 upstream. When hot-adding/removing memory, sync_global_pgds() is called for synchronizing PGD to PGD entries of all processes MM. But when hot-removing memory, sync_global_pgds() does not work correctly. At first, sync_global_pgds() checks whether target PGD is none or not. And if PGD is none, the PGD is skipped. But when hot-removing memory, PGD may be none since PGD may be cleared by free_pud_table(). So when sync_global_pgds() is called after hot-removing memory, sync_global_pgds() should not skip PGD even if the PGD is none. And sync_global_pgds() must clear PGD entries of all processes MM. Currently sync_global_pgds() does not clear PGD entries of all processes MM when hot-removing memory. So when hot adding memory which is same memory range as removed memory after hot-removing memory, following call traces are shown: kernel BUG at arch/x86/mm/init_64.c:206! ... [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2 [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380 [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0 [<ffffffff815d03d9>] add_memory+0xb9/0x1b0 [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0 [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5 [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2 [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15 [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32 [<ffffffff8107e02b>] process_one_work+0x17b/0x460 [<ffffffff8107edfb>] worker_thread+0x11b/0x400 [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400 [<ffffffff81085aef>] kthread+0xcf/0xe0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 This patch clears PGD entries of all processes MM when sync_global_pgds() is called after hot-removing memory Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Vlastimil Babka <vbabka@suse.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-11-18x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable()Yasuaki Ishimatsu1-4/+5
commit 5255e0a79fcc0ff47b387af92bd9ef5729b1b859 upstream. When hot-adding memory after hot-removing memory, following call traces are shown: kernel BUG at arch/x86/mm/init_64.c:206! ... [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2 [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380 [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0 [<ffffffff815d03d9>] add_memory+0xb9/0x1b0 [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0 [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5 [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2 [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15 [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32 [<ffffffff8107e02b>] process_one_work+0x17b/0x460 [<ffffffff8107edfb>] worker_thread+0x11b/0x400 [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400 [<ffffffff81085aef>] kthread+0xcf/0xe0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 The patch-set fixes the issue. This patch (of 2): remove_pagetable() gets start argument and passes the argument to sync_global_pgds(). In this case, the argument must not be modified. If the argument is modified and passed to sync_global_pgds(), sync_global_pgds() does not correctly synchronize PGD to PGD entries of all processes MM since synchronized range of memory [start, end] is wrong. Unfortunately the start argument is modified in remove_pagetable(). So this patch fixes the issue. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Vlastimil Babka <vbabka@suse.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-10-27x86/mm: Set NX on gap between __ex_table and rodataStephen Smalley1-1/+1
commit ab76f7b4ab2397ffdd2f1eb07c55697d19991d10 upstream. Unused space between the end of __ex_table and the start of rodata can be left W+x in the kernel page tables. Extend the setting of the NX bit to cover this gap by starting from text_end rather than rodata_start. Before: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB x pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd After: ---[ High Kernel Mapping ]--- 0xffffffff80000000-0xffffffff81000000 16M pmd 0xffffffff81000000-0xffffffff81600000 6M ro PSE GLB x pmd 0xffffffff81600000-0xffffffff81754000 1360K ro GLB x pte 0xffffffff81754000-0xffffffff81800000 688K RW GLB NX pte 0xffffffff81800000-0xffffffff81a00000 2M ro PSE GLB NX pmd 0xffffffff81a00000-0xffffffff81b3b000 1260K ro GLB NX pte 0xffffffff81b3b000-0xffffffff82000000 4884K RW GLB NX pte 0xffffffff82000000-0xffffffff82200000 2M RW PSE GLB NX pmd 0xffffffff82200000-0xffffffffa0000000 478M pmd Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-09-30x86/mm: Initialize pmd_idx in page_table_range_init_count()Minfei Huang1-0/+1
commit 9962eea9e55f797f05f20ba6448929cab2a9f018 upstream. The variable pmd_idx is not initialized for the first iteration of the for loop. Assign the proper value which indexes the start address. Fixes: 719272c45b82 'x86, mm: only call early_ioremap_page_table_range_init() once' Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Cc: tony.luck@intel.com Cc: wangnan0@huawei.com Cc: david.vrabel@citrix.com Reviewed-by: yinghai@kernel.org Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-03-16mm/hugetlb: reduce arch dependent code around follow_huge_*Naoya Horiguchi1-12/+0
commit 61f77eda9bbf0d2e922197ed2dcf88638a639ce5 upstream. Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove the m. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as default. As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc,) it's never called (the callsite is optimized away) no matter how implemented it is. So in such architectures, we don't need arch-specific implementation. In some architecture (like mips, s390 and tile,) their current arch-specific follow_huge_(pmd|pud)() are effectively identical with the common code, so this patch lets these architecture use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepage, so follow_huge_pmd() can/should return some relevant value,) but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code. This patch forces these archs use PMD_MASK, but it's OK because they are identical in both archs. In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-03-12vm: add VM_FAULT_SIGSEGV handling supportLinus Torvalds1-0/+2
commit 33692f27597fcab536d7cbbcc8f52905133e4aa7 upstream. The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-03-12x86: mm: move mmap_sem unlock from mm_fault_error() to callerLinus Torvalds1-7/+1
commit 7fb08eca45270d0ae86e1ad9d39c40b7a55d0190 upstream. This replaces four copies in various stages of mm_fault_error() handling with just a single one. It will also allow for more natural placement of the unlocking after some further cleanup. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-03-05x86, mm/ASLR: Fix stack randomization on 64-bit systemsHector Marco-Gisbert1-3/+3
commit 4e7c22d447bb6d7e37bfe39ff658486ae78e8d77 upstream. The issue is that the stack for processes is not properly randomized on 64 bit architectures due to an integer overflow. The affected function is randomize_stack_top() in file "fs/binfmt_elf.c": static unsigned long randomize_stack_top(unsigned long stack_top) { unsigned int random_variable = 0; if ((current->flags & PF_RANDOMIZE) && !(current->personality & ADDR_NO_RANDOMIZE)) { random_variable = get_random_int() & STACK_RND_MASK; random_variable <<= PAGE_SHIFT; } return PAGE_ALIGN(stack_top) + random_variable; return PAGE_ALIGN(stack_top) - random_variable; } Note that, it declares the "random_variable" variable as "unsigned int". Since the result of the shifting operation between STACK_RND_MASK (which is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64): random_variable <<= PAGE_SHIFT; then the two leftmost bits are dropped when storing the result in the "random_variable". This variable shall be at least 34 bits long to hold the (22+12) result. These two dropped bits have an impact on the entropy of process stack. Concretely, the total stack entropy is reduced by four: from 2^28 to 2^30 (One fourth of expected entropy). This patch restores back the entropy by correcting the types involved in the operations in the functions randomize_stack_top() and stack_maxrandom_size(). The successful fix can be tested with: $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done 7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0 [stack] 7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0 [stack] 7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0 [stack] 7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0 [stack] ... Once corrected, the leading bytes should be between 7ffc and 7fff, rather than always being 7fff. Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es> Signed-off-by: Ismael Ripoll <iripoll@upv.es> [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: CVE-2015-1593 Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2015-03-02mm/hugetlb: pmd_huge() returns true for non-present hugepageNaoya Horiguchi2-2/+8
commit cbef8478bee55775ac312a574aad48af7bb9cf9f upstream. Migrating hugepages and hwpoisoned hugepages are considered as non-present hugepages, and they are referenced via migration entries and hwpoison entries in their page table slots. This behavior causes race condition because pmd_huge() doesn't tell non-huge pages from migrating/hwpoisoned hugepages. follow_page_mask() is one example where the kernel would call follow_page_pte() for such hugepage while this function is supposed to handle only normal pages. To avoid this, this patch makes pmd_huge() return true when pmd_none() is true *and* pmd_present() is false. We don't have to worry about mixing up non-present pmd entry with normal pmd (pointing to leaf level pte entry) because pmd_present() is true in normal pmd. The same race condition could happen in (x86-specific) gup_pmd_range(), where this patch simply adds pmd_present() check instead of pmd_huge(). This is because gup_pmd_range() is fast path. If we have non-present hugepage in this function, we will go into gup_huge_pmd(), then return 0 at flag mask check, and finally fall back to the slow path. Fixes: 290408d4a2 ("hugetlb: hugepage migration core") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-12-06x86, mm: Set NX across entire PMD at bootKees Cook1-1/+10
commit 45e2a9d4701d8c624d4a4bcdd1084eae31e92f58 upstream. When setting up permissions on kernel memory at boot, the end of the PMD that was split from bss remained executable. It should be NX like the rest. This performs a PMD alignment instead of a PAGE alignment to get the correct span of memory. Before: ---[ High Kernel Mapping ]--- ... 0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte 0xffffffff82200000-0xffffffff82c00000 10M RW PSE GLB NX pmd 0xffffffff82c00000-0xffffffff82df5000 2004K RW GLB NX pte 0xffffffff82df5000-0xffffffff82e00000 44K RW GLB x pte 0xffffffff82e00000-0xffffffffc0000000 978M pmd After: ---[ High Kernel Mapping ]--- ... 0xffffffff8202d000-0xffffffff82200000 1868K RW GLB NX pte 0xffffffff82200000-0xffffffff82e00000 12M RW PSE GLB NX pmd 0xffffffff82e00000-0xffffffffc0000000 978M pmd [ tglx: Changed it to roundup(_brk_end, PMD_SIZE) and added a comment. We really should unmap the reminder along with the holes caused by init,initdata etc. but thats a different issue ] Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20141114194737.GA3091@www.outflux.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-11-13x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAEDexuan Cui1-1/+1
commit d1cd1210834649ce1ca6bafe5ac25d2f40331343 upstream. pte_pfn() returns a PFN of long (32 bits in 32-PAE), so "long << PAGE_SHIFT" will overflow for PFNs above 4GB. Due to this issue, some Linux 32-PAE distros, running as guests on Hyper-V, with 5GB memory assigned, can't load the netvsc driver successfully and hence the synthetic network device can't work (we can use the kernel parameter mem=3000M to work around the issue). Cast pte_pfn() to phys_addr_t before shifting. Fixes: "commit d76565344512: x86, mm: Create slow_virt_to_phys()" Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: gregkh@linuxfoundation.org Cc: linux-mm@kvack.org Cc: olaf@aepfle.de Cc: apw@canonical.com Cc: jasowang@redhat.com Cc: dave.hansen@intel.com Cc: riel@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1414580017-27444-1-git-send-email-decui@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-09-26x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead ↵Shaohua Li1-7/+14
of flushing the TLB commit b13b1d2d8692b437203de7a404c6b809d2cc4d99 upstream. We use the accessed bit to age a page at page reclaim time, and currently we also flush the TLB when doing so. But in some workloads TLB flush overhead is very heavy. In my simple multithreaded app with a lot of swap to several pcie SSDs, removing the tlb flush gives about 20% ~ 30% swapout speedup. Fortunately just removing the TLB flush is a valid optimization: on x86 CPUs, clearing the accessed bit without a TLB flush doesn't cause data corruption. It could cause incorrect page aging and the (mistaken) reclaim of hot pages, but the chance of that should be relatively low. So as a performance optimization don't flush the TLB when clearing the accessed bit, it will eventually be flushed by a context switch or a VM operation anyway. [ In the rare event of it not getting flushed for a long time the delay shouldn't really matter because there's no real memory pressure for swapout to react to. ] Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Shaohua Li <shli@fusionio.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org [ Rewrote the changelog and the code comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-09-26x86/mm: Eliminate redundant page table walk during TLB range flushingMel Gorman1-27/+1
commit 71b54f8263860a37dd9f50f81880a9d681fd9c10 upstream. When choosing between doing an address space or ranged flush, the x86 implementation of flush_tlb_mm_range takes into account whether there are any large pages in the range. A per-page flush typically requires fewer entries than would covered by a single large page and the check is redundant. There is one potential exception. THP migration flushes single THP entries and it conceivably would benefit from flushing a single entry instead of the mm. However, this flush is after a THP allocation, copy and page table update potentially with any other threads serialised behind it. In comparison to that, the flush is noise. It makes more sense to optimise balancing to require fewer flushes than to optimise the flush itself. This patch deletes the redundant huge page check. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-sgei1drpOcburujPsfh6ovmo@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-09-26x86/mm: Clean up inconsistencies when flushing TLB rangesMel Gorman1-6/+6
commit 15aa368255f249df0b2af630c9487bb5471bd7da upstream. NR_TLB_LOCAL_FLUSH_ALL is not always accounted for correctly and the comparison with total_vm is done before taking tlb_flushall_shift into account. Clean it up. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Alex Shi <alex.shi@linaro.org> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hugh Dickins <hughd@google.com> Link: http://lkml.kernel.org/n/tip-Iz5gcahrgskIldvukulzi0hh@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-09-26mm, x86: Account for TLB flushes only when debuggingMel Gorman1-7/+7
commit ec65993443736a5091b68e80ff1734548944a4b8 upstream. Bisection between 3.11 and 3.12 fingered commit 9824cf97 ("mm: vmstats: tlb flush counters") to cause overhead problems. The counters are undeniably useful but how often do we really need to debug TLB flush related issues? It does not justify taking the penalty everywhere so make it a debugging option. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Alex Shi <alex.shi@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-08-19x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stackH. Peter Anvin1-11/+20
commit 3891a04aafd668686239349ea58f3314ea2af86b upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. This causes some 16-bit software to break, but it also leaks kernel state to user space. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 64-bit mode. In checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels we "solved" this by forbidding 16-bit segments on 64-bit kernels, with the logic that 16-bit support is crippled on 64-bit kernels anyway (no V86 support), but it turns out that people are doing stuff like running old Win16 binaries under Wine and expect it to work. This works around this by creating percpu "ministacks", each of which is mapped 2^16 times 64K apart. When we detect that the return SS is on the LDT, we copy the IRET frame to the ministack and use the relevant alias to return to userspace. The ministacks are mapped readonly, so if IRET faults we promote #GP to #DF which is an IST vector and thus has its own stack; we then do the fixup in the #DF handler. (Making #GP an IST exception would make the msr_safe functions unsafe in NMI/MC context, and quite possibly have other effects.) Special thanks to: - Andy Lutomirski, for the suggestion of using very small stack slots and copy (as opposed to map) the IRET frame there, and for the suggestion to mark them readonly and let the fault promote to #DF. - Konrad Wilk for paravirt fixup and testing. - Borislav Petkov for testing help and useful comments. Reported-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Lutomriski <amluto@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dirk Hohndel <dirk@hohndel.org> Cc: Arjan van de Ven <arjan.van.de.ven@intel.com> Cc: comex <comexk@gmail.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: <stable@vger.kernel.org> # consider after upstream merge Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-18x86, ioremap: Speed up check for RAM pagesRoland Dreier1-7/+19
commit c81c8a1eeede61e92a15103748c23d100880cc8a upstream. In __ioremap_caller() (the guts of ioremap), we loop over the range of pfns being remapped and checks each one individually with page_is_ram(). For large ioremaps, this can be very slow. For example, we have a device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+ seconds -- sometimes long enough to trigger the soft lockup detector! Internally, page_is_ram() calls walk_system_ram_range() on a single page. Instead, we can make a single call to walk_system_ram_range() from __ioremap_caller(), and do our further checks only for any RAM pages that we find. For the common case of MMIO, this saves an enormous amount of work, since the range being ioremapped doesn't intersect system RAM at all. With this change, ioremap on our 256 GiB BAR takes less than 1 second. Signed-off-by: Roland Dreier <roland@purestorage.com> Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-07-02hugetlb: restrict hugepage_migration_support() to x86_64Naoya Horiguchi1-10/+0
commit c177c81e09e517bbf75b67762cdab1b83aba6976 upstream. Currently hugepage migration is available for all archs which support pmd-level hugepage, but testing is done only for x86_64 and there're bugs for other archs. So to avoid breaking such archs, this patch limits the availability strictly to x86_64 until developers of other archs get interested in enabling this feature. Simply disabling hugepage migration on non-x86_64 archs is not enough to fix the reported problem where sys_move_pages() hits the BUG_ON() in follow_page(FOLL_GET), so let's fix this by checking if hugepage migration is supported in vma_migratable(). Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-04-03arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLITToshi Kani1-3/+13
commit a85eba8814631d0d48361c8b9a7ee0984e80c03c upstream. When ACPI SLIT table has an I/O locality (i.e. a locality unique to an I/O device), numa_set_distance() emits this warning message: NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10 acpi_numa_slit_init() calls numa_set_distance() with pxm_to_node(), which assumes that all localities have been parsed with SRAT previously. SRAT does not list I/O localities, where as SLIT lists all localities including I/Os. Hence, pxm_to_node() returns NUMA_NO_NODE (-1) for an I/O locality. I/O localities are not supported and are ignored today, but emitting such warning message leads to unnecessary confusion. Change acpi_numa_slit_init() to avoid calling numa_set_distance() with NUMA_NO_NODE. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/n/tip-dSvpjjvp8aMzs1ybkftxohlh@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-03-12x86/dumpstack: Fix printk_address for direct addressesJiri Slaby1-1/+1
commit 5f01c98859073cb512b01d4fad74b5f4e047be0b upstream. Consider a kernel crash in a module, simulated the following way: static int my_init(void) { char *map = (void *)0x5; *map = 3; return 0; } module_init(my_init); When we turn off FRAME_POINTERs, the very first instruction in that function causes a BUG. The problem is that we print IP in the BUG report using %pB (from printk_address). And %pB decrements the pointer by one to fix printing addresses of functions with tail calls. This was added in commit 71f9e59800e5ad4 ("x86, dumpstack: Use %pB format specifier for stack trace") to fix the call stack printouts. So instead of correct output: BUG: unable to handle kernel NULL pointer dereference at 0000000000000005 IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173] We get: BUG: unable to handle kernel NULL pointer dereference at 0000000000000005 IP: [<ffffffffa0152000>] 0xffffffffa0151fff To fix that, we use %pS only for stack addresses printouts (via newly added printk_stack_address) and %pB for regs->ip (via printk_address). I.e. we revert to the old behaviour for all except call stacks. And since from all those reliable is 1, we remove that parameter from printk_address. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: joe@perches.com Cc: jirislaby@gmail.com Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2014-02-23x86, smap: smap_violation() is bogus if CONFIG_X86_SMAP is offH. Peter Anvin1-5/+9
commit 4640c7ee9b8953237d05a61ea3ea93981d1bc961 upstream. If CONFIG_X86_SMAP is disabled, smap_violation() tests for conditions which are incorrect (as the AC flag doesn't matter), causing spurious faults. The dynamic disabling of SMAP (nosmap on the command line) is fine because it disables X86_FEATURE_SMAP, therefore causing the static_cpu_has() to return false. Found by Fengguang Wu's test system. [ v3: move all predicates into smap_violation() ] [ v2: use IS_ENABLED() instead of #ifdef ] Reported-by: Fengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-10mm: numa: serialise parallel get_user_page against THP migrationMel Gorman1-0/+13
commit 2b4847e73004c10ae6666c2e27b5c5430aed8698 upstream. Base pages are unmapped and flushed from cache and TLB during normal page migration and replaced with a migration entry that causes any parallel NUMA hinting fault or gup to block until migration completes. THP does not unmap pages due to a lack of support for migration entries at a PMD level. This allows races with get_user_pages and get_user_pages_fast which commit 3f926ab945b6 ("mm: Close races between THP migration and PMD numa clearing") made worse by introducing a pmd_clear_flush(). This patch forces get_user_page (fast and normal) on a pmd_numa page to go through the slow get_user_page path where it will serialise against THP migration and properly account for the NUMA hinting fault. On the migration side the page table lock is taken for each PTE update. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-13x86: finish user fault error path with fatal signalJohannes Weiner1-18/+17
The x86 fault handler bails in the middle of error handling when the task has a fatal signal pending. For a subsequent patch this is a problem in OOM situations because it relies on pagefault_out_of_memory() being called even when the task has been killed, to perform proper per-task OOM state unwinding. Shortcutting the fault like this is a rather minor optimization that saves a few instructions in rare cases. Just remove it for user-triggered faults. Use the opportunity to split the fault retry handling from actual fault errors and add locking documentation that reads suprisingly similar to ARM's. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-13arch: mm: pass userspace fault flag to generic fault handlerJohannes Weiner1-3/+5
Unlike global OOM handling, memory cgroup code will invoke the OOM killer in any OOM situation because it has no way of telling faults occuring in kernel context - which could be handled more gracefully - from user-triggered faults. Pass a flag that identifies faults originating in user space from the architecture-specific fault handlers to generic code so that memcg OOM handling can be improved. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: azurIt <azurit@pobox.sk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12mm: migrate: check movability of hugepage in unmap_and_move_huge_page()Naoya Horiguchi1-0/+8
Currently hugepage migration works well only for pmd-based hugepages (mainly due to lack of testing,) so we had better not enable migration of other levels of hugepages until we are ready for it. Some users of hugepage migration (mbind, move_pages, and migrate_pages) do page table walk and check pud/pmd_huge() there, so they are safe. But the other users (softoffline and memory hotremove) don't do this, so without this patch they can try to migrate unexpected types of hugepages. To prevent this, we introduce hugepage_migration_support() as an architecture dependent check of whether hugepage are implemented on a pmd basis or not. And on some architecture multiple sizes of hugepages are available, so hugepage_migration_support() also checks hugepage size. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12mm: vmstats: track TLB flush stats on UP tooDave Hansen1-3/+1
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled for SMP. UP systems do not do remote TLB flushes, so compile those counters out on UP. arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly. This is probably an optimization since both the mtrr code and __flush_tlb() write cr4. It would probably be safe to make that a flush_tlb_all() (and then get these statistics), but the mtrr code is ancient and I'm hesitant to touch it other than to just stick in the counters. [akpm@linux-foundation.org: tweak comments] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12mm: vmstats: tlb flush countersDave Hansen1-4/+14
I was investigating some TLB flush scaling issues and realized that we do not have any good methods for figuring out how many TLB flushes we are doing. It would be nice to be able to do these in generic code, but the arch-independent calls don't explicitly specify whether we actually need to do remote flushes or not. In the end, we really need to know if we actually _did_ global vs. local invalidations, so that leaves us with few options other than to muck with the counters from arch-specific code. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-04Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "Misc smaller fixes: - a parse_setup_data() boot crash fix - a memblock and an __early_ioremap cleanup - turn the always-on CONFIG_ARCH_MEMORY_PROBE=y into a configurable option and turn it off - it's an unrobust debug facility, it shouldn't be enabled by default" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: avoid remapping data in parse_setup_data() x86: Use memblock_set_current_limit() to set limit for memblock. mm: Remove unused variable idx0 in __early_ioremap() mm/hotplug, x86: Disable ARCH_MEMORY_PROBE by default
2013-09-04Merge tag 'pm+acpi-3.12-rc1' of ↵Linus Torvalds1-4/+7
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: 1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction of Intel Thunderbolt support on systems that use ACPI for signalling Thunderbolt hotplug events. This also should make ACPIPHP work in some cases in which it was known to have problems. From Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov. 2) ACPI core code cleanups and dock station support cleanups from Jiang Liu and Rafael J Wysocki. 3) Fixes for locking problems related to ACPI device hotplug from Rafael J Wysocki. 4) ACPICA update to version 20130725 includig fixes, cleanups, support for more than 256 GPEs per GPE block and a change to make the ACPI PM Timer optional (we've seen systems without the PM Timer in the field already). One of the fixes, related to the DeRefOf operator, is necessary to prevent some Windows 8 oriented AML from causing problems to happen. From Bob Moore, Lv Zheng, and Jung-uk Kim. 5) Removal of the old and long deprecated /proc/acpi/event interface and related driver changes from Thomas Renninger. 6) ACPI and Xen changes to make the reduced hardware sleep work with the latter from Ben Guthro. 7) ACPI video driver cleanups and a blacklist of systems that should not tell the BIOS that they are compatible with Windows 8 (or ACPI backlight and possibly other things will not work on them). From Felipe Contreras. 8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo, Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen, Toshi Kani, and Wei Yongjun. 9) cpufreq ondemand governor target frequency selection change to reduce oscillations between min and max frequencies (essentially, it causes the governor to choose target frequencies proportional to load) from Stratos Karafotis. 10) cpufreq fixes allowing sysfs attributes file permissions to be preserved over suspend/resume cycles Srivatsa S Bhat. 11) Removal of Device Tree parsing for CPU device nodes from multiple cpufreq drivers that required some changes related to of_get_cpu_node() to be made in a few architectures and in the driver core. From Sudeep KarkadaNagesha. 12) cpufreq core fixes and cleanups related to mutual exclusion and driver module references from Viresh Kumar, Lukasz Majewski and Rafael J Wysocki. 13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap, Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo, Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd, Stratos Karafotis, and Viresh Kumar. 14) Fixes to prevent race conditions in coupled cpuidle from happening from Colin Cross. 15) cpuidle core fixes and cleanups from Daniel Lezcano and Tuukka Tikkanen. 16) Assorted cpuidle fixes and cleanups from Daniel Lezcano, Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij, and Sahara. 17) System sleep tracing changes from Todd E Brandt and Shuah Khan. 18) PNP subsystem conversion to using struct dev_pm_ops for power management from Shuah Khan. * tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (217 commits) cpufreq: Don't use smp_processor_id() in preemptible context cpuidle: coupled: fix race condition between pokes and safe state cpuidle: coupled: abort idle if pokes are pending cpuidle: coupled: disable interrupts after entering safe state ACPI / hotplug: Remove containers synchronously driver core / ACPI: Avoid device hot remove locking issues cpufreq: governor: Fix typos in comments cpufreq: governors: Remove duplicate check of target freq in supported range cpufreq: Fix timer/workqueue corruption due to double queueing ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT ACPI / thermal: Add check of "_TZD" availability and evaluating result cpufreq: imx6q: Fix clock enable balance ACPI: blacklist win8 OSI for buggy laptops cpufreq: tegra: fix the wrong clock name cpuidle: Change struct menu_device field types cpuidle: Add a comment warning about possible overflow cpuidle: Fix variable domains in get_typical_interval() cpuidle: Fix menu_device->intervals type cpuidle: CodingStyle: Break up multiple assignments on single line cpuidle: Check called function parameter in get_typical_interval() ...
2013-09-02Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot fix from Peter Anvin: "A single very small boot fix for very large memory systems (> 0.5T)" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM
2013-08-27Merge branch 'acpi-assorted'Rafael J. Wysocki1-4/+7
* acpi-assorted: ACPI / osl: Kill macro INVALID_TABLE(). earlycpio.c: Fix the confusing comment of find_cpio_data(). ACPI / x86: Print Hot-Pluggable Field in SRAT. ACPI / thermal: Use THERMAL_TRIPS_NONE macro to replace number ACPI / thermal: Remove unused macros in the driver/acpi/thermal.c ACPI / thermal: Remove the unused lock of struct acpi_thermal ACPI / osl: Fix osi_setup_entries[] __initdata attribute location ACPI / numa: Fix __init attribute location in slit_valid() ACPI / dock: Fix __init attribute location in find_dock_and_bay() ACPI / Sleep: Fix incorrect placement of __initdata ACPI / processor: Fix incorrect placement of __initdata ACPI / EC: Fix incorrect placement of __initdata ACPI / scan: Drop unnecessary label from acpi_create_platform_device() ACPI: Move acpi_bus_get_device() from bus.c to scan.c ACPI / scan: Allow platform device creation without any IO resources ACPI: Cleanup sparse warning on acpi_os_initialize1() platform / thinkpad: Remove deprecated hotkey_report_mode parameter ACPI: Remove the old /proc/acpi/event interface
2013-08-22x86 get_unmapped_area: Access mmap_legacy_base through mm_struct memberRadu Caragea1-2/+4
This is the updated version of df54d6fa5427 ("x86 get_unmapped_area(): use proper mmap base for bottom-up direction") that only randomizes the mmap base address once. Signed-off-by: Radu Caragea <sinaelgl@gmail.com> Reported-and-tested-by: Jeff Shorey <shoreyjeff@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Adrian Sendroiu <molecula2788@gmail.com> Cc: Greg KH <greg@kroah.com> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-22Revert "x86 get_unmapped_area(): use proper mmap base for bottom-up direction"Linus Torvalds1-1/+1
This reverts commit df54d6fa54275ce59660453e29d1228c2b45a826. The commit isn't necessarily wrong, but because it recalculates the random mmap_base every time, it seems to confuse user memory allocators that expect contiguous mmap allocations even when the mmap address isn't specified. In particular, the MATLAB Java runtime seems to be unhappy. See https://bugzilla.kernel.org/show_bug.cgi?id=60774 So we'll want to apply the random offset only once, and Radu has a patch for that. Revert this older commit in order to apply the other one. Reported-by: Jeff Shorey <shoreyjeff@gmail.com> Cc: Radu Caragea <sinaelgl@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-20x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAMYinghai Lu1-2/+2
Dave Hansen reported that systems between 500G and 600G RAM crash early if DEBUG_PAGEALLOC is selected. > [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff] > [ 0.000000] [mem 0x00000000-0x000fffff] page 4k > [ 0.000000] BRK [0x02086000, 0x02086fff] PGTABLE > [ 0.000000] BRK [0x02087000, 0x02087fff] PGTABLE > [ 0.000000] BRK [0x02088000, 0x02088fff] PGTABLE > [ 0.000000] init_memory_mapping: [mem 0xe80ee00000-0xe80effffff] > [ 0.000000] [mem 0xe80ee00000-0xe80effffff] page 4k > [ 0.000000] BRK [0x02089000, 0x02089fff] PGTABLE > [ 0.000000] BRK [0x0208a000, 0x0208afff] PGTABLE > [ 0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory It turns out that we missed increasing needed pages in BRK to mapping initial 2M and [0,1M) when we switched to use the #PF handler to set memory mappings: > commit 8170e6bed465b4b0c7687f93e9948aca4358a33b > Author: H. Peter Anvin <hpa@zytor.com> > Date: Thu Jan 24 12:19:52 2013 -0800 > > x86, 64bit: Use a #PF handler to materialize early mappings on demand Before that, we had the maping from [0,512M) in head_64.S, and we can spare two pages [0-1M). After that change, we can not reuse pages anymore. When we have more than 512M ram, we need an extra page for pgd page with [512G, 1024g). Increase pages in BRK for page table to solve the boot crash. Reported-by: Dave Hansen <dave.hansen@intel.com> Bisected-by: Dave Hansen <dave.hansen@intel.com> Tested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: <stable@vger.kernel.org> # v3.9 and later Link: http://lkml.kernel.org/r/1376351004-4015-1-git-send-email-yinghai@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15ACPI / x86: Print Hot-Pluggable Field in SRAT.Tang Chen1-4/+7
The Hot-Pluggable field in SRAT suggests if the memory could be hotplugged while the system is running. Print it as well when parsing SRAT will help users to know which memory is hotpluggable. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-14x86 get_unmapped_area(): use proper mmap base for bottom-up directionRadu Caragea1-1/+1
When the stack is set to unlimited, the bottomup direction is used for mmap-ings but the mmap_base is not used and thus effectively renders ASLR for mmapings along with PIE useless. Cc: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Adrian Sendroiu <molecula2788@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-13mm: Remove unused variable idx0 in __early_ioremap()Jianguo Wu1-3/+2
After commit: 8827247ffcc ("x86: don't define __this_fixmap_does_not_exist()") variable idx0 is no longer needed, so just remove it. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <linux-mm@kvack.org> Cc: <wangchen@cn.fujitsu.com> Cc: Hanjun Guo <guohanjun@huawei.com> Link: http://lkml.kernel.org/r/5209A173.3090600@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-15x86: delete __cpuinit usage from all x86 filesPaul Gortmaker4-17/+15
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-11mm: remove free_area_cacheMichel Lespinasse1-2/+0
Since all architectures have been converted to use vm_unmapped_area(), there is no remaining use for the free_area_cache. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09mm/pgtable: don't accumulate addr during pgd prepopulate pmdWanpeng Li1-3/+1
The old codes accumulate addr to get right pmd, however, currently pmds are preallocated and transfered as a parameter, there is unnecessary to accumulate addr variable any more, this patch remove it. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-04mm/x86: prepare for removing num_physpages and simplify mem_init()Jiang Liu3-49/+3
Prepare for removing num_physpages and simplify mem_init(). Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-04mm: concentrate modification of totalram_pages into the mm coreJiang Liu2-2/+2
Concentrate code to modify totalram_pages into the mm core, so the arch memory initialized code doesn't need to take care of it. With these changes applied, only following functions from mm core modify global variable totalram_pages: free_bootmem_late(), free_all_bootmem(), free_all_bootmem_node(), adjust_managed_page_count(). With this patch applied, it will be much more easier for us to keep totalram_pages and zone->managed_pages in consistence. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: David Howells <dhowells@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-04mm: make __free_pages_bootmem() only available at boot timeJiang Liu1-16/+2
In order to simpilify management of totalram_pages and zone->managed_pages, make __free_pages_bootmem() only available at boot time. With this change applied, __free_pages_bootmem() will only be used by bootmem.c and nobootmem.c at boot time, so mark it as __init. Other callers of __free_pages_bootmem() have been converted to use free_reserved_page(), which handles totalram_pages and zone->managed_pages in a safer way. This patch also fix a bug in free_pagetable() for x86_64, which should increase zone->managed_pages instead of zone->present_pages when freeing reserved pages. And now we have managed_pages_count_lock to protect totalram_pages and zone->managed_pages, so remove the redundant ppb_lock lock in put_page_bootmem(). This greatly simplifies the locking rules. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-04mm: accurately calculate zone->managed_pages for highmem zonesJiang Liu1-0/+6
Commit "mm: introduce new field 'managed_pages' to struct zone" assumes that all highmem pages will be freed into the buddy system by function mem_init(). But that's not always true, some architectures may reserve some highmem pages during boot. For example PPC may allocate highmem pages for giagant HugeTLB pages, and several architectures have code to check PageReserved flag to exclude highmem pages allocated during boot when freeing highmem pages into the buddy system. So treat highmem pages in the same way as normal pages, that is to: 1) reset zone->managed_pages to zero in mem_init(). 2) recalculate managed_pages when freeing pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan@kernel.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-04mm/x86: use free_reserved_area() to simplify codeJiang Liu2-14/+5
Use common help function free_reserved_area() to simplify code. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: <sworddragon2@aol.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Tejun Heo <tj@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03Merge tag 'arm64-upstream' of ↵Linus Torvalds1-187/+0
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 Pull ARM64 updates from Catalin Marinas: "Main features: - KVM and Xen ports to AArch64 - Hugetlbfs and transparent huge pages support for arm64 - Applied Micro X-Gene Kconfig entry and dts file - Cache flushing improvements For arm64 huge pages support, there are x86 changes moving part of arch/x86/mm/hugetlbpage.c into mm/hugetlb.c to be re-used by arm64" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (66 commits) arm64: Add initial DTS for APM X-Gene Storm SOC and APM Mustang board arm64: Add defines for APM ARMv8 implementation arm64: Enable APM X-Gene SOC family in the defconfig arm64: Add Kconfig option for APM X-Gene SOC family arm64/Makefile: provide vdso_install target ARM64: mm: THP support. ARM64: mm: Raise MAX_ORDER for 64KB pages and THP. ARM64: mm: HugeTLB support. ARM64: mm: Move PTE_PROT_NONE bit. ARM64: mm: Make PAGE_NONE pages read only and no-execute. ARM64: mm: Restore memblock limit when map_mem finished. mm: thp: Correct the HPAGE_PMD_ORDER check. x86: mm: Remove general hugetlb code from x86. mm: hugetlb: Copy general hugetlb code from x86 to mm. x86: mm: Remove x86 version of huge_pmd_share. mm: hugetlb: Copy huge_pmd_share from x86 to mm. arm64: KVM: document kernel object mappings in HYP arm64: KVM: MAINTAINERS update arm64: KVM: userspace API documentation arm64: KVM: enable initialization of a 32bit vcpu ...
2013-07-03Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "Misc improvements: - Fix /proc/mtrr reporting - Fix ioremap printout - Remove the unused pvclock fixmap entry on 32-bit - misc cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioremap: Correct function name output x86: Fix /proc/mtrr with base/size more than 44bits ix86: Don't waste fixmap entries x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h> x86_64: Correct phys_addr in cleanup_highmap comment
2013-06-28x86/ioremap: Correct function name outputBorislav Petkov1-4/+4
Infact, let the compiler enter the function name so that there are no discrepancies. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1372369996-20556-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-14x86: mm: Remove general hugetlb code from x86.Steve Capper1-67/+0
huge_pte_alloc, huge_pte_offset and follow_huge_p[mu]d have already been copied over to mm. This patch removes the x86 copies of these functions and activates the general ones by enabling: CONFIG_ARCH_WANT_GENERAL_HUGETLB Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>