summaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)AuthorFilesLines
2010-08-10mm: extend KSM refcounts to the anon_vma rootRik van Riel3-19/+54
KSM reference counts can cause an anon_vma to exist after the processe it belongs to have already exited. Because the anon_vma lock now lives in the root anon_vma, we need to ensure that the root anon_vma stays around until after all the "child" anon_vmas have been freed. The obvious way to do this is to have a "child" anon_vma take a reference to the root in anon_vma_fork. When the anon_vma is freed at munmap or process exit, we drop the refcount in anon_vma_unlink and possibly free the root anon_vma. The KSM anon_vma reference count function also needs to be modified to deal with the possibility of freeing 2 levels of anon_vma. The easiest way to do this is to break out the KSM magic and make it generic. When compiling without CONFIG_KSM, this code is compiled out. Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10mm: always lock the root (oldest) anon_vmaRik van Riel3-10/+24
Always (and only) lock the root (oldest) anon_vma whenever we do something in an anon_vma. The recently introduced anon_vma scalability is due to the rmap code scanning only the VMAs that need to be scanned. Many common operations still took the anon_vma lock on the root anon_vma, so always taking that lock is not expected to introduce any scalability issues. However, always taking the same lock does mean we only need to take one lock, which means rmap_walk on pages from any anon_vma in the vma is excluded from occurring during an munmap, expand_stack or other operation that needs to exclude rmap_walk and similar functions. Also add the proper locking to vma_adjust. Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10mm: track the root (oldest) anon_vmaRik van Riel1-2/+16
Track the root (oldest) anon_vma in each anon_vma tree. Because we only take the lock on the root anon_vma, we cannot use the lock on higher-up anon_vmas to lock anything. This makes it impossible to do an indirect lookup of the root anon_vma, since the data structures could go away from under us. However, a direct pointer is safe because the root anon_vma is always the last one that gets freed on munmap or exit, by virtue of the same_vma list order and unlink_anon_vmas walking the list forward. [akpm@linux-foundation.org: fix typo] Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10mm: change direct call of spin_lock(anon_vma->lock) to inline functionRik van Riel4-21/+21
Subsitute a direct call of spin_lock(anon_vma->lock) with an inline function doing exactly the same. This makes it easier to do the substitution to the root anon_vma lock in a following patch. We will deal with the handful of special locks (nested, dec_and_lock, etc) separately. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10mm: rename anon_vma_lock to vma_lock_anon_vmaRik van Riel1-7/+7
Rename anon_vma_lock to vma_lock_anon_vma. This matches the naming style used in page_lock_anon_vma and will come in really handy further down in this patch series. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10hugetlb: call mmu notifiers on hugepage cowDoug Doan1-0/+6
When a copy-on-write occurs, we take one of two paths in handle_mm_fault: through handle_pte_fault for normal pages, or through hugetlb_fault for huge pages. In the normal page case, we eventually get to do_wp_page and call mmu notifiers via ptep_clear_flush_notify. There is no callout to the mmmu notifiers in the huge page case. This patch fixes that. Signed-off-by: Doug Doan <dougd@cray.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10mm: provide init_mm mm_context initializerHeiko Carstens1-0/+6
Provide an INIT_MM_CONTEXT intializer macro which can be used to statically initialize mm_struct:mm_context of init_mm. This way we can get rid of code which will do the initialization at run time (on s390). In addition the current code can be found at a place where it is not expected. So let's have a common initializer which architectures can use if needed. This is based on a patch from Suzuki Poulose. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Suzuki Poulose <suzuki@in.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10mm: use ERR_CASTJulia Lawall1-1/+1
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10mm: use memdup_userJulia Lawall1-8/+3
Use memdup_user when user data is immediately copied into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = \(kmalloc@p\|kzalloc@p\)(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) || ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-06Merge branch 'for-linus' of ↵Linus Torvalds3-45/+52
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6: slub: Allow removal of slab caches during boot Revert "slub: Allow removal of slab caches during boot" slub numa: Fix rare allocation from unexpected node slab: use deferable timers for its periodic housekeeping slub: Use kmem_cache flags to detect if slab is in debugging mode. slub: Allow removal of slab caches during boot slub: Check kasprintf results in kmem_cache_init() SLUB: Constants need UL slub: Use a constant for a unspecified node. SLOB: Free objects to their own list slab: fix caller tracking on !CONFIG_DEBUG_SLAB && CONFIG_TRACING
2010-08-06Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Ioremap: fix wrong physical address handling in PAT code x86, tlb: Clean up and correct used type x86, iomap: Fix wrong page aligned size calculation in ioremapping code x86, mm: Create symbolic index into address_markers array x86, ioremap: Fix normal ram range check x86, ioremap: Fix incorrect physical address handling in PAE mode x86-64, mm: Initialize VDSO earlier on 64 bits x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages
2010-08-06Merge branch 'perf-core-for-linus' of ↵Linus Torvalds4-4/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits) tracing/kprobes: unregister_trace_probe needs to be called under mutex perf: expose event__process function perf events: Fix mmap offset determination perf, powerpc: fsl_emb: Restore setting perf_sample_data.period perf, powerpc: Convert the FSL driver to use local64_t perf tools: Don't keep unreferenced maps when unmaps are detected perf session: Invalidate last_match when removing threads from rb_tree perf session: Free the ref_reloc_sym memory at the right place x86,mmiotrace: Add support for tracing STOS instruction perf, sched migration: Librarize task states and event headers helpers perf, sched migration: Librarize the GUI class perf, sched migration: Make the GUI class client agnostic perf, sched migration: Make it vertically scrollable perf, sched migration: Parameterize cpu height and spacing perf, sched migration: Fix key bindings perf, sched migration: Ignore unhandled task states perf, sched migration: Handle ignored migrate out events perf: New migration tool overview tracing: Drop cpparg() macro perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call ... Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
2010-08-06Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: Revert "net: Make accesses to ->br_port safe for sparse RCU" mce: convert to rcu_dereference_index_check() net: Make accesses to ->br_port safe for sparse RCU vfs: add fs.h to define struct file lockdep: Add an in_workqueue_context() lockdep-based test function rcu: add __rcu API for later sparse checking rcu: add an rcu_dereference_index_check() tree/tiny rcu: Add debug RCU head objects mm: remove all rcu head initializations fs: remove all rcu head initializations, except on_stack initializations powerpc: remove all rcu head initializations
2010-08-06Merge branch 'for_linus' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: debug_core,kdb: fix crash when arch does not have single step kgdb,x86: use macro HBP_NUM to replace magic number 4 kgdb,mips: remove unused kgdb_cpu_doing_single_step operations mm,kdb,kgdb: Add a debug reference for the kdb kmap usage KGDB: Remove set but unused newPC ftrace,kdb: Allow dumping a specific cpu's buffer with ftdump ftrace,kdb: Extend kdb to be able to dump the ftrace buffer kgdb,powerpc: Replace hardcoded offset by BREAK_INSTR_SIZE arm,kgdb: Add ability to trap into debugger on notify_die gdbstub: do not directly use dbg_reg_def[] in gdb_cmd_reg_set() gdbstub: Implement gdbserial 'p' and 'P' packets kgdb,arm: Individual register get/set for arm kgdb,mips: Individual register get/set for mips kgdb,x86: Individual register get/set for x86 kgdb,kdb: individual register set and and get API gdbstub: Optimize kgdb's "thread:" response for the gdb serial protocol kgdb: remove custom hex_to_bin()implementation
2010-08-05mm,kdb,kgdb: Add a debug reference for the kdb kmap usageJason Wessel1-0/+7
The kdb kmap should never get used outside of the kernel debugger exception context. Signed-off-by: Jason Wessel<jason.wessel@windriver.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Ingo Molnar <mingo@elte.hu> CC: linux-mm@kvack.org
2010-08-05Merge branch 'for-linus' of ↵Linus Torvalds1-36/+49
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: allow limited allocation before slab is online percpu: make @dyn_size always mean min dyn_size in first chunk init functions
2010-08-04Merge branches 'slab/fixes', 'slob/fixes', 'slub/cleanups' and 'slub/fixes' ↵Pekka Enberg3-45/+52
into for-linus
2010-08-04Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+33
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (198 commits) KVM: VMX: Fix host GDT.LIMIT corruption KVM: MMU: using __xchg_spte more smarter KVM: MMU: cleanup spte set and accssed/dirty tracking KVM: MMU: don't atomicly set spte if it's not present KVM: MMU: fix page dirty tracking lost while sync page KVM: MMU: fix broken page accessed tracking with ept enabled KVM: MMU: add missing reserved bits check in speculative path KVM: MMU: fix mmu notifier invalidate handler for huge spte KVM: x86 emulator: fix xchg instruction emulation KVM: x86: Call mask notifiers from pic KVM: x86: never re-execute instruction with enabled tdp KVM: Document KVM_GET_SUPPORTED_CPUID2 ioctl KVM: x86: emulator: inc/dec can have lock prefix KVM: MMU: Eliminate redundant temporaries in FNAME(fetch) KVM: MMU: Validate all gptes during fetch, not just those used for new pages KVM: MMU: Simplify spte fetch() function KVM: MMU: Add gpte_valid() helper KVM: MMU: Add validate_direct_spte() helper KVM: MMU: Add drop_large_spte() helper KVM: MMU: Use __set_spte to link shadow pages ...
2010-08-03slub: Allow removal of slab caches during bootChristoph Lameter1-9/+15
Serialize kmem_cache_create and kmem_cache_destroy using the slub_lock. Only possible after the use of the slub_lock during dynamic dma creation has been removed. Then make sure that the setup of the slab sysfs entries does not race with kmem_cache_create and kmem_cache destroy. If a slab cache is removed before we have setup sysfs then simply skip over the sysfs handling. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Roland Dreier <rdreier@cisco.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-08-03Revert "slub: Allow removal of slab caches during boot"Pekka Enberg1-7/+0
This reverts commit f5b801ac38a9612b380ee9a75ab1861f0594e79f.
2010-08-02Merge commit 'v2.6.35' into perf/coreIngo Molnar1-3/+13
Conflicts: tools/perf/Makefile tools/perf/util/hist.c Merge reason: Resolve the conflicts and update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-01KVM: Fix a race condition for usage of is_hwpoison_address()Huang Ying1-0/+3
is_hwpoison_address accesses the page table, so the caller must hold current->mm->mmap_sem in read mode. So fix its usage in hva_to_pfn of kvm accordingly. Comment is_hwpoison_address to remind other users. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01KVM: Avoid killing userspace through guest SRAO MCE on unmapped pagesHuang Ying1-0/+30
In common cases, guest SRAO MCE will cause corresponding poisoned page be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay the MCE to guest OS. But it is reported that if the poisoned page is accessed in guest after unmapping and before MCE is relayed to guest OS, userspace will be killed. The reason is as follows. Because poisoned page has been un-mapped, guest access will cause guest exit and kvm_mmu_page_fault will be called. kvm_mmu_page_fault can not get the poisoned page for fault address, so kernel and user space MMIO processing is tried in turn. In user MMIO processing, poisoned page is accessed again, then userspace is killed by force_sig_info. To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM and do not try kernel and user space MMIO processing for poisoned page. [xiao: fix warning introduced by avi] Reported-by: Max Asbock <masbock@linux.vnet.ibm.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-31mm: fix ia64 crash when gcore reads gate areaHugh Dickins1-3/+13
Debian's ia64 autobuilders have been seeing kernel freeze or reboot when running the gdb testsuite (Debian bug 588574): dannf bisected to 2.6.32 62eede62dafb4a6633eae7ffbeb34c60dba5e7b1 "mm: ZERO_PAGE without PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target. I'd missed updating the gate_vma handling in __get_user_pages(): that happens to use vm_normal_page() (nowadays failing on the zero page), yet reported success even when it failed to get a page - boom when access_process_vm() tried to copy that to its intermediate buffer. Fix this, resisting cleanups: in particular, leave it for now reporting success when not asked to get any pages - very probably safe to change, but let's not risk it without testing exposure. Why did ia64 crash with 16kB pages, but succeed with 64kB pages? Because setup_gate() pads each 64kB of its gate area with zero pages. Reported-by: Andreas Barth <aba@not.so.argh.org> Bisected-by: dann frazier <dannf@debian.org> Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: dann frazier <dannf@dannf.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-29slub numa: Fix rare allocation from unexpected nodeChristoph Lameter1-1/+1
The network developers have seen sporadic allocations resulting in objects coming from unexpected NUMA nodes despite asking for objects from a specific node. This is due to get_partial() calling get_any_partial() if partial slabs are exhausted for a node even if a node was specified and therefore one would expect allocations only from the specified node. get_any_partial() sporadically may return a slab from a foreign node to gradually reduce the size of partial lists on remote nodes and thereby reduce total memory use for a slab cache. The behavior is controlled by the remote_defrag_ratio of each cache. Strictly speaking this is permitted behavior since __GFP_THISNODE was not specified for the allocation but it is certain surprising. This patch makes sure that the remote defrag behavior only occurs if no node was specified. Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-21Merge branch 'linus' into perf/coreIngo Molnar9-23/+592
Merge reason: Pick up the latest perf fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-21x86,nobootmem: make alloc_bootmem_node fall back to other node when 32bit ↵Yinghai Lu2-4/+23
numa is used Borislav Petkov reported his 32bit numa system has problem: [ 0.000000] Reserving total of 4c00 pages for numa KVA remap [ 0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe [ 0.000000] max_pfn = 238000 [ 0.000000] 8202MB HIGHMEM available. [ 0.000000] 885MB LOWMEM available. [ 0.000000] mapped low ram: 0 - 375fe000 [ 0.000000] low ram: 0 - 375fe000 [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000 [ 0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80 [ 0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140 [ 0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000 [ 0.000000] BUG: unable to handle kernel paging request at 40000000 [ 0.000000] IP: [<c2c8cff1>] __alloc_memory_core_early+0x147/0x1d6 [ 0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00 ... [ 0.000000] Call Trace: [ 0.000000] [<c2c8b4f8>] ? __alloc_bootmem_node+0x216/0x22f [ 0.000000] [<c2c90c9b>] ? sparse_early_usemaps_alloc_node+0x5a/0x10b [ 0.000000] [<c2c9149e>] ? sparse_init+0x1dc/0x499 [ 0.000000] [<c2c79118>] ? paging_init+0x168/0x1df [ 0.000000] [<c2c780ff>] ? native_pagetable_setup_start+0xef/0x1bb looks like it allocates too much high address for bootmem. Try to cut limit with get_max_mapped() Reported-by: Borislav Petkov <borislav.petkov@amd.com> Tested-by: Conny Seidel <conny.seidel@amd.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: <stable@kernel.org> [2.6.34.x] Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-21mm/vmscan.c: fix mapping use after freeNick Piggin1-1/+1
We need lock_page_nosync() here because we have no reference to the mapping when taking the page lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-20slab: use deferable timers for its periodic housekeepingArjan van de Ven1-1/+1
slab has a "once every 2 second" timer for its housekeeping. As the number of logical processors is growing, its more and more common that this 2 second timer becomes the primary wakeup source. This patch turns this housekeeping timer into a deferable timer, which means that the timer does not interrupt idle, but just runs at the next event that wakes the cpu up. The impact is that the timer likely runs a bit later, but during the delay no code is running so there's not all that much reason for a difference in housekeeping to occur because of this delay. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-19mm: add context argument to shrinker callbackDave Chinner1-3/+5
The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-07-20Merge branch 'kmemleak' of ↵Linus Torvalds2-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm * 'kmemleak' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm: kmemleak: Add support for NO_BOOTMEM configurations kmemleak: Annotate false positive in init_section_page_cgroup()
2010-07-19kmemleak: Add support for NO_BOOTMEM configurationsCatalin Marinas1-0/+5
With commits 08677214 and 59be5a8e, alloc_bootmem()/free_bootmem() and friends use the early_res functions for memory management when NO_BOOTMEM is enabled. This patch adds the kmemleak calls in the corresponding code paths for bootmem allocations. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org
2010-07-19kmemleak: Annotate false positive in init_section_page_cgroup()Catalin Marinas1-0/+7
The pointer to the page_cgroup table allocated in init_section_page_cgroup() is stored in section->page_cgroup as (base - pfn). Since this value does not point to the beginning or inside the allocated memory block, kmemleak reports a false positive. This was reported in bugzilla.kernel.org as #16297. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Adrien Dessemond <adrien.dessemond@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Andrew Morton <akpm@linux-foundation.org>
2010-07-16slub: Use kmem_cache flags to detect if slab is in debugging mode.Christoph Lameter1-21/+12
The cacheline with the flags is reachable from the hot paths after the percpu allocator changes went in. So there is no need anymore to put a flag into each slab page. Get rid of the SlubDebug flag and use the flags in kmem_cache instead. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16slub: Allow removal of slab caches during bootChristoph Lameter1-0/+7
If a slab cache is removed before we have setup sysfs then simply skip over the sysfs handling. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Roland Dreier <rdreier@cisco.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16slub: Check kasprintf results in kmem_cache_init()Christoph Lameter1-3/+6
Small allocations may fail during slab bringup which is fatal. Add a BUG_ON() so that we fail immediately rather than failing later during sysfs processing. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16SLUB: Constants need ULChristoph Lameter1-2/+2
UL suffix is missing in some constants. Conform to how slab.h uses constants. Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16slub: Use a constant for a unspecified node.Christoph Lameter1-7/+7
kmalloc_node() and friends can be passed a constant -1 to indicate that no choice was made for the node from which the object needs to come. Use NUMA_NO_NODE instead of -1. CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16SLOB: Free objects to their own listBob Liu1-1/+8
SLOB has alloced smaller objects from their own list in reduce overall external fragmentation and increase repeatability, free to their own list also. This is /proc/meminfo result in my test machine: without this patch: === MemTotal: 1030720 kB MemFree: 750012 kB Buffers: 15496 kB Cached: 160396 kB SwapCached: 0 kB Active: 105024 kB Inactive: 145604 kB Active(anon): 74816 kB Inactive(anon): 2180 kB Active(file): 30208 kB Inactive(file): 143424 kB Unevictable: 16 kB .... with this patch: === MemTotal: 1030720 kB MemFree: 751908 kB Buffers: 15492 kB Cached: 160280 kB SwapCached: 0 kB Active: 102720 kB Inactive: 146140 kB Active(anon): 73168 kB Inactive(anon): 2180 kB Active(file): 29552 kB Inactive(file): 143960 kB Unevictable: 16 kB ... The result shows an improvement of 1 MB! And when I tested it on a embeded system with 64 MB, I found this path is never called during kernel bootup. Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Bob Liu <lliubbo@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-14lmb: rename to memblockYinghai Lu3-0/+546
via following scripts FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/lmb/memblock/g' \ -e 's/LMB/MEMBLOCK/g' \ $FILES for N in $(find . -name lmb.[ch]); do M=$(echo $N | sed 's/lmb/memblock/g') mv $N $M done and remove some wrong change like lmbench and dlmb etc. also move memblock.c from lib/ to mm/ Suggested-by: Ingo Molnar <mingo@elte.hu> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09x86, ioremap: Fix incorrect physical address handling in PAE modeKenji Kaneshige1-1/+1
Current x86 ioremap() doesn't handle physical address higher than 32-bit properly in X86_32 PAE mode. When physical address higher than 32-bit is passed to ioremap(), higher 32-bits in physical address is cleared wrongly. Due to this bug, ioremap() can map wrong address to linear address space. In my case, 64-bit MMIO region was assigned to a PCI device (ioat device) on my system. Because of the ioremap()'s bug, wrong physical address (instead of MMIO region) was mapped to linear address space. Because of this, loading ioatdma driver caused unexpected behavior (kernel panic, kernel hangup, ...). Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <4C1AE680.7090408@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-07-08Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2-15/+5
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: writeback: simplify the write back thread queue writeback: split writeback_inodes_wb writeback: remove writeback_inodes_wbc fs-writeback: fix kernel-doc warnings splice: check f_mode for seekable file splice: direct_splice_actor() should not use pos in sd
2010-07-06writeback: simplify the write back thread queueChristoph Hellwig1-11/+3
First remove items from work_list as soon as we start working on them. This means we don't have to track any pending or visited state and can get rid of all the RCU magic freeing the work items - we can simply free them once the operation has finished. Second use a real completion for tracking synchronous requests - if the caller sets the completion pointer we complete it, otherwise use it as a boolean indicator that we can free the work item directly. Third unify struct wb_writeback_args and struct bdi_work into a single data structure, wb_writeback_work. Previous we set all parameters into a struct wb_writeback_args, copied it into struct bdi_work, copied it again on the stack to use it there. Instead of just allocate one structure dynamically or on the stack and use it all the way through the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-07-06writeback: remove writeback_inodes_wbcChristoph Hellwig2-4/+2
This was just an odd wrapper around writeback_inodes_wb. Removing this also allows to get rid of the bdi member of struct writeback_control which was rather out of place there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-07-05Merge commit 'v2.6.35-rc4' into perf/coreIngo Molnar3-8/+10
Merge reason: Pick up the latest perf fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-01Merge branch 'linus' into core/rcuIngo Molnar4-14/+40
Conflicts: fs/fs-writeback.c Merge reason: Resolve the conflict Note, i picked the version from Linus's tree, which effectively reverts the fs-writeback.c bits of: b97181f: fs: remove all rcu head initializations, except on_stack initializations As the upstream changes to this file changed this code heavily and the first attempt to resolve the conflict resulted in a non-booting kernel. It's safer to re-try this portion of the commit cleanly. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-30mempolicy: fix dangling reference to tmpfs superblock mpolLee Schermerhorn1-4/+5
My patch to "Factor out duplicate put/frees in mpol_shared_policy_init() to a common return path"; and Dan Carpenter's fix thereto both left a dangling reference to the incoming tmpfs superblock mempolicy structure. A similar leak was introduced earlier when the nodemask was moved offstack to the scratch area despite the note in the comment block regarding the incoming ref. Move the remaining 'put of the incoming "mpol" to the common exit path to drop the reference. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Dan Carpenter <error27@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-30memcg: fix wake up in oom wait queueKAMEZAWA Hiroyuki1-1/+3
OOM-waitqueue should be waken up when oom_disable is canceled. This is a fix for 3c11ecf448eff8f1 ("memcg: oom kill disable and oom status"). How to test: Create a cgroup A... 1. set memory.limit and memory.memsw.limit to be small value 2. echo 1 > /cgroup/A/memory.oom_control, this disables oom-kill. 3. run a program which must cause OOM. A program executed in 3 will sleep by oom_waiqueue in memcg. Then, how to wake it up is problem. 1. echo 0 > /cgroup/A/memory.oom_control (enable OOM-killer) 2. echo big mem > /cgroup/A/memory.memsw.limit_in_bytes(allow more swap) etc.. Without the patch, a task in slept can not be waken up. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-06-29Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds1-3/+2
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Don't count_vm_events for discard bio in submit_bio. cfq: fix recursive call in cfq_blkiocg_update_completion_stats() cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=n cfq: Don't allow queue merges for queues that have no process references block: fix DISCARD_BARRIER requests cciss: set SCSI max cmd len to 16, as default is wrong cpqarray: fix two more wrong section type cpqarray: fix wrong __init type on pci probe function drbd: Fixed a race between disk-attach and unexpected state changes writeback: fix pin_sb_for_writeback writeback: add missing requeue_io in writeback_inodes_wb writeback: simplify and split bdi_start_writeback writeback: simplify wakeup_flusher_threads writeback: fix writeback_inodes_wb from writeback_inodes_sb writeback: enforce s_umount locking in writeback_inodes_sb writeback: queue work on stack in writeback_inodes_sb writeback: fix writeback completion notifications
2010-06-29Merge branch 'linus' into perf/coreThomas Gleixner1-6/+30
Reason: Further changes conflict with upstream fixes Signed-off-by: Thomas Gleixner <tglx@linutronix.de>