summaryrefslogtreecommitdiff
path: root/mm/slab_common.c
AgeCommit message (Collapse)AuthorFilesLines
2013-11-13memcg, kmem: rename cache_from_memcg to cache_from_memcg_idxQiang Huang1-1/+1
We can't see the relationship with memcg from the parameters, so the name with memcg_idx would be more reasonable. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-28slab_common: Do not check for duplicate slab namesChristoph Lameter1-0/+2
SLUB can alias multiple slab kmem_create_requests to one slab cache to save memory and increase the cache hotness. As a result the name of the slab can be stale. Only check the name for duplicates if we are in debug mode where we do not merge multiple caches. This fixes the following problem reported by Jonathan Brassow: The problem with kmem_cache* is this: *) Assume CONFIG_SLUB is set 1) kmem_cache_create(name="foo-a") - creates new kmem_cache structure 2) kmem_cache_create(name="foo-b") - If identical cache characteristics, it will be merged with the previously created cache associated with "foo-a". The cache's refcount will be incremented and an alias will be created via sysfs_slab_alias(). 3) kmem_cache_destroy(<ptr>) - Attempting to destroy cache associated with "foo-a", but instead the refcount is simply decremented. I don't even think the sysfs aliases are ever removed... 4) kmem_cache_create(name="foo-a") - This FAILS because kmem_cache_sanity_check colides with the existing name ("foo-a") associated with the non-removed cache. This is a problem for RAID (specifically dm-raid) because the name used for the kmem_cache_create is ("raid%d-%p", level, mddev). If the cache persists for long enough, the memory address of an old mddev will be reused for a new mddev - causing an identical formulation of the cache name. Even though kmem_cache_destory had long ago been used to delete the old cache, the merging of caches has cause the name and cache of that old instance to be preserved and causes a colision (and thus failure) in kmem_cache_create(). I see this regularly in my testing. Reported-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-09-04mm/sl[aou]b: Move kmallocXXX functions to common codeChristoph Lameter1-0/+10
The kmalloc* functions of all slab allcoators are similar now so lets move them into slab.h. This requires some function naming changes in slob. As a results of this patch there is a common set of functions for all allocators. Also means that kmalloc_large() is now available in general to perform large order allocations that go directly via the page allocator. kmalloc_large() can be substituted if kmalloc() throws warnings because of too large allocations. kmalloc_large() has exactly the same semantics as kmalloc but can only used for allocations > PAGE_SIZE. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-08-14mm, slab_common: add 'unlikely' to size check of kmalloc_slab()Joonsoo Kim1-1/+1
Size is usually below than KMALLOC_MAX_SIZE. If we add a 'unlikely' macro, compiler can make better code. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-15Merge branch 'slab/for-linus' of ↵Linus Torvalds1-5/+13
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull slab update from Pekka Enberg: "Highlights: - Fix for boot-time problems on some architectures due to init_lock_keys() not respecting kmalloc_caches boundaries (Christoph Lameter) - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim) - Fix for excessive slab freelist draining (Wanpeng Li) - SLUB and SLOB cleanups and fixes (various people)" I ended up editing the branch, and this avoids two commits at the end that were immediately reverted, and I instead just applied the oneliner fix in between myself. * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux slub: Check for page NULL before doing the node_match check mm/slab: Give s_next and s_stop slab-specific names slob: Check for NULL pointer before calling ctor() slub: Make cpu partial slab support configurable slab: add kmalloc() to kernel API documentation slab: fix init_lock_keys slob: use DIV_ROUND_UP where possible slub: do not put a slab to cpu partial list when cpu_partial is 0 mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo mm/slub: Drop unnecessary nr_partials mm/slab: Fix /proc/slabinfo unwriteable for slab mm/slab: Sharing s_next and s_stop between slab and slub mm/slab: Fix drain freelist excessively slob: Rework #ifdeffery in slab.h mm, slab: moved kmem_cache_alloc_node comment to correct place
2013-07-08mm/slab: Give s_next and s_stop slab-specific namesWanpeng Li1-4/+4
Give s_next and s_stop slab-specific names instead of exporting "s_next" and "s_stop". Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07mm/slab: Fix /proc/slabinfo unwriteable for slabWanpeng Li1-1/+9
Slab have some tunables like limit, batchcount, and sharedfactor can be tuned through function slabinfo_write. Commit (b7454ad3: mm/sl[au]b: Move slabinfo processing to slab_common.c) uncorrectly change /proc/slabinfo unwriteable for slab, this patch fix it by revert to original mode. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07mm/slab: Sharing s_next and s_stop between slab and slubWanpeng Li1-2/+2
This patch shares s_next and s_stop between slab and slub. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-06-13slab: prevent warnings when allocating with __GFP_NOWARNSasha Levin1-1/+3
Sasha Levin noticed that the warning introduced by commit 6286ae9 ("slab: Return NULL for oversized allocations) is being triggered: WARNING: CPU: 15 PID: 21519 at mm/slab_common.c:376 kmalloc_slab+0x2f/0xb0() can: request_module (can-proto-4) failed. mpoa: proc_mpc_write: could not parse '' Modules linked in: CPU: 15 PID: 21519 Comm: trinity-child15 Tainted: G W 3.10.0-rc4-next-20130607-sasha-00011-gcd78395-dirty #2 0000000000000009 ffff880020a95e30 ffffffff83ff4041 0000000000000000 ffff880020a95e68 ffffffff8111fe12 fffffffffffffff0 00000000000082d0 0000000000080000 0000000000080000 0000000001400000 ffff880020a95e78 Call Trace: [<ffffffff83ff4041>] dump_stack+0x4e/0x82 [<ffffffff8111fe12>] warn_slowpath_common+0x82/0xb0 [<ffffffff8111fe55>] warn_slowpath_null+0x15/0x20 [<ffffffff81243dcf>] kmalloc_slab+0x2f/0xb0 [<ffffffff81278d54>] __kmalloc+0x24/0x4b0 [<ffffffff8196ffe3>] ? security_capable+0x13/0x20 [<ffffffff812a26b7>] ? pipe_fcntl+0x107/0x210 [<ffffffff812a26b7>] pipe_fcntl+0x107/0x210 [<ffffffff812b7ea0>] ? fget_raw_light+0x130/0x3f0 [<ffffffff812aa5fb>] SyS_fcntl+0x60b/0x6a0 [<ffffffff8403ca98>] tracesys+0xe1/0xe6 Andrew Morton writes: __GFP_NOWARN is frequently used by kernel code to probe for "how big an allocation can I get". That's a bit lame, but it's used on slow paths and is pretty simple. However, SLAB would still spew a warning when a big allocation happens if the __GFP_NOWARN flag is _not_ set to expose kernel bugs. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> [ penberg@kernel.org: improve changelog ] Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-05-09mm/slab: Fix crash during slab initChris Mason1-10/+10
Commit 8a965b3baa89 ("mm, slab_common: Fix bootstrap creation of kmalloc caches") introduced a regression that caused us to crash early during boot. The commit was introducing ordering of slab creation, making sure two odd-sized slabs were created after specific powers of two sizes. But, if any of the power of two slabs were created earlier during boot, slabs at index 1 or 2 might not get created at all. This patch makes sure none of the slabs get skipped. Tony Lindgren bisected this down to the offending commit, which really helped because bisect kept bringing me to almost but not quite this one. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07mm, slab_common: Fix bootstrap creation of kmalloc cachesChristoph Lameter1-9/+15
For SLAB the kmalloc caches must be created in ascending sizes in order for the OFF_SLAB sub-slab cache to work properly. Create the non power of two caches immediately after the prior power of two kmalloc cache. Do not create the non power of two caches before all other caches. Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Christoph Lamete <cl@linux.com> Link: http://lkml.kernel.org/r/201305040348.CIF81716.OStQOHFJMFLOVF@I-love.SAKURA.ne.jp Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-05-06slab: Return NULL for oversized allocationsChristoph Lameter1-0/+3
The inline path seems to have changed the SLAB behavior for very large kmalloc allocations with commit e3366016 ("slab: Use common kmalloc_index/kmalloc_size functions"). This patch restores the old behavior but also adds diagnostics so that we can figure where in the code these large allocations occur. Reported-and-tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Christoph Lameter <cl@linux.com> Link: http://lkml.kernel.org/r/201305040348.CIF81716.OStQOHFJMFLOVF@I-love.SAKURA.ne.jp [ penberg@kernel.org: use WARN_ON_ONCE ] Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-06mm/sl[au]b: correct allocation type check in kmalloc_slab()Joonsoo Kim1-1/+1
commit "slab: Common Kmalloc cache determination" made mistake in kmalloc_slab(). SLAB_CACHE_DMA is for kmem_cache creation, not for allocation. For allocation, we should use GFP_XXX to identify type of allocation. So, change SLAB_CACHE_DMA to GFP_DMA. Acked-by: Christoph Lameter <cl@linux.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01slab: Common Kmalloc cache determinationChristoph Lameter1-2/+103
Extract the optimized lookup functions from slub and put them into slab_common.c. Then make slab use these functions as well. Joonsoo notes that this fixes some issues with constant folding which also reduces the code size for slub. https://lkml.org/lkml/2012/10/20/82 Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01slab: Common function to create the kmalloc arrayChristoph Lameter1-0/+54
The kmalloc array is created in similar ways in both SLAB and SLUB. Create a common function and have both allocators call that function. V1->V2: Whitespace cleanup Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01slab: Common definition for the array of kmalloc cachesChristoph Lameter1-0/+8
Have a common definition fo the kmalloc cache arrays in SLAB and SLUB Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-02-01slab: Use proper formatting specs for unsigned size_tChristoph Lameter1-1/+1
Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-12-19slab: propagate tunable valuesGlauber Costa1-3/+4
SLAB allows us to tune a particular cache behavior with tunables. When creating a new memcg cache copy, we'd like to preserve any tunables the parent cache already had. This could be done by an explicit call to do_tune_cpucache() after the cache is created. But this is not very convenient now that the caches are created from common code, since this function is SLAB-specific. Another method of doing that is taking advantage of the fact that do_tune_cpucache() is always called from enable_cpucache(), which is called at cache initialization. We can just preset the values, and then things work as expected. It can also happen that a root cache has its tunables updated during normal system operation. In this case, we will propagate the change to all caches that are already active. This change will require us to move the assignment of root_cache in memcg_params a bit earlier. We need this to be already set - which memcg_kmem_register_cache will do - when we reach __kmem_cache_create() Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-19memcg: aggregate memcg cache values in slabinfoGlauber Costa1-4/+40
When we create caches in memcgs, we need to display their usage information somewhere. We'll adopt a scheme similar to /proc/meminfo, with aggregate totals shown in the global file, and per-group information stored in the group itself. For the time being, only reads are allowed in the per-group cache. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-19memcg/sl[au]b: track all the memcg children of a kmem_cacheGlauber Costa1-0/+3
This enables us to remove all the children of a kmem_cache being destroyed, if for example the kernel module it's being used in gets unloaded. Otherwise, the children will still point to the destroyed parent. Signed-off-by: Suleiman Souhlal <suleiman@google.com> Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-19memcg: allocate memory for memcg caches whenever a new memcg appearsGlauber Costa1-0/+28
Every cache that is considered a root cache (basically the "original" caches, tied to the root memcg/no-memcg) will have an array that should be large enough to store a cache pointer per each memcg in the system. Theoreticaly, this is as high as 1 << sizeof(css_id), which is currently in the 64k pointers range. Most of the time, we won't be using that much. What goes in this patch, is a simple scheme to dynamically allocate such an array, in order to minimize memory usage for memcg caches. Because we would also like to avoid allocations all the time, at least for now, the array will only grow. It will tend to be big enough to hold the maximum number of kmem-limited memcgs ever achieved. We'll allocate it to be a minimum of 64 kmem-limited memcgs. When we have more than that, we'll start doubling the size of this array every time the limit is reached. Because we are only considering kmem limited memcgs, a natural point for this to happen is when we write to the limit. At that point, we already have set_limit_mutex held, so that will become our natural synchronization mechanism. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-19slab/slub: consider a memcg parameter in kmem_create_cacheGlauber Costa1-9/+33
Allow a memcg parameter to be passed during cache creation. When the slub allocator is being used, it will only merge caches that belong to the same memcg. We'll do this by scanning the global list, and then translating the cache to a memcg-specific cache Default function is created as a wrapper, passing NULL to the memcg version. We only merge caches that belong to the same memcg. A helper is provided, memcg_css_id: because slub needs a unique cache name for sysfs. Since this is visible, but not the canonical location for slab data, the cache name is not used, the css_id should suffice. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11mm/sl[aou]b: Common alignment codeChristoph Lameter1-2/+30
Extract the code to do object alignment from the allocators. Do the alignment calculations in slab_common so that the __kmem_cache_create functions of the allocators do not have to deal with alignment. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-12-11mm, sl[au]b: create common functions for boot slab creationChristoph Lameter1-0/+36
Use a special function to create kmalloc caches and use that function in SLAB and SLUB. Acked-by: Joonsoo Kim <js1304@gmail.com> Reviewed-by: Glauber Costa <glommer@parallels.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-31slab: Ignore internal flags in cache creationGlauber Costa1-0/+7
Some flags are used internally by the allocators for management purposes. One example of that is the CFLGS_OFF_SLAB flag that slab uses to mark that the metadata for that cache is stored outside of the slab. No cache should ever pass those as a creation flags. We can just ignore this bit if it happens to be passed (such as when duplicating a cache in the kmem memcg patches). Because such flags can vary from allocator to allocator, we allow them to make their own decisions on that, defining SLAB_AVAILABLE_FLAGS with all flags that are valid at creation time. Allocators that doesn't have any specific flag requirement should define that to mean all flags. Common code will mask out all flags not belonging to that set. Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-24sl[au]b: Process slabinfo_show in common codeGlauber Costa1-1/+17
With all the infrastructure in place, we can now have slabinfo_show done from slab_common.c. A cache-specific function is called to grab information about the cache itself, since that is still heavily dependent on the implementation. But with the values produced by it, all the printing and handling is done from common code. Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Christoph Lameter <cl@linux.com> CC: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-24mm/sl[au]b: Move print_slabinfo_header to slab_common.cGlauber Costa1-0/+23
The header format is highly similar between slab and slub. The main difference lays in the fact that slab may optionally have statistics added here in case of CONFIG_SLAB_DEBUG, while the slub will stick them somewhere else. By making sure that information conditionally lives inside a globally-visible CONFIG_DEBUG_SLAB switch, we can move the header printing to a common location. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> CC: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-24mm/sl[au]b: Move slabinfo processing to slab_common.cGlauber Costa1-0/+70
This patch moves all the common machinery to slabinfo processing to slab_common.c. We can do better by noticing that the output is heavily common, and having the allocators to just provide finished information about this. But after this first step, this can be done easier. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> CC: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-10-10mm, slab: release slab_mutex earlier in kmem_cache_destroy()Jiri Kosina1-1/+4
Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock dependency through kmem_cache_destroy() -> rcu_barrier() -> _rcu_barrier() -> get_online_cpus(). Lockdep thinks that this might actually result in ABBA deadlock, and reports it as below: === [ cut here ] === ====================================================== [ INFO: possible circular locking dependency detected ] 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted ------------------------------------------------------- kworker/u:2/40 is trying to acquire lock: (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0 but task is already holding lock: (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (slab_mutex){+.+.+.}: [<ffffffff810ae1e2>] validate_chain+0x632/0x720 [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530 [<ffffffff810ae921>] lock_acquire+0x121/0x190 [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450 [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50 [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe [<ffffffff81564b83>] notifier_call_chain+0x93/0x140 [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10 [<ffffffff8155719d>] _cpu_up+0xba/0x14e [<ffffffff815572ed>] cpu_up+0xbc/0x117 [<ffffffff81ae05e3>] smp_init+0x6b/0x9f [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10 -> #1 (cpu_hotplug.lock){+.+.+.}: [<ffffffff810ae1e2>] validate_chain+0x632/0x720 [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530 [<ffffffff810ae921>] lock_acquire+0x121/0x190 [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450 [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50 [<ffffffff81049197>] get_online_cpus+0x37/0x50 [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0 [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20 [<ffffffff810f2309>] rcu_barrier+0x9/0x10 [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90 [<ffffffff8118cc01>] deactivate_super+0x61/0x70 [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180 [<ffffffff811ab49e>] sys_umount+0x6e/0xd0 [<ffffffff81569979>] system_call_fastpath+0x16/0x1b -> #0 (rcu_sched_state.barrier_mutex){+.+...}: [<ffffffff810adb4e>] check_prev_add+0x3de/0x440 [<ffffffff810ae1e2>] validate_chain+0x632/0x720 [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530 [<ffffffff810ae921>] lock_acquire+0x121/0x190 [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450 [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50 [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0 [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20 [<ffffffff810f2309>] rcu_barrier+0x9/0x10 [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0 [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack] [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack] [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack] [<ffffffff81454b79>] ops_exit_list+0x39/0x60 [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0 [<ffffffff8106917b>] process_one_work+0x26b/0x4c0 [<ffffffff81069f3e>] worker_thread+0x12e/0x320 [<ffffffff8106f73e>] kthread+0x9e/0xb0 [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10 other info that might help us debug this: Chain exists of: rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slab_mutex); lock(cpu_hotplug.lock); lock(slab_mutex); lock(rcu_sched_state.barrier_mutex); *** DEADLOCK *** === [ cut here ] === This is actually a false positive. Lockdep has no way of knowing the fact that the ABBA can actually never happen, because of special semantics of cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual exclusion there is not achieved through mutex, but through cpu_hotplug.refcount. The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin() until everyone who called get_online_cpus() will call put_online_cpus()" semantics is totally invisible to lockdep. This patch therefore moves the unlock of slab_mutex so that rcu_barrier() is being called with it unlocked. It has two advantages: - it slightly reduces hold time of slab_mutex; as it's used to protect the cachep list, it's not necessary to hold it over kmem_cache_free() call any more - it silences the lockdep false positive warning, as it avoids lockdep ever learning about slab_mutex -> cpu_hotplug.lock dependency Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05Revert "mm/sl[aou]b: Move sysfs_slab_add to common"Pekka Enberg1-8/+0
This reverts commit 96d17b7be0a9849d381442030886211dbb2a7061 which caused the following errors at boot: [ 1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong. [ 1.114885] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6 [ 1.114885] Call Trace: [ 1.114885] [<ffffffff81273f37>] kobject_init+0x87/0xa0 [ 1.115555] [<ffffffff8127426a>] kobject_init_and_add+0x2a/0x90 [ 1.115555] [<ffffffff8127c870>] ? sprintf+0x40/0x50 [ 1.115555] [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210 [ 1.115555] [<ffffffff81100175>] kmem_cache_create+0xa5/0x250 [ 1.115555] [<ffffffff81cf24cd>] ? md_init+0x144/0x144 [ 1.115555] [<ffffffff81cf25b6>] local_init+0xa4/0x11b [ 1.115555] [<ffffffff81cf24e1>] dm_init+0x14/0x45 [ 1.115836] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160 [ 1.116834] [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7 [ 1.117835] [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86 [ 1.117835] [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10 [ 1.118401] [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f [ 1.119832] [<ffffffff8171aff0>] ? gs_change+0xb/0xb [ 1.120325] ------------[ cut here ]------------ [ 1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0() [ 1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016' [ 1.121831] Modules linked in: [ 1.122138] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6 [ 1.122831] Call Trace: [ 1.123074] [<ffffffff81195ce1>] ? sysfs_add_one+0xc1/0xf0 [ 1.123833] [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0 [ 1.124405] [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50 [ 1.124832] [<ffffffff81195ce1>] sysfs_add_one+0xc1/0xf0 [ 1.125337] [<ffffffff81195eb3>] create_dir+0x73/0xd0 [ 1.125832] [<ffffffff81196221>] sysfs_create_dir+0x81/0xe0 [ 1.126363] [<ffffffff81273d3d>] kobject_add_internal+0x9d/0x210 [ 1.126832] [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90 [ 1.127406] [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210 [ 1.127832] [<ffffffff81100175>] kmem_cache_create+0xa5/0x250 [ 1.128384] [<ffffffff81cf24cd>] ? md_init+0x144/0x144 [ 1.128833] [<ffffffff81cf25b6>] local_init+0xa4/0x11b [ 1.129831] [<ffffffff81cf24e1>] dm_init+0x14/0x45 [ 1.130305] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160 [ 1.130831] [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7 [ 1.131351] [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86 [ 1.131830] [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10 [ 1.132392] [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f [ 1.132830] [<ffffffff8171aff0>] ? gs_change+0xb/0xb [ 1.133315] ---[ end trace 2703540871c8fab7 ]--- [ 1.133830] ------------[ cut here ]------------ [ 1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210() [ 1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory. [ 1.135829] Modules linked in: [ 1.136135] Pid: 1, comm: swapper/0 Tainted: G W 3.6.0-rc1+ #6 [ 1.136828] Call Trace: [ 1.137071] [<ffffffff81273e95>] ? kobject_add_internal+0x1f5/0x210 [ 1.137830] [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0 [ 1.138402] [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50 [ 1.138830] [<ffffffff811955a3>] ? release_sysfs_dirent+0x73/0xf0 [ 1.139419] [<ffffffff81273e95>] kobject_add_internal+0x1f5/0x210 [ 1.139830] [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90 [ 1.140429] [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210 [ 1.140830] [<ffffffff81100175>] kmem_cache_create+0xa5/0x250 [ 1.141829] [<ffffffff81cf24cd>] ? md_init+0x144/0x144 [ 1.142307] [<ffffffff81cf25b6>] local_init+0xa4/0x11b [ 1.142829] [<ffffffff81cf24e1>] dm_init+0x14/0x45 [ 1.143307] [<ffffffff810001ba>] do_one_initcall+0x3a/0x160 [ 1.143829] [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7 [ 1.144352] [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86 [ 1.144829] [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10 [ 1.145405] [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f [ 1.145828] [<ffffffff8171aff0>] ? gs_change+0xb/0xb [ 1.146313] ---[ end trace 2703540871c8fab8 ]--- Conflicts: mm/slub.c Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Move kmem_cache refcounting to common codeChristoph Lameter1-2/+3
Get rid of the refcount stuff in the allocators and do that part of kmem_cache management in the common code. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Shrink __kmem_cache_create() parameter listsChristoph Lameter1-13/+13
Do the initial settings of the fields in common code. This will allow us to push more processing into common code later and improve readability. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Move kmem_cache allocations into common codeChristoph Lameter1-8/+10
Shift the allocations to common code. That way the allocation and freeing of the kmem_cache structures is handled by common code. Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Move sysfs_slab_add to commonChristoph Lameter1-0/+8
Simplify locking by moving the slab_add_sysfs after all locks have been dropped. Eases the upcoming move to provide sysfs support for all allocators. Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Do slab aliasing call from common codeChristoph Lameter1-0/+4
The slab aliasing logic causes some strange contortions in slub. So add a call to deal with aliases to slab_common.c but disable it for other slab allocators by providng stubs that fail to create aliases. Full general support for aliases will require additional cleanup passes and more standardization of fields in kmem_cache. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Move duping of slab name to slab_common.cChristoph Lameter1-9/+21
Duping of the slabname has to be done by each slab. Moving this code to slab_common avoids duplicate implementations. With this patch we have common string handling for all slab allocators. Strings passed to kmem_cache_create() are copied internally. Subsystems can create temporary strings to create slab caches. Slabs allocated in early states of bootstrap will never be freed (and those can never be freed since they are essential to slab allocator operations). During bootstrap we therefore do not have to worry about duping names. Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Get rid of __kmem_cache_destroyChristoph Lameter1-1/+0
What is done there can be done in __kmem_cache_shutdown. This affects RCU handling somewhat. On rcu free all slab allocators do not refer to other management structures than the kmem_cache structure. Therefore these other structures can be freed before the rcu deferred free to the page allocator occurs. Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Move freeing of kmem_cache structure to common codeChristoph Lameter1-0/+1
The freeing action is basically the same in all slab allocators. Move to the common kmem_cache_destroy() function. Reviewed-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Use "kmem_cache" name for slab cache with kmem_cache structChristoph Lameter1-0/+1
Make all allocators use the "kmem_cache" slabname for the "kmem_cache" structure. Reviewed-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Extract a common function for kmem_cache_destroyChristoph Lameter1-0/+25
kmem_cache_destroy does basically the same in all allocators. Extract common code which is easy since we already have common mutex handling. Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/sl[aou]b: Move list_add() to slab_common.cChristoph Lameter1-0/+7
Move the code to append the new kmem_cache to the list of slab caches to the kmem_cache_create code in the shared code. This is possible now since the acquisition of the mutex was moved into kmem_cache_create(). Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-09-05mm/slab_common: Improve error handling in kmem_cache_createChristoph Lameter1-4/+24
Instead of using s == NULL use an errorcode. This allows much more detailed diagnostics as to what went wrong. As we add more functionality from the slab allocators to the common kmem_cache_create() function we will also add more error conditions. Print the error code during the panic as well as in a warning if the module can handle failure. The API for kmem_cache_create() currently does not allow the returning of an error code. Return NULL but log the cause of the problem in the syslog. Reviewed-by: Glauber Costa <glommer@parallels.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-08-16mm/slab: restructure kmem_cache_create() debug checksShuah Khan1-49/+48
kmem_cache_create() does cache integrity checks when CONFIG_DEBUG_VM is defined. These checks interspersed with the regular code path has lead to compile time warnings when compiled without CONFIG_DEBUG_VM defined. Restructuring the code to move the integrity checks in to a new function would eliminate the current compile warning problem and also will allow for future changes to the debug only code to evolve without introducing new warnings in the regular path. This restructuring work is based on the discussion in the following thread: https://lkml.org/lkml/2012/7/13/424 [akpm@linux-foundation.org: fix build, cleanup] Signed-off-by: Shuah Khan <shuah.khan@hp.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-08-16Revert "mm/slab_common.c: cleanup"Pekka Enberg1-5/+10
This reverts commit 455ce9eb1cfa083da0def023094190aeb133855a. Andrew sent a better version. Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-08-16mm/slab_common.c: cleanupAndrew Morton1-10/+5
Eliminate an ifdef and a label by moving all the CONFIG_DEBUG_VM checking inside the locked region. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-07-30mm: Fix build warning in kmem_cache_create()Shuah Khan1-0/+2
The label oops is used in CONFIG_DEBUG_VM ifdef block and is defined outside ifdef CONFIG_DEBUG_VM block. This results in the following build warning when built with CONFIG_DEBUG_VM disabled. Fix to move label oops definition to inside a CONFIG_DEBUG_VM block. mm/slab_common.c: In function ‘kmem_cache_create’: mm/slab_common.c:101:1: warning: label ‘oops’ defined but not used [-Wunused-label] Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-07-09mm, sl[aou]b: Move kmem_cache_create mutex handling to common codeChristoph Lameter1-1/+40
Move the mutex handling into the common kmem_cache_create() function. Then we can also move more checks out of SLAB's kmem_cache_create() into the common code. Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-07-09mm, sl[aou]b: Use a common mutex definitionChristoph Lameter1-0/+2
Use the mutex definition from SLAB and make it the common way to take a sleeping lock. This has the effect of using a mutex instead of a rw semaphore for SLUB. SLOB gains the use of a mutex for kmem_cache_create serialization. Not needed now but SLOB may acquire some more features later (like slabinfo / sysfs support) through the expansion of the common code that will need this. Reviewed-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-07-09mm, sl[aou]b: Common definition for boot state of the slab allocatorsChristoph Lameter1-0/+9
All allocators have some sort of support for the bootstrap status. Setup a common definition for the boot states and make all slab allocators use that definition. Reviewed-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2012-07-09mm, sl[aou]b: Extract common code for kmem_cache_create()Christoph Lameter1-0/+68
Kmem_cache_create() does a variety of sanity checks but those vary depending on the allocator. Use the strictest tests and put them into a slab_common file. Make the tests conditional on CONFIG_DEBUG_VM. This patch has the effect of adding sanity checks for SLUB and SLOB under CONFIG_DEBUG_VM and removes the checks in SLAB for !CONFIG_DEBUG_VM. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>