summaryrefslogtreecommitdiff
path: root/mm/slab_common.c
AgeCommit message (Collapse)AuthorFilesLines
2015-02-13slab: link memcg caches of the same kind into a listVladimir Davydov1-11/+10
Sometimes, we need to iterate over all memcg copies of a particular root kmem cache. Currently, we use memcg_cache_params->memcg_caches array for that, because it contains all existing memcg caches. However, it's a bad practice to keep all caches, including those that belong to offline cgroups, in this array, because it will be growing beyond any bounds then. I'm going to wipe away dead caches from it to save space. To still be able to perform iterations over all memcg caches of the same kind, let us link them into a list. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13slab: embed memcg_cache_params to kmem_cacheVladimir Davydov1-58/+71
Currently, kmem_cache stores a pointer to struct memcg_cache_params instead of embedding it. The rationale is to save memory when kmem accounting is disabled. However, the memcg_cache_params has shrivelled drastically since it was first introduced: * Initially: struct memcg_cache_params { bool is_root_cache; union { struct kmem_cache *memcg_caches[0]; struct { struct mem_cgroup *memcg; struct list_head list; struct kmem_cache *root_cache; bool dead; atomic_t nr_pages; struct work_struct destroy; }; }; }; * Now: struct memcg_cache_params { bool is_root_cache; union { struct { struct rcu_head rcu_head; struct kmem_cache *memcg_caches[0]; }; struct { struct mem_cgroup *memcg; struct kmem_cache *root_cache; }; }; }; So the memory saving does not seem to be a clear win anymore. OTOH, keeping a pointer to memcg_cache_params struct instead of embedding it results in touching one more cache line on kmem alloc/free hot paths. Besides, it makes linking kmem caches in a list chained by a field of struct memcg_cache_params really painful due to a level of indirection, while I want to make them linked in the following patch. That said, let us embed it. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13memcg: add rwsem to synchronize against memcg_caches arrays relocationVladimir Davydov1-5/+4
We need a stable value of memcg_nr_cache_ids in kmem_cache_create() (memcg_alloc_cache_params() wants it for root caches), where we only hold the slab_mutex and no memcg-related locks. As a result, we have to update memcg_nr_cache_ids under the slab_mutex, which we can only take on the slab's side (see memcg_update_array_size). This looks awkward and will become even worse when per-memcg list_lru is introduced, which also wants stable access to memcg_nr_cache_ids. To get rid of this dependency between the memcg_nr_cache_ids and the slab_mutex, this patch introduces a special rwsem. The rwsem is held for writing during memcg_caches arrays relocation and memcg_nr_cache_ids updates. Therefore one can take it for reading to get a stable access to memcg_caches arrays and/or memcg_nr_cache_ids. Currently the semaphore is taken for reading only from kmem_cache_create, right before taking the slab_mutex, so right now there's no much point in using rwsem instead of mutex. However, once list_lru is made per-memcg it will allow list_lru initializations to proceed concurrently. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Greg Thelen <gthelen@google.com> Cc: Glauber Costa <glommer@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13memcg: rename some cache id related variablesVladimir Davydov1-2/+2
memcg_limited_groups_array_size, which defines the size of memcg_caches arrays, sounds rather cumbersome. Also it doesn't point anyhow that it's related to kmem/caches stuff. So let's rename it to memcg_nr_cache_ids. It's concise and points us directly to memcg_cache_id. Also, rename kmem_limited_groups to memcg_cache_ida. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Greg Thelen <gthelen@google.com> Cc: Glauber Costa <glommer@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11memcg: zap memcg_slab_caches and memcg_slab_mutexVladimir Davydov1-39/+103
mem_cgroup->memcg_slab_caches is a list of kmem caches corresponding to the given cgroup. Currently, it is only used on css free in order to destroy all caches corresponding to the memory cgroup being freed. The list is protected by memcg_slab_mutex. The mutex is also used to protect kmem_cache->memcg_params->memcg_caches arrays and synchronizes kmem_cache_destroy vs memcg_unregister_all_caches. However, we can perfectly get on without these two. To destroy all caches corresponding to a memory cgroup, we can walk over the global list of kmem caches, slab_caches, and we can do all the synchronization stuff using the slab_mutex instead of the memcg_slab_mutex. This patch therefore gets rid of the memcg_slab_caches and memcg_slab_mutex. Apart from this nice cleanup, it also: - assures that rcu_barrier() is called once at max when a root cache is destroyed or a memory cgroup is freed, no matter how many caches have SLAB_DESTROY_BY_RCU flag set; - fixes the race between kmem_cache_destroy and kmem_cache_create that exists, because memcg_cleanup_cache_params, which is called from kmem_cache_destroy after checking that kmem_cache->refcount=0, releases the slab_mutex, which gives kmem_cache_create a chance to make an alias to a cache doomed to be destroyed. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11memcg: zap memcg_name argument of memcg_create_kmem_cacheVladimir Davydov1-4/+5
Instead of passing the name of the memory cgroup which the cache is created for in the memcg_name_argument, let's obtain it immediately in memcg_create_kmem_cache. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11mm/slab_common.c: use kmem_cache_free()Vaishali Thakkar1-1/+1
Here, free memory is allocated using kmem_cache_zalloc. So, use kmem_cache_free instead of kfree. This is done using Coccinelle and semantic patch used is as follows: @@ expression x,E,c; @@ x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...) ... when != x = E when != &x ?-kfree(x) +kmem_cache_free(c,x) Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11memcg: use generic slab iterators for showing slabinfoVladimir Davydov1-6/+19
Let's use generic slab_start/next/stop for showing memcg caches info. In contrast to the current implementation, this will work even if all memcg caches' info doesn't fit into a seq buffer (a page), plus it simply looks neater. Actually, the main reason I do this isn't mere cleanup. I'm going to zap the memcg_slab_caches list, because I find it useless provided we have the slab_caches list, and this patch is a step in this direction. It should be noted that before this patch an attempt to read memory.kmem.slabinfo of a cgroup that doesn't have kmem limit set resulted in -EIO, while after this patch it will silently show nothing except the header, but I don't think it will frustrate anyone. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11mm/slab: reverse iteration on find_mergeable()Joonsoo Kim1-1/+1
Unlike SLUB, sometimes, object isn't started at the beginning of the slab in the SLAB. This causes the unalignment problem when after slab merging is supported by commit 12220dea07f1 ("mm/slab: support slab merge"). Alignment mismatch check is introduced ("mm/slab: fix unalignment problem on Malta with EVA due to slab merge") to prevent merge in this case. This causes undesirable result that merging happens between infrequently used kmem_caches if there are kmem_caches with same size and is 256 bytes, are merged into pool_workqueue rather than kmalloc-256, because kmem_caches for kmalloc are at the tail of the list. To prevent this situation, this patch reverses iteration order in find_mergeable() to find frequently used kmem_caches. This change helps to merge kmem_cache to frequently used kmem_caches, such as kmalloc kmem_caches. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11slab: print slabinfo header in seq showVladimir Davydov1-9/+6
Currently we print the slabinfo header in the seq start method, which makes it unusable for showing leaks, so we have leaks_show, which does practically the same as s_show except it doesn't show the header. However, we can print the header in the seq show method - we only need to check if the current element is the first on the list. This will allow us to use the same set of seq iterators for both leaks and slabinfo reporting, which is nice. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-14mm/slab: fix unalignment problem on Malta with EVA due to slab mergeJoonsoo Kim1-0/+4
Unlike SLUB, sometimes, object isn't started at the beginning of the slab in SLAB. This causes the unalignment problem after slab merging is supported by commit 12220dea07f1 ("mm/slab: support slab merge"). Following is the report from Markos that fail to boot on Malta with EVA. Calibrating delay loop... 19.86 BogoMIPS (lpj=99328) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 4096 (order: 0, 16384 bytes) Mountpoint-cache hash table entries: 4096 (order: 0, 16384 bytes) Kernel bug detected[#1]: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-05639-g12220dea07f1 #1631 task: 1f04f5d8 ti: 1f050000 task.ti: 1f050000 epc : 80141190 alloc_unbound_pwq+0x234/0x304 Not tainted ra : 80141184 alloc_unbound_pwq+0x228/0x304 Process swapper/0 (pid: 1, threadinfo=1f050000, task=1f04f5d8, tls=00000000) Call Trace: alloc_unbound_pwq+0x234/0x304 apply_workqueue_attrs+0x11c/0x294 __alloc_workqueue_key+0x23c/0x470 init_workqueues+0x320/0x400 do_one_initcall+0xe8/0x23c kernel_init_freeable+0x9c/0x224 kernel_init+0x10/0x100 ret_from_kernel_thread+0x14/0x1c [ end trace cb88537fdc8fa200 ] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b alloc_unbound_pwq() allocates slab object from pool_workqueue. This kmem_cache requires 256 bytes alignment, but, current merging code doesn't honor that, and merge it with kmalloc-256. kmalloc-256 requires only cacheline size alignment so that above failure occurs. However, in x86, kmalloc-256 is luckily aligned in 256 bytes, so the problem didn't happen on it. To fix this problem, this patch introduces alignment mismatch check in find_mergeable(). This will fix the problem. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Markos Chandras <Markos.Chandras@imgtec.com> Tested-by: Markos Chandras <Markos.Chandras@imgtec.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-30mm/slab_common: don't check for duplicate cache namesMikulas Patocka1-10/+0
The SLUB cache merges caches with the same size and alignment and there was long standing bug with this behavior: - create the cache named "foo" - create the cache named "bar" (which is merged with "foo") - delete the cache named "foo" (but it stays allocated because "bar" uses it) - create the cache named "foo" again - it fails because the name "foo" is already used That bug was fixed in commit 694617474e33 ("slab_common: fix the check for duplicate slab names") by not warning on duplicate cache names when the SLUB subsystem is used. Recently, cache merging was implemented the with SLAB subsystem too, in 12220dea07f1 ("mm/slab: support slab merge")). Therefore we need stop checking for duplicate names even for the SLAB subsystem. This patch fixes the bug by removing the check. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-10memcg: move memcg_update_cache_size() to slab_common.cVladimir Davydov1-2/+28
`While growing per memcg caches arrays, we jump between memcontrol.c and slab_common.c in a weird way: memcg_alloc_cache_id - memcontrol.c memcg_update_all_caches - slab_common.c memcg_update_cache_size - memcontrol.c There's absolutely no reason why memcg_update_cache_size can't live on the slab's side though. So let's move it there and settle it comfortably amid per-memcg cache allocation functions. Besides, this patch cleans this function up a bit, removing all the useless comments from it, and renames it to memcg_update_cache_params to conform to memcg_alloc/free_cache_params, which we already have in slab_common.c. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Glauber Costa <glommer@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-10memcg: move memcg_{alloc,free}_cache_params to slab_common.cVladimir Davydov1-1/+43
The only reason why they live in memcontrol.c is that we get/put css reference to the owner memory cgroup in them. However, we can do that in memcg_{un,}register_cache. OTOH, there are several reasons to move them to slab_common.c. First, I think that the less public interface functions we have in memcontrol.h the better. Since the functions I move don't depend on memcontrol, I think it's worth making them private to slab, especially taking into account that the arrays are defined on the slab's side too. Second, the way how per-memcg arrays are updated looks rather awkward: it proceeds from memcontrol.c (__memcg_activate_kmem) to slab_common.c (memcg_update_all_caches) and back to memcontrol.c again (memcg_update_array_size). In the following patches I move the function relocating the arrays (memcg_update_array_size) to slab_common.c and therefore get rid this circular call path. I think we should have the cache allocation stuff in the same place where we have relocation, because it's easier to follow the code then. So I move arrays alloc/free functions to slab_common.c too. The third point isn't obvious. I'm going to make the list_lru structure per-memcg to allow targeted kmem reclaim. That means we will have per-memcg arrays in list_lrus too. It turns out that it's much easier to update these arrays in list_lru.c rather than in memcontrol.c, because all the stuff we need is defined there. This patch makes memcg caches arrays allocation path conform that of the upcoming list_lru. So let's move these functions to slab_common.c and make them static. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Glauber Costa <glommer@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-10mm/slab_common: commonize slab merge logicJoonsoo Kim1-0/+91
Slab merge is good feature to reduce fragmentation. Now, it is only applied to SLUB, but, it would be good to apply it to SLAB. This patch is preparation step to apply slab merge to SLAB by commonizing slab merge logic. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-10mm/slab_common: move kmem_cache definition to internal headerJoonsoo Kim1-0/+9
We don't need to keep kmem_cache definition in include/linux/slab.h if we don't need to inline kmem_cache_size(). According to my code inspection, this function is only called at lc_create() in lib/lru_cache.c which may be called at initialization phase of something, so we don't need to inline it. Therfore, move it to slab_common.c and move kmem_cache definition to internal header. After this change, we can change kmem_cache definition easily without full kernel build. For instance, we can turn on/off CONFIG_SLUB_STATS without full kernel build. [akpm@linux-foundation.org: export kmem_cache_size() to modules] [rdunlap@infradead.org: add header files to fix kmemcheck.c build errors] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-10mm/slab_common.c: suppress warningAndrew Morton1-1/+3
False positive: mm/slab_common.c: In function 'kmem_cache_create': mm/slab_common.c:204: warning: 's' may be used uninitialized in this function Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-07mm: move slab related stuff from util.c to slab_common.cAndrey Ryabinin1-0/+101
Functions krealloc(), __krealloc(), kzfree() belongs to slab API, so should be placed in slab_common.c Also move slab allocator's tracepoints defenitions to slab_common.c No functional changes here. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-23Merge branch 'slab/urgent' of ↵Mike Snitzer1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux into for-3.16-rcX
2014-06-05slab: delete cache from list after __kmem_cache_shutdown succeedsVladimir Davydov1-2/+2
Currently, on kmem_cache_destroy we delete the cache from the slab_list before __kmem_cache_shutdown, inserting it back to the list on failure. Initially, this was done, because we could release the slab_mutex in __kmem_cache_shutdown to delete sysfs slub entry, but since commit 41a212859a4d ("slub: use sysfs'es release mechanism for kmem_cache") we remove sysfs entry later in kmem_cache_destroy after dropping the slab_mutex, so that no implementation of __kmem_cache_shutdown can ever release the lock. Therefore we can simplify the code a bit by moving list_del after __kmem_cache_shutdown. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05memcg: cleanup kmem cache creation/destruction functions namingVladimir Davydov1-6/+6
Current names are rather inconsistent. Let's try to improve them. Brief change log: ** old name ** ** new name ** kmem_cache_create_memcg memcg_create_kmem_cache memcg_kmem_create_cache memcg_regsiter_cache memcg_kmem_destroy_cache memcg_unregister_cache kmem_cache_destroy_memcg_children memcg_cleanup_cache_params mem_cgroup_destroy_all_caches memcg_unregister_all_caches create_work memcg_register_cache_work memcg_create_cache_work_func memcg_register_cache_func memcg_create_cache_enqueue memcg_schedule_register_cache Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05memcg: get rid of memcg_create_cache_nameVladimir Davydov1-2/+5
Instead of calling back to memcontrol.c from kmem_cache_create_memcg in order to just create the name of a per memcg cache, let's allocate it in place. We only need to pass the memcg name to kmem_cache_create_memcg for that - everything else can be done in slab_common.c. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05memcg, slab: simplify synchronization schemeVladimir Davydov1-15/+8
At present, we have the following mutexes protecting data related to per memcg kmem caches: - slab_mutex. This one is held during the whole kmem cache creation and destruction paths. We also take it when updating per root cache memcg_caches arrays (see memcg_update_all_caches). As a result, taking it guarantees there will be no changes to any kmem cache (including per memcg). Why do we need something else then? The point is it is private to slab implementation and has some internal dependencies with other mutexes (get_online_cpus). So we just don't want to rely upon it and prefer to introduce additional mutexes instead. - activate_kmem_mutex. Initially it was added to synchronize initializing kmem limit (memcg_activate_kmem). However, since we can grow per root cache memcg_caches arrays only on kmem limit initialization (see memcg_update_all_caches), we also employ it to protect against memcg_caches arrays relocation (e.g. see __kmem_cache_destroy_memcg_children). - We have a convention not to take slab_mutex in memcontrol.c, but we want to walk over per memcg memcg_slab_caches lists there (e.g. for destroying all memcg caches on offline). So we have per memcg slab_caches_mutex's protecting those lists. The mutexes are taken in the following order: activate_kmem_mutex -> slab_mutex -> memcg::slab_caches_mutex Such a syncrhonization scheme has a number of flaws, for instance: - We can't call kmem_cache_{destroy,shrink} while walking over a memcg::memcg_slab_caches list due to locking order. As a result, in mem_cgroup_destroy_all_caches we schedule the memcg_cache_params::destroy work shrinking and destroying the cache. - We don't have a mutex to synchronize per memcg caches destruction between memcg offline (mem_cgroup_destroy_all_caches) and root cache destruction (__kmem_cache_destroy_memcg_children). Currently we just don't bother about it. This patch simplifies it by substituting per memcg slab_caches_mutex's with the global memcg_slab_mutex. It will be held whenever a new per memcg cache is created or destroyed, so it protects per root cache memcg_caches arrays and per memcg memcg_slab_caches lists. The locking order is following: activate_kmem_mutex -> memcg_slab_mutex -> slab_mutex This allows us to call kmem_cache_{create,shrink,destroy} under the memcg_slab_mutex. As a result, we don't need memcg_cache_params::destroy work any more - we can simply destroy caches while iterating over a per memcg slab caches list. Also using the global mutex simplifies synchronization between concurrent per memcg caches creation/destruction, e.g. mem_cgroup_destroy_all_caches vs __kmem_cache_destroy_memcg_children. The downside of this is that we substitute per-memcg slab_caches_mutex's with a hummer-like global mutex, but since we already take either the slab_mutex or the cgroup_mutex along with a memcg::slab_caches_mutex, it shouldn't hurt concurrency a lot. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05slab: get_online_mems for kmem_cache_{create,destroy,shrink}Vladimir Davydov1-2/+33
When we create a sl[au]b cache, we allocate kmem_cache_node structures for each online NUMA node. To handle nodes taken online/offline, we register memory hotplug notifier and allocate/free kmem_cache_node corresponding to the node that changes its state for each kmem cache. To synchronize between the two paths we hold the slab_mutex during both the cache creationg/destruction path and while tuning per-node parts of kmem caches in memory hotplug handler, but that's not quite right, because it does not guarantee that a newly created cache will have all kmem_cache_nodes initialized in case it races with memory hotplug. For instance, in case of slub: CPU0 CPU1 ---- ---- kmem_cache_create: online_pages: __kmem_cache_create: slab_memory_callback: slab_mem_going_online_callback: lock slab_mutex for each slab_caches list entry allocate kmem_cache node unlock slab_mutex lock slab_mutex init_kmem_cache_nodes: for_each_node_state(node, N_NORMAL_MEMORY) allocate kmem_cache node add kmem_cache to slab_caches list unlock slab_mutex online_pages (continued): node_states_set_node As a result we'll get a kmem cache with not all kmem_cache_nodes allocated. To avoid issues like that we should hold get/put_online_mems() during the whole kmem cache creation/destruction/shrink paths, just like we deal with cpu hotplug. This patch does the trick. Note, that after it's applied, there is no need in taking the slab_mutex for kmem_cache_shrink any more, so it is removed from there. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05slab: document kmalloc_orderVladimir Davydov1-0/+5
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05mm: get rid of __GFP_KMEMCGVladimir Davydov1-0/+13
Currently to allocate a page that should be charged to kmemcg (e.g. threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page allocated is then to be freed by free_memcg_kmem_pages. Apart from looking asymmetrical, this also requires intrusion to the general allocation path. So let's introduce separate functions that will alloc/free pages charged to kmemcg. The new functions are called alloc_kmem_pages and free_kmem_pages. They should be used when the caller actually would like to use kmalloc, but has to fall back to the page allocator for the allocation is large. They only differ from alloc_pages and free_pages in that besides allocating or freeing pages they also charge them to the kmem resource counter of the current memory cgroup. [sfr@canb.auug.org.au: export kmalloc_order() to modules] Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05sl[au]b: charge slabs to kmemcg explicitlyVladimir Davydov1-5/+1
We have only a few places where we actually want to charge kmem so instead of intruding into the general page allocation path with __GFP_KMEMCG it's better to explictly charge kmem there. All kmem charges will be easier to follow that way. This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG from memcg caches' allocflags. Instead it makes slab allocation path call memcg_charge_kmem directly getting memcg to charge from the cache's memcg params. This also eliminates any possibility of misaccounting an allocation going from one memcg's cache to another memcg, because now we always charge slabs against the memcg the cache belongs to. That's why this patch removes the big comment to memcg_kmem_get_cache. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-24slab_common: fix the check for duplicate slab namesMikulas Patocka1-1/+1
The patch 3e374919b314f20e2a04f641ebc1093d758f66a4 is supposed to fix the problem where kmem_cache_create incorrectly reports duplicate cache name and fails. The problem is described in the header of that patch. However, the patch doesn't really fix the problem because of these reasons: * the logic to test for debugging is reversed. It was intended to perform the check only if slub debugging is enabled (which implies that caches with the same parameters are not merged). Therefore, there should be #if !defined(CONFIG_SLUB) || defined(CONFIG_SLUB_DEBUG_ON) The current code has the condition reversed and performs the test if debugging is disabled. * slub debugging may be enabled or disabled based on kernel command line, CONFIG_SLUB_DEBUG_ON is just the default settings. Therefore the test based on definition of CONFIG_SLUB_DEBUG_ON is unreliable. This patch fixes the problem by removing the test "!defined(CONFIG_SLUB_DEBUG_ON)". Therefore, duplicate names are never checked if the SLUB allocator is used. Note to stable kernel maintainers: when backporint this patch, please backport also the patch 3e374919b314f20e2a04f641ebc1093d758f66a4. Acked-by: David Rientjes <rientjes@google.com> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # 3.6+ Signed-off-by: Pekka Enberg <penberg@kernel.org>
2014-05-07slub: use sysfs'es release mechanism for kmem_cacheChristoph Lameter1-2/+11
debugobjects warning during netfilter exit: ------------[ cut here ]------------ WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G W 3.11.0-next-20130906-sasha #3984 Workqueue: netns cleanup_net Call Trace: dump_stack+0x52/0x87 warn_slowpath_common+0x8c/0xc0 warn_slowpath_fmt+0x46/0x50 debug_print_object+0x8d/0xb0 __debug_check_no_obj_freed+0xa5/0x220 debug_check_no_obj_freed+0x15/0x20 kmem_cache_free+0x197/0x340 kmem_cache_destroy+0x86/0xe0 nf_conntrack_cleanup_net_list+0x131/0x170 nf_conntrack_pernet_exit+0x5d/0x70 ops_exit_list+0x5e/0x70 cleanup_net+0xfb/0x1c0 process_one_work+0x338/0x550 worker_thread+0x215/0x350 kthread+0xe7/0xf0 ret_from_fork+0x7c/0xb0 Also during dcookie cleanup: WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0() ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20 Modules linked in: CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408 Call Trace: dump_stack (lib/dump_stack.c:52) warn_slowpath_common (kernel/panic.c:430) warn_slowpath_fmt (kernel/panic.c:445) debug_print_object (lib/debugobjects.c:262) __debug_check_no_obj_freed (lib/debugobjects.c:697) debug_check_no_obj_freed (lib/debugobjects.c:726) kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717) kmem_cache_destroy (mm/slab_common.c:363) dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343) event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153) __fput (fs/file_table.c:217) ____fput (fs/file_table.c:253) task_work_run (kernel/task_work.c:125 (discriminator 1)) do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751) int_signal (arch/x86/kernel/entry_64.S:807) Sysfs has a release mechanism. Use that to release the kmem_cache structure if CONFIG_SYSFS is enabled. Only slub is changed - slab currently only supports /proc/slabinfo and not /sys/kernel/slab/*. We talked about adding that and someone was working on it. [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build] [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more] Signed-off-by: Christoph Lameter <cl@linux.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Greg KH <greg@kroah.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08memcg, slab: do not destroy children caches if parent has aliasesVladimir Davydov1-25/+50
Currently we destroy children caches at the very beginning of kmem_cache_destroy(). This is wrong, because the root cache will not necessarily be destroyed in the end - if it has aliases (refcount > 0), kmem_cache_destroy() will simply decrement its refcount and return. In this case, at best we will get a bunch of warnings in dmesg, like this one: kmem_cache_destroy kmalloc-32:0: Slab cache still has objects CPU: 1 PID: 7139 Comm: modprobe Tainted: G B W 3.13.0+ #117 Call Trace: dump_stack+0x49/0x5b kmem_cache_destroy+0xdf/0xf0 kmem_cache_destroy_memcg_children+0x97/0xc0 kmem_cache_destroy+0xf/0xf0 xfs_mru_cache_uninit+0x21/0x30 [xfs] exit_xfs_fs+0x2e/0xc44 [xfs] SyS_delete_module+0x198/0x1f0 system_call_fastpath+0x16/0x1b At worst - if kmem_cache_destroy() will race with an allocation from a memcg cache - the kernel will panic. This patch fixes this by moving children caches destruction after the check if the cache has aliases. Plus, it forbids destroying a root cache if it still has children caches, because each children cache keeps a reference to its parent. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08memcg, slab: unregister cache from memcg before starting to destroy itVladimir Davydov1-1/+2
Currently, memcg_unregister_cache(), which deletes the cache being destroyed from the memcg_slab_caches list, is called after __kmem_cache_shutdown() (see kmem_cache_destroy()), which starts to destroy the cache. As a result, one can access a partially destroyed cache while traversing a memcg_slab_caches list, which can have deadly consequences (for instance, cache_show() called for each cache on a memcg_slab_caches list from mem_cgroup_slabinfo_read() will dereference pointers to already freed data). To fix this, let's move memcg_unregister_cache() before the cache destruction process beginning, issuing memcg_register_cache() on failure. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08memcg, slab: separate memcg vs root cache creation pathsVladimir Davydov1-80/+107
Memcg-awareness turned kmem_cache_create() into a dirty interweaving of memcg-only and except-for-memcg calls. To clean this up, let's move the code responsible for memcg cache creation to a separate function. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08memcg, slab: cleanup memcg cache creationVladimir Davydov1-1/+4
This patch cleans up the memcg cache creation path as follows: - Move memcg cache name creation to a separate function to be called from kmem_cache_create_memcg(). This allows us to get rid of the mutex protecting the temporary buffer used for the name formatting, because the whole cache creation path is protected by the slab_mutex. - Get rid of memcg_create_kmem_cache(). This function serves as a proxy to kmem_cache_create_memcg(). After separating the cache name creation path, it would be reduced to a function call, so let's inline it. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-08memcg, slab: never try to merge memcg cachesVladimir Davydov1-3/+5
When a kmem cache is created (kmem_cache_create_memcg()), we first try to find a compatible cache that already exists and can handle requests from the new cache, i.e. has the same object size, alignment, ctor, etc. If there is such a cache, we do not create any new caches, instead we simply increment the refcount of the cache found and return it. Currently we do this procedure not only when creating root caches, but also for memcg caches. However, there is no point in that, because, as every memcg cache has exactly the same parameters as its parent and cache merging cannot be turned off in runtime (only on boot by passing "slub_nomerge"), the root caches of any two potentially mergeable memcg caches should be merged already, i.e. it must be the same root cache, and therefore we couldn't even get to the memcg cache creation, because it already exists. The only exception is boot caches - they are explicitly forbidden to be merged by setting their refcount to -1. There are currently only two of them - kmem_cache and kmem_cache_node, which are used in slab internals (I do not count kmalloc caches as their refcount is set to 1 immediately after creation). Since they are prevented from merging preliminary I guess we should avoid to merge their children too. So let's remove the useless code responsible for merging memcg caches. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30slab: fix wrong retval on kmem_cache_create_memcg error pathDave Jones1-8/+11
On kmem_cache_create_memcg() error path we set 'err', but leave 's' (the new cache ptr) undefined. The latter can be NULL if we could not allocate the cache, or pointing to a freed area if we failed somewhere later while trying to initialize it. Initially we checked 'err' immediately before exiting the function and returned NULL if it was set ignoring the value of 's': out_unlock: ... if (err) { /* report error */ return NULL; } return s; Recently this check was, in fact, broken by commit f717eb3abb5e ("slab: do not panic if we fail to create memcg cache"), which turned it to: out_unlock: ... if (err && !memcg) { /* report error */ return NULL; } return s; As a result, if we are failing creating a cache for a memcg, we will skip the check and return 's' that can contain crap. Obviously, commit f717eb3abb5e intended not to return crap on error allocating a cache for a memcg, but only to remove the error reporting in this case, so the check should look like this: out_unlock: ... if (err) { if (!memcg) return NULL; /* report error */ return NULL; } return s; [rientjes@google.com: despaghettification] [vdavydov@parallels.com: patch monkeying] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Dave Jones <davej@redhat.com> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-24slab: do not panic if we fail to create memcg cacheVladimir Davydov1-1/+8
There is no point in flooding logs with warnings or especially crashing the system if we fail to create a cache for a memcg. In this case we will be accounting the memcg allocation to the root cgroup until we succeed to create its own cache, but it isn't that critical. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-24memcg, slab: fix races in per-memcg cache creation/destructionVladimir Davydov1-1/+13
We obtain a per-memcg cache from a root kmem_cache by dereferencing an entry of the root cache's memcg_params::memcg_caches array. If we find no cache for a memcg there on allocation, we initiate the memcg cache creation (see memcg_kmem_get_cache()). The cache creation proceeds asynchronously in memcg_create_kmem_cache() in order to avoid lock clashes, so there can be several threads trying to create the same kmem_cache concurrently, but only one of them may succeed. However, due to a race in the code, it is not always true. The point is that the memcg_caches array can be relocated when we activate kmem accounting for a memcg (see memcg_update_all_caches(), memcg_update_cache_size()). If memcg_update_cache_size() and memcg_create_kmem_cache() proceed concurrently as described below, we can leak a kmem_cache. Asume two threads schedule creation of the same kmem_cache. One of them successfully creates it. Another one should fail then, but if memcg_create_kmem_cache() interleaves with memcg_update_cache_size() as follows, it won't: memcg_create_kmem_cache() memcg_update_cache_size() (called w/o mutexes held) (called with slab_mutex, set_limit_mutex held) ------------------------- ------------------------- mutex_lock(&memcg_cache_mutex) s->memcg_params=kzalloc(...) new_cachep=cache_from_memcg_idx(cachep,idx) // new_cachep==NULL => proceed to creation s->memcg_params->memcg_caches[i] =cur_params->memcg_caches[i] // kmem_cache_create_memcg takes slab_mutex // so we will hang around until // memcg_update_cache_size finishes, but // nothing will prevent it from succeeding so // memcg_caches[idx] will be overwritten in // memcg_register_cache! new_cachep = kmem_cache_create_memcg(...) mutex_unlock(&memcg_cache_mutex) Let's fix this by moving the check for existence of the memcg cache to kmem_cache_create_memcg() to be called under the slab_mutex and make it return NULL if so. A similar race is possible when destroying a memcg cache (see kmem_cache_destroy()). Since memcg_unregister_cache(), which clears the pointer in the memcg_caches array, is called w/o protection, we can race with memcg_update_cache_size() and omit clearing the pointer. Therefore memcg_unregister_cache() should be moved before we release the slab_mutex. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-24memcg, slab: clean up memcg cache initialization/destructionVladimir Davydov1-2/+3
Currently, we have rather a messy function set relating to per-memcg kmem cache initialization/destruction. Per-memcg caches are created in memcg_create_kmem_cache(). This function calls kmem_cache_create_memcg() to allocate and initialize a kmem cache and then "registers" the new cache in the memcg_params::memcg_caches array of the parent cache. During its work-flow, kmem_cache_create_memcg() executes the following memcg-related functions: - memcg_alloc_cache_params(), to initialize memcg_params of the newly created cache; - memcg_cache_list_add(), to add the new cache to the memcg_slab_caches list. On the other hand, kmem_cache_destroy() called on a cache destruction only calls memcg_release_cache(), which does all the work: it cleans the reference to the cache in its parent's memcg_params::memcg_caches, removes the cache from the memcg_slab_caches list, and frees memcg_params. Such an inconsistency between destruction and initialization paths make the code difficult to read, so let's clean this up a bit. This patch moves all the code relating to registration of per-memcg caches (adding to memcg list, setting the pointer to a cache from its parent) to the newly created memcg_register_cache() and memcg_unregister_cache() functions making the initialization and destruction paths look symmetrical. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-24memcg, slab: kmem_cache_create_memcg(): fix memleak on fail pathVladimir Davydov1-1/+2
We do not free the cache's memcg_params if __kmem_cache_create fails. Fix this. Plus, rename memcg_register_cache() to memcg_alloc_cache_params(), because it actually does not register the cache anywhere, but simply initialize kmem_cache::memcg_params. [akpm@linux-foundation.org: fix build] Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-24slab: clean up kmem_cache_create_memcg() error handlingVladimir Davydov1-34/+31
Currently kmem_cache_create_memcg() backoffs on failure inside conditionals, without using gotos. This results in the rollback code duplication, which makes the function look cumbersome even though on error we should only free the allocated cache. Since in the next patch I am going to add yet another rollback function call on error path there, let's employ labels instead of conditionals for undoing any changes on failure to keep things clean. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13memcg, kmem: rename cache_from_memcg to cache_from_memcg_idxQiang Huang1-1/+1
We can't see the relationship with memcg from the parameters, so the name with memcg_idx would be more reasonable. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-28slab_common: Do not check for duplicate slab namesChristoph Lameter1-0/+2
SLUB can alias multiple slab kmem_create_requests to one slab cache to save memory and increase the cache hotness. As a result the name of the slab can be stale. Only check the name for duplicates if we are in debug mode where we do not merge multiple caches. This fixes the following problem reported by Jonathan Brassow: The problem with kmem_cache* is this: *) Assume CONFIG_SLUB is set 1) kmem_cache_create(name="foo-a") - creates new kmem_cache structure 2) kmem_cache_create(name="foo-b") - If identical cache characteristics, it will be merged with the previously created cache associated with "foo-a". The cache's refcount will be incremented and an alias will be created via sysfs_slab_alias(). 3) kmem_cache_destroy(<ptr>) - Attempting to destroy cache associated with "foo-a", but instead the refcount is simply decremented. I don't even think the sysfs aliases are ever removed... 4) kmem_cache_create(name="foo-a") - This FAILS because kmem_cache_sanity_check colides with the existing name ("foo-a") associated with the non-removed cache. This is a problem for RAID (specifically dm-raid) because the name used for the kmem_cache_create is ("raid%d-%p", level, mddev). If the cache persists for long enough, the memory address of an old mddev will be reused for a new mddev - causing an identical formulation of the cache name. Even though kmem_cache_destory had long ago been used to delete the old cache, the merging of caches has cause the name and cache of that old instance to be preserved and causes a colision (and thus failure) in kmem_cache_create(). I see this regularly in my testing. Reported-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-09-04mm/sl[aou]b: Move kmallocXXX functions to common codeChristoph Lameter1-0/+10
The kmalloc* functions of all slab allcoators are similar now so lets move them into slab.h. This requires some function naming changes in slob. As a results of this patch there is a common set of functions for all allocators. Also means that kmalloc_large() is now available in general to perform large order allocations that go directly via the page allocator. kmalloc_large() can be substituted if kmalloc() throws warnings because of too large allocations. kmalloc_large() has exactly the same semantics as kmalloc but can only used for allocations > PAGE_SIZE. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-08-14mm, slab_common: add 'unlikely' to size check of kmalloc_slab()Joonsoo Kim1-1/+1
Size is usually below than KMALLOC_MAX_SIZE. If we add a 'unlikely' macro, compiler can make better code. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-15Merge branch 'slab/for-linus' of ↵Linus Torvalds1-5/+13
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull slab update from Pekka Enberg: "Highlights: - Fix for boot-time problems on some architectures due to init_lock_keys() not respecting kmalloc_caches boundaries (Christoph Lameter) - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim) - Fix for excessive slab freelist draining (Wanpeng Li) - SLUB and SLOB cleanups and fixes (various people)" I ended up editing the branch, and this avoids two commits at the end that were immediately reverted, and I instead just applied the oneliner fix in between myself. * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux slub: Check for page NULL before doing the node_match check mm/slab: Give s_next and s_stop slab-specific names slob: Check for NULL pointer before calling ctor() slub: Make cpu partial slab support configurable slab: add kmalloc() to kernel API documentation slab: fix init_lock_keys slob: use DIV_ROUND_UP where possible slub: do not put a slab to cpu partial list when cpu_partial is 0 mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo mm/slub: Drop unnecessary nr_partials mm/slab: Fix /proc/slabinfo unwriteable for slab mm/slab: Sharing s_next and s_stop between slab and slub mm/slab: Fix drain freelist excessively slob: Rework #ifdeffery in slab.h mm, slab: moved kmem_cache_alloc_node comment to correct place
2013-07-08mm/slab: Give s_next and s_stop slab-specific namesWanpeng Li1-4/+4
Give s_next and s_stop slab-specific names instead of exporting "s_next" and "s_stop". Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07mm/slab: Fix /proc/slabinfo unwriteable for slabWanpeng Li1-1/+9
Slab have some tunables like limit, batchcount, and sharedfactor can be tuned through function slabinfo_write. Commit (b7454ad3: mm/sl[au]b: Move slabinfo processing to slab_common.c) uncorrectly change /proc/slabinfo unwriteable for slab, this patch fix it by revert to original mode. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-07-07mm/slab: Sharing s_next and s_stop between slab and slubWanpeng Li1-2/+2
This patch shares s_next and s_stop between slab and slub. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-06-13slab: prevent warnings when allocating with __GFP_NOWARNSasha Levin1-1/+3
Sasha Levin noticed that the warning introduced by commit 6286ae9 ("slab: Return NULL for oversized allocations) is being triggered: WARNING: CPU: 15 PID: 21519 at mm/slab_common.c:376 kmalloc_slab+0x2f/0xb0() can: request_module (can-proto-4) failed. mpoa: proc_mpc_write: could not parse '' Modules linked in: CPU: 15 PID: 21519 Comm: trinity-child15 Tainted: G W 3.10.0-rc4-next-20130607-sasha-00011-gcd78395-dirty #2 0000000000000009 ffff880020a95e30 ffffffff83ff4041 0000000000000000 ffff880020a95e68 ffffffff8111fe12 fffffffffffffff0 00000000000082d0 0000000000080000 0000000000080000 0000000001400000 ffff880020a95e78 Call Trace: [<ffffffff83ff4041>] dump_stack+0x4e/0x82 [<ffffffff8111fe12>] warn_slowpath_common+0x82/0xb0 [<ffffffff8111fe55>] warn_slowpath_null+0x15/0x20 [<ffffffff81243dcf>] kmalloc_slab+0x2f/0xb0 [<ffffffff81278d54>] __kmalloc+0x24/0x4b0 [<ffffffff8196ffe3>] ? security_capable+0x13/0x20 [<ffffffff812a26b7>] ? pipe_fcntl+0x107/0x210 [<ffffffff812a26b7>] pipe_fcntl+0x107/0x210 [<ffffffff812b7ea0>] ? fget_raw_light+0x130/0x3f0 [<ffffffff812aa5fb>] SyS_fcntl+0x60b/0x6a0 [<ffffffff8403ca98>] tracesys+0xe1/0xe6 Andrew Morton writes: __GFP_NOWARN is frequently used by kernel code to probe for "how big an allocation can I get". That's a bit lame, but it's used on slow paths and is pretty simple. However, SLAB would still spew a warning when a big allocation happens if the __GFP_NOWARN flag is _not_ set to expose kernel bugs. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> [ penberg@kernel.org: improve changelog ] Signed-off-by: Pekka Enberg <penberg@kernel.org>
2013-05-09mm/slab: Fix crash during slab initChris Mason1-10/+10
Commit 8a965b3baa89 ("mm, slab_common: Fix bootstrap creation of kmalloc caches") introduced a regression that caused us to crash early during boot. The commit was introducing ordering of slab creation, making sure two odd-sized slabs were created after specific powers of two sizes. But, if any of the power of two slabs were created earlier during boot, slabs at index 1 or 2 might not get created at all. This patch makes sure none of the slabs get skipped. Tony Lindgren bisected this down to the offending commit, which really helped because bisect kept bringing me to almost but not quite this one. Signed-off-by: Chris Mason <chris.mason@fusionio.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Soren Brinkmann <soren.brinkmann@xilinx.com> Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>