From 3942c07ccf98e66b8893f396dca98f5b076f905f Mon Sep 17 00:00:00 2001
From: Glauber Costa
Date: Wed, 28 Aug 2013 10:17:53 +1000
Subject: fs: bump inode and dentry counters to long
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This series reworks our current object cache shrinking infrastructure in two main ways:

* Noticing that a lot of users copy and paste their own version of LRU lists for objects, we put some effort into providing a generic version. It is modeled after the filesystem users (dentries, inodes, and xfs, for various tasks), but we expect that other users could benefit in the near future with little or no modification. Let us know if you have any issues.

* The underlying list_lru being proposed automatically and transparently keeps the elements in per-node lists, and is able to manipulate the node lists individually. Given this infrastructure, we are able to modify the up-to-now hammer called shrink_slab to reclaim per node instead of always scanning memory across all nodes, as it has done so far.

Per-node lru lists are also expected to reduce contention on the lru locks during multi-node scans, since we are no longer fighting over a global lock. The locks usually disappear from the profilers with this change.

Although we have no official benchmarks for this version (you are welcome to evaluate it independently), earlier versions of this series were performance tested (details at http://permalink.gmane.org/gmane.linux.kernel.mm/100537), showing no visible performance regressions and qualitatively better behavior on NUMA machines.

With this infrastructure in place, we can use the list_lru entry point to provide memcg isolation and per-memcg targeted reclaim. Historically, those two pieces of work have been posted together. This version presents only the infrastructure work, deferring the memcg work for a later time, so we can focus on getting this part tested.
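The per-node LRU idea described above can be illustrated with a small user-space sketch. All names here (`toy_lru`, `toy_obj`, `toy_lru_isolate_node()` and the list helpers) are hypothetical and are not the kernel's `list_lru` API; the node count is fixed at two and the per-node locks are omitted. The sketch only shows objects being filed on the list of the node they belong to, a total count across nodes, and a reclaim pass that walks a single node while leaving the other nodes untouched.

```c
#include <assert.h>

#define TOY_NR_NODES 2 /* hypothetical node count for this sketch */

/* Intrusive doubly linked list, modeled loosely on the kernel's list_head. */
struct toy_list { struct toy_list *prev, *next; };

static void toy_list_init(struct toy_list *h) { h->prev = h->next = h; }
static int toy_list_empty(const struct toy_list *h) { return h->next == h; }

static void toy_list_add_tail(struct toy_list *n, struct toy_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void toy_list_del_init(struct toy_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_list_init(n);
}

/* A cached object; nid is the node its memory lives on. */
struct toy_obj {
	struct toy_list lru; /* initialize with toy_list_init() before use */
	int nid;
};

/* One list and one item count per node; per-node locks are omitted here. */
struct toy_lru {
	struct toy_list list[TOY_NR_NODES];
	long nr_items[TOY_NR_NODES];
};

static void toy_lru_init(struct toy_lru *lru)
{
	for (int i = 0; i < TOY_NR_NODES; i++) {
		toy_list_init(&lru->list[i]);
		lru->nr_items[i] = 0;
	}
}

/* File obj on its own node's list; returns 1 if added, 0 if already listed. */
static int toy_lru_add(struct toy_lru *lru, struct toy_obj *obj)
{
	if (!toy_list_empty(&obj->lru))
		return 0;
	toy_list_add_tail(&obj->lru, &lru->list[obj->nid]);
	lru->nr_items[obj->nid]++;
	return 1;
}

/* Total across all nodes, e.g. for a global "count" query. */
static long toy_lru_count(const struct toy_lru *lru)
{
	long sum = 0;

	for (int i = 0; i < TOY_NR_NODES; i++)
		sum += lru->nr_items[i];
	return sum;
}

/*
 * Node-targeted reclaim: move up to nr_to_scan objects from the head (oldest
 * end) of node nid's list onto the caller's dispose list. Other nodes' lists
 * are never touched. Returns the number of objects isolated.
 */
static long toy_lru_isolate_node(struct toy_lru *lru, int nid,
				 struct toy_list *dispose, long nr_to_scan)
{
	long isolated = 0;

	while (isolated < nr_to_scan && !toy_list_empty(&lru->list[nid])) {
		struct toy_list *oldest = lru->list[nid].next;

		toy_list_del_init(oldest);
		toy_list_add_tail(oldest, dispose);
		lru->nr_items[nid]--;
		isolated++;
	}
	return isolated;
}
```

Because each node keeps its own list and counter, a node-targeted scan never walks another node's objects; in a real implementation this is also what lets each node list carry its own lock.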
You can see more about the history of such work at http://lwn.net/Articles/552769/

Dave Chinner (18):
      dcache: convert dentry_stat.nr_unused to per-cpu counters
      dentry: move to per-sb LRU locks
      dcache: remove dentries from LRU before putting on dispose list
      mm: new shrinker API
      shrinker: convert superblock shrinkers to new API
      list: add a new LRU list type
      inode: convert inode lru list to generic lru list code.
      dcache: convert to use new lru list infrastructure
      list_lru: per-node list infrastructure
      shrinker: add node awareness
      fs: convert inode and dentry shrinking to be node aware
      xfs: convert buftarg LRU to generic code
      xfs: rework buffer dispose list tracking
      xfs: convert dquot cache lru to list_lru
      fs: convert fs shrinkers to new scan/count API
      drivers: convert shrinkers to new count/scan API
      shrinker: convert remaining shrinkers to count/scan API
      shrinker: Kill old ->shrink API.

Glauber Costa (7):
      fs: bump inode and dentry counters to long
      super: fix calculation of shrinkable objects for small numbers
      list_lru: per-node API
      vmscan: per-node deferred work
      i915: bail out earlier when shrinker cannot acquire mutex
      hugepage: convert huge zero page shrinker to new shrinker API
      list_lru: dynamically adjust node arrays

This patch:

There are situations in very large machines in which we can have a large quantity of dirty inodes, unused dentries, etc. This is particularly true when unmounting a filesystem, since every live object will eventually be discarded.

Dave Chinner reported a problem with this while experimenting with the shrinker revamp patchset. So we believe it is time for a change. This patch simply changes these counters from int to long. Machines where this matters should have a 64-bit long anyway.
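The effect of widening the counters can be sketched in user space. `toy_sum_counts()` and `toy_fits_in_int()` are hypothetical helpers: the first follows the shape of `get_nr_dentry()` after this patch (sum into a `long`, clamp a transiently negative total to zero), and the second checks whether a total would still have fit in the pre-patch `int sum`. The sketch assumes an LP64 platform where `long` is 64 bits; on 32-bit machines `long` is no wider than `int`, in line with the remark above that machines where this matters should have a 64-bit `long`.

```c
#include <assert.h>
#include <limits.h>

/*
 * Sum per-CPU counts into a long, clamping a negative total to zero, in the
 * same shape as get_nr_dentry() after this patch. Individual per-CPU values
 * can be negative because an object may be counted on one CPU and uncounted
 * on another.
 */
static long toy_sum_counts(const long *per_cpu, int nr_cpus)
{
	long sum = 0;

	for (int i = 0; i < nr_cpus; i++)
		sum += per_cpu[i];
	return sum < 0 ? 0 : sum;
}

/* Would this total still have been representable in the old `int sum`? */
static int toy_fits_in_int(long total)
{
	return total <= INT_MAX;
}
```

A total one past `INT_MAX` is representable in the new `long` sum but not in the old `int`, which is the situation the changelog describes for very large machines.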
Signed-off-by: Glauber Costa Cc: Dave Chinner Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: Dave Chinner Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/dcache.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'fs/dcache.c') diff --git a/fs/dcache.c b/fs/dcache.c index 4d9df3c940e6..6ef1c2e1bbc4 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -146,13 +146,13 @@ struct dentry_stat_t dentry_stat = { .age_limit = 45, }; -static DEFINE_PER_CPU(unsigned int, nr_dentry); +static DEFINE_PER_CPU(long, nr_dentry); #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) -static int get_nr_dentry(void) +static long get_nr_dentry(void) { int i; - int sum = 0; + long sum = 0; for_each_possible_cpu(i) sum += per_cpu(nr_dentry, i); return sum < 0 ? 0 : sum; @@ -162,7 +162,7 @@ int proc_nr_dentry(ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { dentry_stat.nr_dentry = get_nr_dentry(); - return proc_dointvec(table, write, buffer, lenp, ppos); + return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); } #endif -- cgit v1.2.3 From 62d36c77035219ac776d1882ed3a662f2b75f258 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:17:54 +1000 Subject: dcache: convert dentry_stat.nr_unused to per-cpu counters MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Before we split up the dcache_lru_lock, the unused dentry counter needs to be made independent of the global dcache_lru_lock. Convert it to per-cpu counters to do this. 
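A user-space sketch of the per-CPU counter pattern this patch adopts for `nr_dentry_unused`. The array and the helpers `toy_unused_inc()`/`toy_unused_dec()` (stand-ins for `this_cpu_inc()`/`this_cpu_dec()`, with the CPU passed explicitly) and `toy_get_nr_dentry_unused()` are hypothetical names. Each CPU updates only its own slot, so the counter no longer needs a shared lock such as `dcache_lru_lock`; the reader sums the slots of every possible CPU, as the comment added to `fs/dcache.c` describes.

```c
#include <assert.h>

#define TOY_NR_CPUS 4 /* hypothetical number of possible CPUs */

/*
 * One slot per possible CPU. Each CPU only touches its own slot, so updates
 * need no shared lock. In the kernel, this_cpu_inc()/this_cpu_dec() also make
 * the update safe against preemption, which this single-threaded sketch does
 * not model.
 */
static long toy_nr_dentry_unused[TOY_NR_CPUS];

static void toy_unused_inc(int cpu) { toy_nr_dentry_unused[cpu]++; }
static void toy_unused_dec(int cpu) { toy_nr_dentry_unused[cpu]--; }

/*
 * Reader side, like get_nr_dentry_unused(): sum over all possible CPUs (not
 * just online ones) and clamp a transiently negative total to zero.
 */
static long toy_get_nr_dentry_unused(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sum += toy_nr_dentry_unused[cpu];
	return sum < 0 ? 0 : sum;
}
```

A dentry added to the LRU on one CPU and removed on another leaves a positive slot and a negative slot; only their sum is meaningful. The trade-off is that the reader's sum is only approximately current under concurrent updates, which is acceptable for a statistic exported through /proc.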
Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Reviewed-by: Christoph Hellwig Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/dcache.c | 30 +++++++++++++++++++++++++++--- 1 file changed, 27 insertions(+), 3 deletions(-) (limited to 'fs/dcache.c') diff --git a/fs/dcache.c b/fs/dcache.c index 6ef1c2e1bbc4..03161240e744 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -147,8 +147,22 @@ struct dentry_stat_t dentry_stat = { }; static DEFINE_PER_CPU(long, nr_dentry); +static DEFINE_PER_CPU(long, nr_dentry_unused); #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) + +/* + * Here we resort to our own counters instead of using generic per-cpu counters + * for consistency with what the vfs inode code does. We are expected to harvest + * better code and performance by having our own specialized counters. + * + * Please note that the loop is done over all possible CPUs, not over all online + * CPUs. The reason for this is that we don't want to play games with CPUs going + * on and off. If one of them goes off, we will just keep their counters. + * + * glommer: See cffbc8a for details, and if you ever intend to change this, + * please update all vfs counters to match. + */ static long get_nr_dentry(void) { int i; @@ -158,10 +172,20 @@ static long get_nr_dentry(void) return sum < 0 ? 0 : sum; } +static long get_nr_dentry_unused(void) +{ + int i; + long sum = 0; + for_each_possible_cpu(i) + sum += per_cpu(nr_dentry_unused, i); + return sum < 0 ? 
0 : sum; +} + int proc_nr_dentry(ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { dentry_stat.nr_dentry = get_nr_dentry(); + dentry_stat.nr_unused = get_nr_dentry_unused(); return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); } #endif @@ -342,7 +366,7 @@ static void dentry_lru_add(struct dentry *dentry) dentry->d_flags |= DCACHE_LRU_LIST; list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru); dentry->d_sb->s_nr_dentry_unused++; - dentry_stat.nr_unused++; + this_cpu_inc(nr_dentry_unused); spin_unlock(&dcache_lru_lock); } } @@ -352,7 +376,7 @@ static void __dentry_lru_del(struct dentry *dentry) list_del_init(&dentry->d_lru); dentry->d_flags &= ~(DCACHE_SHRINK_LIST | DCACHE_LRU_LIST); dentry->d_sb->s_nr_dentry_unused--; - dentry_stat.nr_unused--; + this_cpu_dec(nr_dentry_unused); } /* @@ -374,7 +398,7 @@ static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list) dentry->d_flags |= DCACHE_LRU_LIST; list_add_tail(&dentry->d_lru, list); dentry->d_sb->s_nr_dentry_unused++; - dentry_stat.nr_unused++; + this_cpu_inc(nr_dentry_unused); } else { list_move_tail(&dentry->d_lru, list); } -- cgit v1.2.3 From 19156840e33a23eeb1a749c0f991dab6588b077d Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:17:55 +1000 Subject: dentry: move to per-sb LRU locks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With the dentry LRUs being per-sb structures, there is no real need for a global dcache_lru_lock. The locking can be made more fine-grained by moving to a per-sb LRU lock, isolating the LRU operations of different filesystems completely from each other. The need for this is independent of any performance consideration that may arise: in the interest of abstracting the lru operations away, it is mandatory that each lru operates under its own lock instead of a global lock for all of them.
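To illustrate the locking change, here is a user-space sketch in which each superblock-like structure carries its own LRU lock, so holding one filesystem's lock never blocks LRU updates on another. `toy_spinlock_t`, `struct toy_sb` and the helpers are hypothetical; the spinlock is a minimal test-and-set lock built on C11 `<stdatomic.h>`, not the kernel's `spinlock_t`, and the LRU list itself is reduced to a counter.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal test-and-set spinlock standing in for the kernel's spinlock_t. */
typedef struct {
	atomic_int locked;
} toy_spinlock_t;

static void toy_spin_lock_init(toy_spinlock_t *l)
{
	atomic_init(&l->locked, 0);
}

static void toy_spin_lock(toy_spinlock_t *l)
{
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
		; /* spin until the holder releases it */
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
static int toy_spin_trylock(toy_spinlock_t *l)
{
	return !atomic_exchange_explicit(&l->locked, 1, memory_order_acquire);
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/*
 * Per-superblock LRU state: the lock protects the counter (and, in the
 * kernel, the s_dentry_lru list that sits next to it).
 */
struct toy_sb {
	toy_spinlock_t dentry_lru_lock;
	long nr_dentry_unused;
};

static void toy_sb_init(struct toy_sb *sb)
{
	toy_spin_lock_init(&sb->dentry_lru_lock);
	sb->nr_dentry_unused = 0;
}

/* Analogue of the LRU-add path: only this superblock's lock is taken. */
static void toy_dentry_lru_add(struct toy_sb *sb)
{
	toy_spin_lock(&sb->dentry_lru_lock);
	sb->nr_dentry_unused++;
	toy_spin_unlock(&sb->dentry_lru_lock);
}
```

Under a single global lock, a thread holding it for one filesystem would block LRU updates on every other filesystem; with one lock per superblock, the two are independent.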
[glommer@openvz.org: updated changelog ] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Reviewed-by: Christoph Hellwig Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/dcache.c | 33 ++++++++++++++++----------------- fs/super.c | 1 + include/linux/fs.h | 4 +++- 3 files changed, 20 insertions(+), 18 deletions(-) (limited to 'fs/dcache.c') diff --git a/fs/dcache.c b/fs/dcache.c index 03161240e744..e989ecb44a65 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -48,7 +48,7 @@ * - the dcache hash table * s_anon bl list spinlock protects: * - the s_anon list (see __d_drop) - * dcache_lru_lock protects: + * dentry->d_sb->s_dentry_lru_lock protects: * - the dcache lru lists and counters * d_lock protects: * - d_flags @@ -63,7 +63,7 @@ * Ordering: * dentry->d_inode->i_lock * dentry->d_lock - * dcache_lru_lock + * dentry->d_sb->s_dentry_lru_lock * dcache_hash_bucket lock * s_anon lock * @@ -81,7 +81,6 @@ int sysctl_vfs_cache_pressure __read_mostly = 100; EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure); -static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock); __cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock); EXPORT_SYMBOL(rename_lock); @@ -362,12 +361,12 @@ static void dentry_unlink_inode(struct dentry * dentry) static void dentry_lru_add(struct dentry *dentry) { if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST))) { - spin_lock(&dcache_lru_lock); + spin_lock(&dentry->d_sb->s_dentry_lru_lock); dentry->d_flags |= DCACHE_LRU_LIST; list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru); 
dentry->d_sb->s_nr_dentry_unused++; this_cpu_inc(nr_dentry_unused); - spin_unlock(&dcache_lru_lock); + spin_unlock(&dentry->d_sb->s_dentry_lru_lock); } } @@ -385,15 +384,15 @@ static void __dentry_lru_del(struct dentry *dentry) static void dentry_lru_del(struct dentry *dentry) { if (!list_empty(&dentry->d_lru)) { - spin_lock(&dcache_lru_lock); + spin_lock(&dentry->d_sb->s_dentry_lru_lock); __dentry_lru_del(dentry); - spin_unlock(&dcache_lru_lock); + spin_unlock(&dentry->d_sb->s_dentry_lru_lock); } } static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list) { - spin_lock(&dcache_lru_lock); + spin_lock(&dentry->d_sb->s_dentry_lru_lock); if (list_empty(&dentry->d_lru)) { dentry->d_flags |= DCACHE_LRU_LIST; list_add_tail(&dentry->d_lru, list); @@ -402,7 +401,7 @@ static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list) } else { list_move_tail(&dentry->d_lru, list); } - spin_unlock(&dcache_lru_lock); + spin_unlock(&dentry->d_sb->s_dentry_lru_lock); } /** @@ -895,14 +894,14 @@ void prune_dcache_sb(struct super_block *sb, int count) LIST_HEAD(tmp); relock: - spin_lock(&dcache_lru_lock); + spin_lock(&sb->s_dentry_lru_lock); while (!list_empty(&sb->s_dentry_lru)) { dentry = list_entry(sb->s_dentry_lru.prev, struct dentry, d_lru); BUG_ON(dentry->d_sb != sb); if (!spin_trylock(&dentry->d_lock)) { - spin_unlock(&dcache_lru_lock); + spin_unlock(&sb->s_dentry_lru_lock); cpu_relax(); goto relock; } @@ -918,11 +917,11 @@ relock: if (!--count) break; } - cond_resched_lock(&dcache_lru_lock); + cond_resched_lock(&sb->s_dentry_lru_lock); } if (!list_empty(&referenced)) list_splice(&referenced, &sb->s_dentry_lru); - spin_unlock(&dcache_lru_lock); + spin_unlock(&sb->s_dentry_lru_lock); shrink_dentry_list(&tmp); } @@ -938,14 +937,14 @@ void shrink_dcache_sb(struct super_block *sb) { LIST_HEAD(tmp); - spin_lock(&dcache_lru_lock); + spin_lock(&sb->s_dentry_lru_lock); while (!list_empty(&sb->s_dentry_lru)) { list_splice_init(&sb->s_dentry_lru, 
&tmp); - spin_unlock(&dcache_lru_lock); + spin_unlock(&sb->s_dentry_lru_lock); shrink_dentry_list(&tmp); - spin_lock(&dcache_lru_lock); + spin_lock(&sb->s_dentry_lru_lock); } - spin_unlock(&dcache_lru_lock); + spin_unlock(&sb->s_dentry_lru_lock); } EXPORT_SYMBOL(shrink_dcache_sb); diff --git a/fs/super.c b/fs/super.c index 63b6863bac7b..3c5318694ccd 100644 --- a/fs/super.c +++ b/fs/super.c @@ -176,6 +176,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags) INIT_HLIST_BL_HEAD(&s->s_anon); INIT_LIST_HEAD(&s->s_inodes); INIT_LIST_HEAD(&s->s_dentry_lru); + spin_lock_init(&s->s_dentry_lru_lock); INIT_LIST_HEAD(&s->s_inode_lru); spin_lock_init(&s->s_inode_lru_lock); INIT_LIST_HEAD(&s->s_mounts); diff --git a/include/linux/fs.h b/include/linux/fs.h index 3b3edac75df2..14a90f6886fa 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1269,7 +1269,9 @@ struct super_block { struct list_head s_files; #endif struct list_head s_mounts; /* list of mounts; _not_ for fs use */ - /* s_dentry_lru, s_nr_dentry_unused protected by dcache.c lru locks */ + + /* s_dentry_lru_lock protects s_dentry_lru and s_nr_dentry_unused */ + spinlock_t s_dentry_lru_lock ____cacheline_aligned_in_smp; struct list_head s_dentry_lru; /* unused dentry lru */ long s_nr_dentry_unused; /* # of dentry on lru */ -- cgit v1.2.3 From dd1f6b2e43a53ee58eb87d5e623cf44e277d005d Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:17:55 +1000 Subject: dcache: remove dentries from LRU before putting on dispose list MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit One of the big problems with modifying the way the dcache shrinker and LRU implementation works is that the LRU is abused in several ways. One of these is shrink_dentry_list(). 
Basically, we can move a dentry off the LRU onto a different list without doing any accounting changes, and then use dentry_lru_prune() to remove it from whatever list it is now on to do the LRU accounting at that point. This makes it -really hard- to change the LRU implementation. The use of the per-sb LRU lock serialises movement of the dentries between the different lists and the removal of them, and this is the only reason that it works. If we want to break up the dentry LRU lock and lists into, say, per-node lists, we remove the only serialisation that allows this lru list/dispose list abuse to work. To make this work effectively, the dispose list has to be isolated from the LRU list: dentries have to be removed from the LRU *before* being placed on the dispose list. This means that the LRU accounting and isolation are completed before disposal is started, and that means we can change the LRU implementation freely in future. It also means that dentries *must* be marked with DCACHE_SHRINK_LIST when they are placed on the dispose list so that we don't think that parent dentries found in try_prune_one_dentry() are on the LRU when they are actually on the dispose list. That would result in accounting the dentry to the LRU a second time. Hence dentry_lru_del() has to handle the DCACHE_SHRINK_LIST case. Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A.
Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/dcache.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 78 insertions(+), 21 deletions(-) (limited to 'fs/dcache.c') diff --git a/fs/dcache.c b/fs/dcache.c index e989ecb44a65..509b49410943 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -356,7 +356,7 @@ static void dentry_unlink_inode(struct dentry * dentry) } /* - * dentry_lru_(add|del|prune|move_tail) must be called with d_lock held. + * dentry_lru_(add|del|move_list) must be called with d_lock held. */ static void dentry_lru_add(struct dentry *dentry) { @@ -373,16 +373,26 @@ static void dentry_lru_add(struct dentry *dentry) static void __dentry_lru_del(struct dentry *dentry) { list_del_init(&dentry->d_lru); - dentry->d_flags &= ~(DCACHE_SHRINK_LIST | DCACHE_LRU_LIST); + dentry->d_flags &= ~DCACHE_LRU_LIST; dentry->d_sb->s_nr_dentry_unused--; this_cpu_dec(nr_dentry_unused); } /* * Remove a dentry with references from the LRU. + * + * If we are on the shrink list, then we can get to try_prune_one_dentry() and + * lose our last reference through the parent walk. In this case, we need to + * remove ourselves from the shrink list, not the LRU. 
*/ static void dentry_lru_del(struct dentry *dentry) { + if (dentry->d_flags & DCACHE_SHRINK_LIST) { + list_del_init(&dentry->d_lru); + dentry->d_flags &= ~DCACHE_SHRINK_LIST; + return; + } + if (!list_empty(&dentry->d_lru)) { spin_lock(&dentry->d_sb->s_dentry_lru_lock); __dentry_lru_del(dentry); @@ -392,14 +402,16 @@ static void dentry_lru_del(struct dentry *dentry) static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list) { + BUG_ON(dentry->d_flags & DCACHE_SHRINK_LIST); + spin_lock(&dentry->d_sb->s_dentry_lru_lock); if (list_empty(&dentry->d_lru)) { dentry->d_flags |= DCACHE_LRU_LIST; list_add_tail(&dentry->d_lru, list); - dentry->d_sb->s_nr_dentry_unused++; - this_cpu_inc(nr_dentry_unused); } else { list_move_tail(&dentry->d_lru, list); + dentry->d_sb->s_nr_dentry_unused--; + this_cpu_dec(nr_dentry_unused); } spin_unlock(&dentry->d_sb->s_dentry_lru_lock); } @@ -497,7 +509,8 @@ EXPORT_SYMBOL(d_drop); * If ref is non-zero, then decrement the refcount too. * Returns dentry requiring refcount drop, or NULL if we're done. */ -static inline struct dentry *dentry_kill(struct dentry *dentry) +static inline struct dentry * +dentry_kill(struct dentry *dentry, int unlock_on_failure) __releases(dentry->d_lock) { struct inode *inode; @@ -506,8 +519,10 @@ static inline struct dentry *dentry_kill(struct dentry *dentry) inode = dentry->d_inode; if (inode && !spin_trylock(&inode->i_lock)) { relock: - spin_unlock(&dentry->d_lock); - cpu_relax(); + if (unlock_on_failure) { + spin_unlock(&dentry->d_lock); + cpu_relax(); + } return dentry; /* try again with same dentry */ } if (IS_ROOT(dentry)) @@ -590,7 +605,7 @@ repeat: return; kill_it: - dentry = dentry_kill(dentry); + dentry = dentry_kill(dentry, 1); if (dentry) goto repeat; } @@ -810,12 +825,12 @@ EXPORT_SYMBOL(d_prune_aliases); * * This may fail if locks cannot be acquired no problem, just try again. 
*/ -static void try_prune_one_dentry(struct dentry *dentry) +static struct dentry * try_prune_one_dentry(struct dentry *dentry) __releases(dentry->d_lock) { struct dentry *parent; - parent = dentry_kill(dentry); + parent = dentry_kill(dentry, 0); /* * If dentry_kill returns NULL, we have nothing more to do. * if it returns the same dentry, trylocks failed. In either @@ -827,17 +842,18 @@ static void try_prune_one_dentry(struct dentry *dentry) * fragmentation. */ if (!parent) - return; + return NULL; if (parent == dentry) - return; + return dentry; /* Prune ancestors. */ dentry = parent; while (dentry) { if (lockref_put_or_lock(&dentry->d_lockref)) - return; - dentry = dentry_kill(dentry); + return NULL; + dentry = dentry_kill(dentry, 1); } + return NULL; } static void shrink_dentry_list(struct list_head *list) @@ -855,22 +871,32 @@ static void shrink_dentry_list(struct list_head *list) continue; } + /* + * The dispose list is isolated and dentries are not accounted + * to the LRU here, so we can simply remove it from the list + * here regardless of whether it is referenced or not. + */ + list_del_init(&dentry->d_lru); + dentry->d_flags &= ~DCACHE_SHRINK_LIST; + /* * We found an inuse dentry which was not removed from - * the LRU because of laziness during lookup. Do not free - * it - just keep it off the LRU list. + * the LRU because of laziness during lookup. Do not free it. 
*/ if (dentry->d_lockref.count) { - dentry_lru_del(dentry); spin_unlock(&dentry->d_lock); continue; } - rcu_read_unlock(); - try_prune_one_dentry(dentry); + dentry = try_prune_one_dentry(dentry); rcu_read_lock(); + if (dentry) { + dentry->d_flags |= DCACHE_SHRINK_LIST; + list_add(&dentry->d_lru, list); + spin_unlock(&dentry->d_lock); + } } rcu_read_unlock(); } @@ -911,8 +937,10 @@ relock: list_move(&dentry->d_lru, &referenced); spin_unlock(&dentry->d_lock); } else { - list_move_tail(&dentry->d_lru, &tmp); + list_move(&dentry->d_lru, &tmp); dentry->d_flags |= DCACHE_SHRINK_LIST; + this_cpu_dec(nr_dentry_unused); + sb->s_nr_dentry_unused--; spin_unlock(&dentry->d_lock); if (!--count) break; @@ -926,6 +954,27 @@ relock: shrink_dentry_list(&tmp); } +/* + * Mark all the dentries as on being the dispose list so we don't think they are + * still on the LRU if we try to kill them from ascending the parent chain in + * try_prune_one_dentry() rather than directly from the dispose list. + */ +static void +shrink_dcache_list( + struct list_head *dispose) +{ + struct dentry *dentry; + + rcu_read_lock(); + list_for_each_entry_rcu(dentry, dispose, d_lru) { + spin_lock(&dentry->d_lock); + dentry->d_flags |= DCACHE_SHRINK_LIST; + spin_unlock(&dentry->d_lock); + } + rcu_read_unlock(); + shrink_dentry_list(dispose); +} + /** * shrink_dcache_sb - shrink dcache for a superblock * @sb: superblock @@ -939,9 +988,17 @@ void shrink_dcache_sb(struct super_block *sb) spin_lock(&sb->s_dentry_lru_lock); while (!list_empty(&sb->s_dentry_lru)) { + /* + * account for removal here so we don't need to handle it later + * even though the dentry is no longer on the lru list. 
+ */ list_splice_init(&sb->s_dentry_lru, &tmp); + this_cpu_sub(nr_dentry_unused, sb->s_nr_dentry_unused); + sb->s_nr_dentry_unused = 0; spin_unlock(&sb->s_dentry_lru_lock); - shrink_dentry_list(&tmp); + + shrink_dcache_list(&tmp); + spin_lock(&sb->s_dentry_lru_lock); } spin_unlock(&sb->s_dentry_lru_lock); -- cgit v1.2.3 From 0a234c6dcb79a270803f5c9773ed650b78730962 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:17:57 +1000 Subject: shrinker: convert superblock shrinkers to new API MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Convert superblock shrinker to use the new count/scan API, and propagate the API changes through to the filesystem callouts. The filesystem callouts already use a count/scan API, so it's just changing counters to longs to match the VM API. This requires the dentry and inode shrinker callouts to be converted to the count/scan API. This is mainly a mechanical change. [glommer@openvz.org: use mult_frac for fractional proportions, build fixes] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. 
Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/dcache.c | 7 +++-- fs/inode.c | 7 +++-- fs/internal.h | 2 ++ fs/super.c | 80 ++++++++++++++++++++++++++++++++--------------------- fs/xfs/xfs_icache.c | 4 +-- fs/xfs/xfs_icache.h | 2 +- fs/xfs/xfs_super.c | 8 +++--- include/linux/fs.h | 8 ++---- 8 files changed, 70 insertions(+), 48 deletions(-) (limited to 'fs/dcache.c') diff --git a/fs/dcache.c b/fs/dcache.c index 509b49410943..77d466b13fef 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -913,11 +913,12 @@ static void shrink_dentry_list(struct list_head *list) * This function may fail to free any resources if all the dentries are in * use. */ -void prune_dcache_sb(struct super_block *sb, int count) +long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan) { struct dentry *dentry; LIST_HEAD(referenced); LIST_HEAD(tmp); + long freed = 0; relock: spin_lock(&sb->s_dentry_lru_lock); @@ -942,7 +943,8 @@ relock: this_cpu_dec(nr_dentry_unused); sb->s_nr_dentry_unused--; spin_unlock(&dentry->d_lock); - if (!--count) + freed++; + if (!--nr_to_scan) break; } cond_resched_lock(&sb->s_dentry_lru_lock); @@ -952,6 +954,7 @@ relock: spin_unlock(&sb->s_dentry_lru_lock); shrink_dentry_list(&tmp); + return freed; } /* diff --git a/fs/inode.c b/fs/inode.c index 2a3c37ea823d..021d64768a55 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -706,10 +706,11 @@ static int can_unuse(struct inode *inode) * LRU does not have strict ordering. Hence we don't want to reclaim inodes * with this flag set because they are the inodes that are out of order. 
*/ -void prune_icache_sb(struct super_block *sb, int nr_to_scan) +long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan) { LIST_HEAD(freeable); - int nr_scanned; + long nr_scanned; + long freed = 0; unsigned long reap = 0; spin_lock(&sb->s_inode_lru_lock); @@ -779,6 +780,7 @@ void prune_icache_sb(struct super_block *sb, int nr_to_scan) list_move(&inode->i_lru, &freeable); sb->s_nr_inodes_unused--; this_cpu_dec(nr_unused); + freed++; } if (current_is_kswapd()) __count_vm_events(KSWAPD_INODESTEAL, reap); @@ -789,6 +791,7 @@ void prune_icache_sb(struct super_block *sb, int nr_to_scan) current->reclaim_state->reclaimed_slab += reap; dispose_list(&freeable); + return freed; } static void __wait_on_freeing_inode(struct inode *inode); diff --git a/fs/internal.h b/fs/internal.h index b6495659d6e8..cb83a3417a68 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -114,6 +114,7 @@ extern int open_check_o_direct(struct file *f); * inode.c */ extern spinlock_t inode_sb_list_lock; +extern long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan); extern void inode_add_lru(struct inode *inode); /* @@ -130,6 +131,7 @@ extern int invalidate_inodes(struct super_block *, bool); */ extern struct dentry *__d_alloc(struct super_block *, const struct qstr *); extern int d_set_mounted(struct dentry *dentry); +extern long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan); /* * read_write.c diff --git a/fs/super.c b/fs/super.c index 3c5318694ccd..8aa2660642b9 100644 --- a/fs/super.c +++ b/fs/super.c @@ -53,11 +53,15 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = { * shrinker path and that leads to deadlock on the shrinker_rwsem. Hence we * take a passive reference to the superblock to avoid this from occurring. 
*/ -static int prune_super(struct shrinker *shrink, struct shrink_control *sc) +static unsigned long super_cache_scan(struct shrinker *shrink, + struct shrink_control *sc) { struct super_block *sb; - int fs_objects = 0; - int total_objects; + long fs_objects = 0; + long total_objects; + long freed = 0; + long dentries; + long inodes; sb = container_of(shrink, struct super_block, s_shrink); @@ -65,11 +69,11 @@ static int prune_super(struct shrinker *shrink, struct shrink_control *sc) * Deadlock avoidance. We may hold various FS locks, and we don't want * to recurse into the FS that called us in clear_inode() and friends.. */ - if (sc->nr_to_scan && !(sc->gfp_mask & __GFP_FS)) - return -1; + if (!(sc->gfp_mask & __GFP_FS)) + return SHRINK_STOP; if (!grab_super_passive(sb)) - return -1; + return SHRINK_STOP; if (sb->s_op->nr_cached_objects) fs_objects = sb->s_op->nr_cached_objects(sb); @@ -77,33 +81,46 @@ static int prune_super(struct shrinker *shrink, struct shrink_control *sc) total_objects = sb->s_nr_dentry_unused + sb->s_nr_inodes_unused + fs_objects + 1; - if (sc->nr_to_scan) { - int dentries; - int inodes; - - /* proportion the scan between the caches */ - dentries = mult_frac(sc->nr_to_scan, sb->s_nr_dentry_unused, - total_objects); - inodes = mult_frac(sc->nr_to_scan, sb->s_nr_inodes_unused, - total_objects); - if (fs_objects) - fs_objects = mult_frac(sc->nr_to_scan, fs_objects, - total_objects); - /* - * prune the dcache first as the icache is pinned by it, then - * prune the icache, followed by the filesystem specific caches - */ - prune_dcache_sb(sb, dentries); - prune_icache_sb(sb, inodes); + /* proportion the scan between the caches */ + dentries = mult_frac(sc->nr_to_scan, sb->s_nr_dentry_unused, + total_objects); + inodes = mult_frac(sc->nr_to_scan, sb->s_nr_inodes_unused, + total_objects); - if (fs_objects && sb->s_op->free_cached_objects) { - sb->s_op->free_cached_objects(sb, fs_objects); - fs_objects = sb->s_op->nr_cached_objects(sb); - } - 
total_objects = sb->s_nr_dentry_unused + - sb->s_nr_inodes_unused + fs_objects; + /* + * prune the dcache first as the icache is pinned by it, then + * prune the icache, followed by the filesystem specific caches + */ + freed = prune_dcache_sb(sb, dentries); + freed += prune_icache_sb(sb, inodes); + + if (fs_objects) { + fs_objects = mult_frac(sc->nr_to_scan, fs_objects, + total_objects); + freed += sb->s_op->free_cached_objects(sb, fs_objects); } + drop_super(sb); + return freed; +} + +static unsigned long super_cache_count(struct shrinker *shrink, + struct shrink_control *sc) +{ + struct super_block *sb; + long total_objects = 0; + + sb = container_of(shrink, struct super_block, s_shrink); + + if (!grab_super_passive(sb)) + return 0; + + if (sb->s_op && sb->s_op->nr_cached_objects) + total_objects = sb->s_op->nr_cached_objects(sb); + + total_objects += sb->s_nr_dentry_unused; + total_objects += sb->s_nr_inodes_unused; + total_objects = vfs_pressure_ratio(total_objects); drop_super(sb); return total_objects; @@ -211,7 +228,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags) s->cleancache_poolid = -1; s->s_shrink.seeks = DEFAULT_SEEKS; - s->s_shrink.shrink = prune_super; + s->s_shrink.scan_objects = super_cache_scan; + s->s_shrink.count_objects = super_cache_count; s->s_shrink.batch = 1024; } out: diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 16219b9c6790..73b62a24ceac 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1167,7 +1167,7 @@ xfs_reclaim_inodes( * them to be cleaned, which we hope will not be very long due to the * background walker having already kicked the IO off on those dirty inodes. 
*/ -void +long xfs_reclaim_inodes_nr( struct xfs_mount *mp, int nr_to_scan) @@ -1176,7 +1176,7 @@ xfs_reclaim_inodes_nr( xfs_reclaim_work_queue(mp); xfs_ail_push_all(mp->m_ail); - xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan); + return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan); } /* diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h index 8a89f7d791bd..456f0144e1b6 100644 --- a/fs/xfs/xfs_icache.h +++ b/fs/xfs/xfs_icache.h @@ -46,7 +46,7 @@ void xfs_reclaim_worker(struct work_struct *work); int xfs_reclaim_inodes(struct xfs_mount *mp, int mode); int xfs_reclaim_inodes_count(struct xfs_mount *mp); -void xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan); +long xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan); void xfs_inode_set_reclaim_tag(struct xfs_inode *ip); diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 979a77d4b87d..71d7aaebb912 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1535,19 +1535,19 @@ xfs_fs_mount( return mount_bdev(fs_type, flags, dev_name, data, xfs_fs_fill_super); } -static int +static long xfs_fs_nr_cached_objects( struct super_block *sb) { return xfs_reclaim_inodes_count(XFS_M(sb)); } -static void +static long xfs_fs_free_cached_objects( struct super_block *sb, - int nr_to_scan) + long nr_to_scan) { - xfs_reclaim_inodes_nr(XFS_M(sb), nr_to_scan); + return xfs_reclaim_inodes_nr(XFS_M(sb), nr_to_scan); } static const struct super_operations xfs_super_operations = { diff --git a/include/linux/fs.h b/include/linux/fs.h index 14a90f6886fa..0ae0bc3c1fde 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1335,10 +1335,6 @@ struct super_block { struct workqueue_struct *s_dio_done_wq; }; -/* superblock cache pruning functions */ -extern void prune_icache_sb(struct super_block *sb, int nr_to_scan); -extern void prune_dcache_sb(struct super_block *sb, int nr_to_scan); - extern struct timespec current_fs_time(struct super_block *sb); /* @@ -1631,8 +1627,8 
@@ struct super_operations { ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); #endif int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); - int (*nr_cached_objects)(struct super_block *); - void (*free_cached_objects)(struct super_block *, int); + long (*nr_cached_objects)(struct super_block *); + long (*free_cached_objects)(struct super_block *, long); }; /* -- cgit v1.2.3 From f604156751db77e08afe47ce29fe8f3d51ad9b04 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:18:00 +1000 Subject: dcache: convert to use new lru list infrastructure MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [glommer@openvz.org: don't reintroduce double decrement of nr_unused_dentries, adapted for new LRU return codes] Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/dcache.c | 170 ++++++++++++++++++++++++----------------------------- fs/super.c | 11 ++-- include/linux/fs.h | 15 +++-- 3 files changed, 90 insertions(+), 106 deletions(-) (limited to 'fs/dcache.c') diff --git a/fs/dcache.c b/fs/dcache.c index 77d466b13fef..38a4a03499a2 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -37,6 +37,7 @@ #include #include #include +#include #include "internal.h" #include "mount.h" @@ -356,28 +357,17 @@ static void dentry_unlink_inode(struct dentry * dentry) } /* - * dentry_lru_(add|del|move_list) must be called with d_lock held. 
+ * dentry_lru_(add|del)_list) must be called with d_lock held. */ static void dentry_lru_add(struct dentry *dentry) { if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST))) { - spin_lock(&dentry->d_sb->s_dentry_lru_lock); + if (list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru)) + this_cpu_inc(nr_dentry_unused); dentry->d_flags |= DCACHE_LRU_LIST; - list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru); - dentry->d_sb->s_nr_dentry_unused++; - this_cpu_inc(nr_dentry_unused); - spin_unlock(&dentry->d_sb->s_dentry_lru_lock); } } -static void __dentry_lru_del(struct dentry *dentry) -{ - list_del_init(&dentry->d_lru); - dentry->d_flags &= ~DCACHE_LRU_LIST; - dentry->d_sb->s_nr_dentry_unused--; - this_cpu_dec(nr_dentry_unused); -} - /* * Remove a dentry with references from the LRU. * @@ -393,27 +383,9 @@ static void dentry_lru_del(struct dentry *dentry) return; } - if (!list_empty(&dentry->d_lru)) { - spin_lock(&dentry->d_sb->s_dentry_lru_lock); - __dentry_lru_del(dentry); - spin_unlock(&dentry->d_sb->s_dentry_lru_lock); - } -} - -static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list) -{ - BUG_ON(dentry->d_flags & DCACHE_SHRINK_LIST); - - spin_lock(&dentry->d_sb->s_dentry_lru_lock); - if (list_empty(&dentry->d_lru)) { - dentry->d_flags |= DCACHE_LRU_LIST; - list_add_tail(&dentry->d_lru, list); - } else { - list_move_tail(&dentry->d_lru, list); - dentry->d_sb->s_nr_dentry_unused--; + if (list_lru_del(&dentry->d_sb->s_dentry_lru, &dentry->d_lru)) this_cpu_dec(nr_dentry_unused); - } - spin_unlock(&dentry->d_sb->s_dentry_lru_lock); + dentry->d_flags &= ~DCACHE_LRU_LIST; } /** @@ -901,12 +873,72 @@ static void shrink_dentry_list(struct list_head *list) rcu_read_unlock(); } +static enum lru_status +dentry_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg) +{ + struct list_head *freeable = arg; + struct dentry *dentry = container_of(item, struct dentry, d_lru); + + + /* + * we are inverting the lru lock/dentry->d_lock here, + * 
so use a trylock. If we fail to get the lock, just skip + * it + */ + if (!spin_trylock(&dentry->d_lock)) + return LRU_SKIP; + + /* + * Referenced dentries are still in use. If they have active + * counts, just remove them from the LRU. Otherwise give them + * another pass through the LRU. + */ + if (dentry->d_lockref.count) { + list_del_init(&dentry->d_lru); + spin_unlock(&dentry->d_lock); + return LRU_REMOVED; + } + + if (dentry->d_flags & DCACHE_REFERENCED) { + dentry->d_flags &= ~DCACHE_REFERENCED; + spin_unlock(&dentry->d_lock); + + /* + * The list move itself will be made by the common LRU code. At + * this point, we've dropped the dentry->d_lock but keep the + * lru lock. This is safe to do, since every list movement is + * protected by the lru lock even if both locks are held. + * + * This is guaranteed by the fact that all LRU management + * functions are intermediated by the LRU API calls like + * list_lru_add and list_lru_del. List movement in this file + * only ever occur through this functions or through callbacks + * like this one, that are called from the LRU API. + * + * The only exceptions to this are functions like + * shrink_dentry_list, and code that first checks for the + * DCACHE_SHRINK_LIST flag. Those are guaranteed to be + * operating only with stack provided lists after they are + * properly isolated from the main list. It is thus, always a + * local access. + */ + return LRU_ROTATE; + } + + dentry->d_flags |= DCACHE_SHRINK_LIST; + list_move_tail(&dentry->d_lru, freeable); + this_cpu_dec(nr_dentry_unused); + spin_unlock(&dentry->d_lock); + + return LRU_REMOVED; +} + /** * prune_dcache_sb - shrink the dcache * @sb: superblock - * @count: number of entries to try to free + * @nr_to_scan : number of entries to try to free * - * Attempt to shrink the superblock dcache LRU by @count entries. This is + * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. 
This is * done when we need more memory an called from the superblock shrinker * function. * @@ -915,45 +947,12 @@ static void shrink_dentry_list(struct list_head *list) */ long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan) { - struct dentry *dentry; - LIST_HEAD(referenced); - LIST_HEAD(tmp); - long freed = 0; + LIST_HEAD(dispose); + long freed; -relock: - spin_lock(&sb->s_dentry_lru_lock); - while (!list_empty(&sb->s_dentry_lru)) { - dentry = list_entry(sb->s_dentry_lru.prev, - struct dentry, d_lru); - BUG_ON(dentry->d_sb != sb); - - if (!spin_trylock(&dentry->d_lock)) { - spin_unlock(&sb->s_dentry_lru_lock); - cpu_relax(); - goto relock; - } - - if (dentry->d_flags & DCACHE_REFERENCED) { - dentry->d_flags &= ~DCACHE_REFERENCED; - list_move(&dentry->d_lru, &referenced); - spin_unlock(&dentry->d_lock); - } else { - list_move(&dentry->d_lru, &tmp); - dentry->d_flags |= DCACHE_SHRINK_LIST; - this_cpu_dec(nr_dentry_unused); - sb->s_nr_dentry_unused--; - spin_unlock(&dentry->d_lock); - freed++; - if (!--nr_to_scan) - break; - } - cond_resched_lock(&sb->s_dentry_lru_lock); - } - if (!list_empty(&referenced)) - list_splice(&referenced, &sb->s_dentry_lru); - spin_unlock(&sb->s_dentry_lru_lock); - - shrink_dentry_list(&tmp); + freed = list_lru_walk(&sb->s_dentry_lru, dentry_lru_isolate, + &dispose, nr_to_scan); + shrink_dentry_list(&dispose); return freed; } @@ -987,24 +986,10 @@ shrink_dcache_list( */ void shrink_dcache_sb(struct super_block *sb) { - LIST_HEAD(tmp); - - spin_lock(&sb->s_dentry_lru_lock); - while (!list_empty(&sb->s_dentry_lru)) { - /* - * account for removal here so we don't need to handle it later - * even though the dentry is no longer on the lru list. 
- */ - list_splice_init(&sb->s_dentry_lru, &tmp); - this_cpu_sub(nr_dentry_unused, sb->s_nr_dentry_unused); - sb->s_nr_dentry_unused = 0; - spin_unlock(&sb->s_dentry_lru_lock); + long disposed; - shrink_dcache_list(&tmp); - - spin_lock(&sb->s_dentry_lru_lock); - } - spin_unlock(&sb->s_dentry_lru_lock); + disposed = list_lru_dispose_all(&sb->s_dentry_lru, shrink_dcache_list); + this_cpu_sub(nr_dentry_unused, disposed); } EXPORT_SYMBOL(shrink_dcache_sb); @@ -1366,7 +1351,8 @@ static enum d_walk_ret select_collect(void *_data, struct dentry *dentry) if (dentry->d_lockref.count) { dentry_lru_del(dentry); } else if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) { - dentry_lru_move_list(dentry, &data->dispose); + dentry_lru_del(dentry); + list_add_tail(&dentry->d_lru, &data->dispose); dentry->d_flags |= DCACHE_SHRINK_LIST; data->found++; ret = D_WALK_NORETRY; diff --git a/fs/super.c b/fs/super.c index aa7995d73bcc..cd3c2cd9144d 100644 --- a/fs/super.c +++ b/fs/super.c @@ -79,11 +79,11 @@ static unsigned long super_cache_scan(struct shrinker *shrink, fs_objects = sb->s_op->nr_cached_objects(sb); inodes = list_lru_count(&sb->s_inode_lru); - total_objects = sb->s_nr_dentry_unused + inodes + fs_objects + 1; + dentries = list_lru_count(&sb->s_dentry_lru); + total_objects = dentries + inodes + fs_objects + 1; /* proportion the scan between the caches */ - dentries = mult_frac(sc->nr_to_scan, sb->s_nr_dentry_unused, - total_objects); + dentries = mult_frac(sc->nr_to_scan, dentries, total_objects); inodes = mult_frac(sc->nr_to_scan, inodes, total_objects); /* @@ -117,7 +117,7 @@ static unsigned long super_cache_count(struct shrinker *shrink, if (sb->s_op && sb->s_op->nr_cached_objects) total_objects = sb->s_op->nr_cached_objects(sb); - total_objects += sb->s_nr_dentry_unused; + total_objects += list_lru_count(&sb->s_dentry_lru); total_objects += list_lru_count(&sb->s_inode_lru); total_objects = vfs_pressure_ratio(total_objects); @@ -191,8 +191,7 @@ static struct super_block 
*alloc_super(struct file_system_type *type, int flags) INIT_HLIST_NODE(&s->s_instances); INIT_HLIST_BL_HEAD(&s->s_anon); INIT_LIST_HEAD(&s->s_inodes); - INIT_LIST_HEAD(&s->s_dentry_lru); - spin_lock_init(&s->s_dentry_lru_lock); + list_lru_init(&s->s_dentry_lru); list_lru_init(&s->s_inode_lru); INIT_LIST_HEAD(&s->s_mounts); init_rwsem(&s->s_umount); diff --git a/include/linux/fs.h b/include/linux/fs.h index e04786569c28..36e45df87f6e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1270,14 +1270,6 @@ struct super_block { struct list_head s_files; #endif struct list_head s_mounts; /* list of mounts; _not_ for fs use */ - - /* s_dentry_lru_lock protects s_dentry_lru and s_nr_dentry_unused */ - spinlock_t s_dentry_lru_lock ____cacheline_aligned_in_smp; - struct list_head s_dentry_lru; /* unused dentry lru */ - long s_nr_dentry_unused; /* # of dentry on lru */ - - struct list_lru s_inode_lru ____cacheline_aligned_in_smp; - struct block_device *s_bdev; struct backing_dev_info *s_bdi; struct mtd_info *s_mtd; @@ -1331,6 +1323,13 @@ struct super_block { /* AIO completions deferred from interrupt context */ struct workqueue_struct *s_dio_done_wq; + + /* + * Keep the lru lists last in the structure so they always sit on their + * own individual cachelines. + */ + struct list_lru s_dentry_lru ____cacheline_aligned_in_smp; + struct list_lru s_inode_lru ____cacheline_aligned_in_smp; }; extern struct timespec current_fs_time(struct super_block *sb); -- cgit v1.2.3 From 4e717f5c1083995c334ced639cc77a75e9972567 Mon Sep 17 00:00:00 2001 From: Glauber Costa Date: Wed, 28 Aug 2013 10:18:03 +1000 Subject: list_lru: remove special case function list_lru_dispose_all. The list_lru implementation has one function, list_lru_dispose_all, with only one user (the dentry code). At first, such a function appears to make sense because we are not really interested in the result of isolating each dentry separately: all of them are going away anyway.
However, its implementation is buggy in the following way: when we call list_lru_dispose_all in fs/dcache.c, we scan all dentries, marking them with DCACHE_SHRINK_LIST, but this is done without the nlru->lock taken. The immediate result is that someone else may add or remove the dentry from the LRU at the same time. When list_lru_del happens in that scenario, it will see an element that is not yet marked with DCACHE_SHRINK_LIST (even though it will be in the future) and remove it from an LRU that the element is no longer on. Since list_lru_dispose_all will in effect count down nlru's nr_items and list_lru_del will do the same, this leads to an imbalance.

The solution is not so simple: we could obviously just keep the lru_lock taken, but then we have no guarantee that we will be able to acquire the dentry lock (dentry->d_lock). To properly solve this, we need a communication mechanism between the lru and dentry code, so they can coordinate with each other. Such a mechanism already exists in the form of the list_lru_walk_cb callback. So it is possible to construct a dcache-side prune function that does the right thing only by calling list_lru_walk in a loop until no more dentries are available.

With only one user, plus the fact that a sane solution for the problem would involve bouncing between dcache and list_lru anyway, I see little justification to keep the special case list_lru_dispose_all in the tree.
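The walk-in-a-loop approach described above can be illustrated with a small, single-threaded userspace model. This is a sketch under stated assumptions, not kernel code: `model_lru`, `shrink_ctx`, `isolate_shrink()`, `model_lru_walk()` and `model_shrink_all()` are hypothetical names, the LRU is a plain array, and a failed `spin_trylock(&dentry->d_lock)` is simulated with a per-object countdown. What it shows is that the item counter is only decremented inside the walker, once per `LRU_REMOVED`, so a skipped object stays on the list and is counted exactly once when a later pass removes it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Userspace model only. The enum values mirror the kernel's lru_status
 * names; everything else here is a hypothetical stand-in, not a kernel API.
 */
enum lru_status { LRU_REMOVED, LRU_SKIP };

#define MAX_ITEMS 16

struct model_lru {
	int items[MAX_ITEMS];	/* object ids; the array plays the LRU list */
	size_t nr_items;	/* must always equal the number of entries */
};

struct shrink_ctx {
	int busy[MAX_ITEMS];	/* >0: the "trylock" fails this many more times */
	int dispose[MAX_ITEMS];	/* private dispose list for the current pass */
	size_t nr_dispose;
};

typedef enum lru_status (*isolate_cb)(int item, struct shrink_ctx *ctx);

/* Trylock-style callback: skip contended objects, otherwise move them
 * to the caller's private dispose list. */
static enum lru_status isolate_shrink(int item, struct shrink_ctx *ctx)
{
	if (ctx->busy[item] > 0) {
		ctx->busy[item]--;	/* assume the holder lets go later */
		return LRU_SKIP;
	}
	ctx->dispose[ctx->nr_dispose++] = item;
	return LRU_REMOVED;
}

/* Visit at most nr_to_scan entries. nr_items changes only here, once per
 * LRU_REMOVED, so the counter cannot drift from the list contents. */
static unsigned long model_lru_walk(struct model_lru *lru, isolate_cb isolate,
				    struct shrink_ctx *ctx,
				    unsigned long nr_to_scan)
{
	unsigned long isolated = 0;
	size_t i = 0;

	while (i < lru->nr_items && nr_to_scan > 0) {
		nr_to_scan--;
		if (isolate(lru->items[i], ctx) == LRU_REMOVED) {
			/* close the gap left by the isolated entry */
			for (size_t j = i; j + 1 < lru->nr_items; j++)
				lru->items[j] = lru->items[j + 1];
			lru->nr_items--;
			isolated++;
		} else {
			i++;	/* skipped: it stays on the LRU */
		}
	}
	return isolated;
}

/* Repeat full walks until a pass isolates nothing. */
static unsigned long model_shrink_all(struct model_lru *lru,
				      struct shrink_ctx *ctx)
{
	unsigned long total = 0, freed;

	do {
		ctx->nr_dispose = 0;	/* fresh dispose list per pass */
		freed = model_lru_walk(lru, isolate_shrink, ctx, ULONG_MAX);
		assert(ctx->nr_dispose == freed);
		total += freed;		/* real code would free ctx->dispose here */
	} while (freed > 0);
	return total;
}
```

Like the `shrink_dcache_sb()` loop in this patch, the model stops as soon as a full pass isolates nothing, so an object whose lock stays contended for an entire pass can be left on the list.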
Signed-off-by: Glauber Costa Cc: Michal Hocko Acked-by: Dave Chinner Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/dcache.c | 49 ++++++++++++++++++++++++++++-------------------- include/linux/list_lru.h | 17 ----------------- mm/list_lru.c | 42 ----------------------------------------- 3 files changed, 29 insertions(+), 79 deletions(-) (limited to 'fs/dcache.c') diff --git a/fs/dcache.c b/fs/dcache.c index 38a4a03499a2..d74b5bdff7f9 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -956,27 +956,29 @@ long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan) return freed; } -/* - * Mark all the dentries as on being the dispose list so we don't think they are - * still on the LRU if we try to kill them from ascending the parent chain in - * try_prune_one_dentry() rather than directly from the dispose list. - */ -static void -shrink_dcache_list( - struct list_head *dispose) +static enum lru_status dentry_lru_isolate_shrink(struct list_head *item, + spinlock_t *lru_lock, void *arg) { - struct dentry *dentry; + struct list_head *freeable = arg; + struct dentry *dentry = container_of(item, struct dentry, d_lru); - rcu_read_lock(); - list_for_each_entry_rcu(dentry, dispose, d_lru) { - spin_lock(&dentry->d_lock); - dentry->d_flags |= DCACHE_SHRINK_LIST; - spin_unlock(&dentry->d_lock); - } - rcu_read_unlock(); - shrink_dentry_list(dispose); + /* + * we are inverting the lru lock/dentry->d_lock here, + * so use a trylock. 
If we fail to get the lock, just skip + * it + */ + if (!spin_trylock(&dentry->d_lock)) + return LRU_SKIP; + + dentry->d_flags |= DCACHE_SHRINK_LIST; + list_move_tail(&dentry->d_lru, freeable); + this_cpu_dec(nr_dentry_unused); + spin_unlock(&dentry->d_lock); + + return LRU_REMOVED; } + /** * shrink_dcache_sb - shrink dcache for a superblock * @sb: superblock @@ -986,10 +988,17 @@ shrink_dcache_list( */ void shrink_dcache_sb(struct super_block *sb) { - long disposed; + long freed; + + do { + LIST_HEAD(dispose); + + freed = list_lru_walk(&sb->s_dentry_lru, + dentry_lru_isolate_shrink, &dispose, UINT_MAX); - disposed = list_lru_dispose_all(&sb->s_dentry_lru, shrink_dcache_list); - this_cpu_sub(nr_dentry_unused, disposed); + this_cpu_sub(nr_dentry_unused, freed); + shrink_dentry_list(&dispose); + } while (freed > 0); } EXPORT_SYMBOL(shrink_dcache_sb); diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 2fe13e1a809a..4d02ad3badab 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -137,21 +137,4 @@ list_lru_walk(struct list_lru *lru, list_lru_walk_cb isolate, } return isolated; } - -typedef void (*list_lru_dispose_cb)(struct list_head *dispose_list); -/** - * list_lru_dispose_all: forceably flush all elements in an @lru - * @lru: the lru pointer - * @dispose: callback function to be called for each lru list. - * - * This function will forceably isolate all elements into the dispose list, and - * call the @dispose callback to flush the list. Please note that the callback - * should expect items in any state, clean or dirty, and be able to flush all of - * them. - * - * Return value: how many objects were freed. It should be equal to all objects - * in the list_lru. 
*/ -unsigned long -list_lru_dispose_all(struct list_lru *lru, list_lru_dispose_cb dispose); #endif /* _LRU_LIST_H */ diff --git a/mm/list_lru.c b/mm/list_lru.c index 86cb55464f71..f91c24188573 100644 --- a/mm/list_lru.c +++ b/mm/list_lru.c @@ -112,48 +112,6 @@ restart: } EXPORT_SYMBOL_GPL(list_lru_walk_node); -static unsigned long list_lru_dispose_all_node(struct list_lru *lru, int nid, - list_lru_dispose_cb dispose) -{ - struct list_lru_node *nlru = &lru->node[nid]; - LIST_HEAD(dispose_list); - unsigned long disposed = 0; - - spin_lock(&nlru->lock); - while (!list_empty(&nlru->list)) { - list_splice_init(&nlru->list, &dispose_list); - disposed += nlru->nr_items; - nlru->nr_items = 0; - node_clear(nid, lru->active_nodes); - spin_unlock(&nlru->lock); - - dispose(&dispose_list); - - spin_lock(&nlru->lock); - } - spin_unlock(&nlru->lock); - return disposed; -} - -unsigned long list_lru_dispose_all(struct list_lru *lru, - list_lru_dispose_cb dispose) -{ - unsigned long disposed; - unsigned long total = 0; - int nid; - - do { - disposed = 0; - for_each_node_mask(nid, lru->active_nodes) { - disposed += list_lru_dispose_all_node(lru, nid, - dispose); - } - total += disposed; - } while (disposed != 0); - - return total; -} - int list_lru_init(struct list_lru *lru) { int i; -- cgit v1.2.3 From 9b17c62382dd2e7507984b9890bf44e070cdd8bb Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 28 Aug 2013 10:18:05 +1000 Subject: fs: convert inode and dentry shrinking to be node aware MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Now that the shrinker is passing a node in the scan control structure, we can pass this to the generic LRU list code to isolate reclaim to the lists on matching nodes.
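To illustrate what node awareness changes in `super_cache_scan()`, here is a small userspace sketch of the proportional split computed from a single node's counts. `mult_frac_ul()` follows the same quotient/remainder arithmetic as the kernel's `mult_frac()` macro; `node_counts`, `scan_split`, `split_scan()` and `NR_NODES` are hypothetical names for this example only. Only node `nid`'s list lengths feed the split, so a reclaim pass aimed at one node does not size its work from objects that live on other nodes.

```c
#include <assert.h>

#define NR_NODES 2	/* hypothetical machine with two NUMA nodes */

/* x * n / d computed as the kernel's mult_frac() does, splitting x into
 * quotient and remainder so the intermediate product stays smaller. */
static unsigned long mult_frac_ul(unsigned long x, unsigned long n,
				  unsigned long d)
{
	unsigned long quot = x / d;
	unsigned long rem = x % d;

	return quot * n + (rem * n) / d;
}

/* Hypothetical per-node object counts for one superblock. */
struct node_counts {
	unsigned long dentries, inodes, fs_objects;
};

/* How many objects each cache should scan on the target node. */
struct scan_split {
	unsigned long dentries, inodes, fs_objects;
};

/* Split nr_to_scan between the caches using only node nid's counts,
 * mirroring the proportioning in super_cache_scan(). The +1 keeps the
 * divisor non-zero when all of the node's lists are empty. */
static struct scan_split split_scan(const struct node_counts counts[NR_NODES],
				    int nid, unsigned long nr_to_scan)
{
	assert(nid >= 0 && nid < NR_NODES);

	const struct node_counts *c = &counts[nid];
	unsigned long total = c->dentries + c->inodes + c->fs_objects + 1;
	struct scan_split s = {
		.dentries = mult_frac_ul(nr_to_scan, c->dentries, total),
		.inodes = mult_frac_ul(nr_to_scan, c->inodes, total),
		.fs_objects = mult_frac_ul(nr_to_scan, c->fs_objects, total),
	};
	return s;
}
```

With 300 dentries, 600 inodes and 99 filesystem objects on node 0 (a total of 1000 including the +1), a request to scan 128 objects splits into 38, 76 and 12; an empty node yields zero work for every cache.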
Signed-off-by: Dave Chinner Signed-off-by: Glauber Costa Acked-by: Mel Gorman Cc: "Theodore Ts'o" Cc: Adrian Hunter Cc: Al Viro Cc: Artem Bityutskiy Cc: Arve Hjønnevåg Cc: Carlos Maiolino Cc: Christoph Hellwig Cc: Chuck Lever Cc: Daniel Vetter Cc: David Rientjes Cc: Gleb Natapov Cc: Greg Thelen Cc: J. Bruce Fields Cc: Jan Kara Cc: Jerome Glisse Cc: John Stultz Cc: KAMEZAWA Hiroyuki Cc: Kent Overstreet Cc: Kirill A. Shutemov Cc: Marcelo Tosatti Cc: Mel Gorman Cc: Steven Whitehouse Cc: Thomas Hellstrom Cc: Trond Myklebust Signed-off-by: Andrew Morton Signed-off-by: Al Viro --- fs/dcache.c | 8 +++++--- fs/inode.c | 7 ++++--- fs/internal.h | 6 ++++-- fs/super.c | 23 ++++++++++++++--------- fs/xfs/xfs_super.c | 6 ++++-- include/linux/fs.h | 4 ++-- 6 files changed, 33 insertions(+), 21 deletions(-) (limited to 'fs/dcache.c') diff --git a/fs/dcache.c b/fs/dcache.c index d74b5bdff7f9..c932ed32c77b 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -937,6 +937,7 @@ dentry_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg) * prune_dcache_sb - shrink the dcache * @sb: superblock * @nr_to_scan : number of entries to try to free + * @nid: which node to scan for freeable entities * * Attempt to shrink the superblock dcache LRU by @nr_to_scan entries. This is * done when we need more memory an called from the superblock shrinker @@ -945,13 +946,14 @@ dentry_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg) * This function may fail to free any resources if all the dentries are in * use. 
*/ -long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan) +long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan, + int nid) { LIST_HEAD(dispose); long freed; - freed = list_lru_walk(&sb->s_dentry_lru, dentry_lru_isolate, - &dispose, nr_to_scan); + freed = list_lru_walk_node(&sb->s_dentry_lru, nid, dentry_lru_isolate, + &dispose, &nr_to_scan); shrink_dentry_list(&dispose); return freed; } diff --git a/fs/inode.c b/fs/inode.c index 5b1ec47c5d39..b33ba8e021cc 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -748,13 +748,14 @@ inode_lru_isolate(struct list_head *item, spinlock_t *lru_lock, void *arg) * to trim from the LRU. Inodes to be freed are moved to a temporary list and * then are freed outside inode_lock by dispose_list(). */ -long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan) +long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan, + int nid) { LIST_HEAD(freeable); long freed; - freed = list_lru_walk(&sb->s_inode_lru, inode_lru_isolate, - &freeable, nr_to_scan); + freed = list_lru_walk_node(&sb->s_inode_lru, nid, inode_lru_isolate, + &freeable, &nr_to_scan); dispose_list(&freeable); return freed; } diff --git a/fs/internal.h b/fs/internal.h index cb83a3417a68..513e0d859a6c 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -114,7 +114,8 @@ extern int open_check_o_direct(struct file *f); * inode.c */ extern spinlock_t inode_sb_list_lock; -extern long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan); +extern long prune_icache_sb(struct super_block *sb, unsigned long nr_to_scan, + int nid); extern void inode_add_lru(struct inode *inode); /* @@ -131,7 +132,8 @@ extern int invalidate_inodes(struct super_block *, bool); */ extern struct dentry *__d_alloc(struct super_block *, const struct qstr *); extern int d_set_mounted(struct dentry *dentry); -extern long prune_dcache_sb(struct super_block *sb, unsigned long nr_to_scan); +extern long prune_dcache_sb(struct super_block *sb, unsigned 
long nr_to_scan, + int nid); /* * read_write.c diff --git a/fs/super.c b/fs/super.c index cd3c2cd9144d..181d42e2abff 100644 --- a/fs/super.c +++ b/fs/super.c @@ -76,10 +76,10 @@ static unsigned long super_cache_scan(struct shrinker *shrink, return SHRINK_STOP; if (sb->s_op->nr_cached_objects) - fs_objects = sb->s_op->nr_cached_objects(sb); + fs_objects = sb->s_op->nr_cached_objects(sb, sc->nid); - inodes = list_lru_count(&sb->s_inode_lru); - dentries = list_lru_count(&sb->s_dentry_lru); + inodes = list_lru_count_node(&sb->s_inode_lru, sc->nid); + dentries = list_lru_count_node(&sb->s_dentry_lru, sc->nid); total_objects = dentries + inodes + fs_objects + 1; /* proportion the scan between the caches */ @@ -90,13 +90,14 @@ static unsigned long super_cache_scan(struct shrinker *shrink, * prune the dcache first as the icache is pinned by it, then * prune the icache, followed by the filesystem specific caches */ - freed = prune_dcache_sb(sb, dentries); - freed += prune_icache_sb(sb, inodes); + freed = prune_dcache_sb(sb, dentries, sc->nid); + freed += prune_icache_sb(sb, inodes, sc->nid); if (fs_objects) { fs_objects = mult_frac(sc->nr_to_scan, fs_objects, total_objects); - freed += sb->s_op->free_cached_objects(sb, fs_objects); + freed += sb->s_op->free_cached_objects(sb, fs_objects, + sc->nid); } drop_super(sb); @@ -115,10 +116,13 @@ static unsigned long super_cache_count(struct shrinker *shrink, return 0; if (sb->s_op && sb->s_op->nr_cached_objects) - total_objects = sb->s_op->nr_cached_objects(sb); + total_objects = sb->s_op->nr_cached_objects(sb, + sc->nid); - total_objects += list_lru_count(&sb->s_dentry_lru); - total_objects += list_lru_count(&sb->s_inode_lru); + total_objects += list_lru_count_node(&sb->s_dentry_lru, + sc->nid); + total_objects += list_lru_count_node(&sb->s_inode_lru, + sc->nid); total_objects = vfs_pressure_ratio(total_objects); drop_super(sb); @@ -228,6 +232,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags) 
s->s_shrink.scan_objects = super_cache_scan; s->s_shrink.count_objects = super_cache_count; s->s_shrink.batch = 1024; + s->s_shrink.flags = SHRINKER_NUMA_AWARE; } out: return s; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 71d7aaebb912..15188cc99449 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1537,7 +1537,8 @@ xfs_fs_mount( static long xfs_fs_nr_cached_objects( - struct super_block *sb) + struct super_block *sb, + int nid) { return xfs_reclaim_inodes_count(XFS_M(sb)); } @@ -1545,7 +1546,8 @@ xfs_fs_nr_cached_objects( static long xfs_fs_free_cached_objects( struct super_block *sb, - long nr_to_scan) + long nr_to_scan, + int nid) { return xfs_reclaim_inodes_nr(XFS_M(sb), nr_to_scan); } diff --git a/include/linux/fs.h b/include/linux/fs.h index 36e45df87f6e..a4acd3c61190 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1624,8 +1624,8 @@ struct super_operations { ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); #endif int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); - long (*nr_cached_objects)(struct super_block *); - long (*free_cached_objects)(struct super_block *, long); + long (*nr_cached_objects)(struct super_block *, int); + long (*free_cached_objects)(struct super_block *, long, int); }; /* -- cgit v1.2.3