diff options
author | SeongJae Park <sjpark@amazon.de> | 2020-12-11 14:24:05 +0300 |
---|---|---|
committer | Jakub Kicinski <kuba@kernel.org> | 2020-12-13 02:08:54 +0300 |
commit | 0b9b241406818a871c6d25390aa487dba966d548 (patch) | |
tree | 8d1b4990106eed407ef7704833fe892584c37221 /include/net | |
parent | e0a64d1dffca048a99546993322bd1fb5c728ee8 (diff) | |
download | linux-0b9b241406818a871c6d25390aa487dba966d548.tar.xz |
inet: frags: batch fqdir destroy works
On a few of our systems, I found frequent 'unshare(CLONE_NEWNET)' calls
make the number of active slab objects including 'sock_inode_cache' type
rapidly and continuously increase. As a result, memory pressure occurs.
In more detail, I made an artificial reproducer that resembles the
workload that we found the problem and reproduce the problem faster. It
merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a loop. It takes
about 2 minutes. On 40 CPU cores / 70GB DRAM machine, the available
memory continuously reduced in a fast speed (about 120MB per second,
15GB in total within the 2 minutes). Note that the issue don't
reproduce on every machine. On my 6 CPU cores machine, the problem
didn't reproduce.
'cleanup_net()' and 'fqdir_work_fn()' are functions that deallocate the
relevant memory objects. They are asynchronously invoked by the work
queues and internally use 'rcu_barrier()' to ensure safe destructions.
'cleanup_net()' works in a batched maneer in a single thread worker,
while 'fqdir_work_fn()' works for each 'fqdir_exit()' call in the
'system_wq'. Therefore, 'fqdir_work_fn()' called frequently under the
workload and made the contention for 'rcu_barrier()' high. In more
detail, the global mutex, 'rcu_state.barrier_mutex' became the
bottleneck.
This commit avoids such contention by doing the 'rcu_barrier()' and
subsequent lightweight works in a batched manner, as similar to that of
'cleanup_net()'. The fqdir hashtable destruction, which is done before
the 'rcu_barrier()', is still allowed to run in parallel for fast
processing, but this commit makes it to use a dedicated work queue
instead of the 'system_wq', to make sure that the number of threads is
bounded.
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201211112405.31158-1-sjpark@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'include/net')
-rw-r--r-- | include/net/inet_frag.h | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h index bac79e817776..48cc5795ceda 100644 --- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -21,6 +21,7 @@ struct fqdir { /* Keep atomic mem on separate cachelines in structs that include it */ atomic_long_t mem ____cacheline_aligned_in_smp; struct work_struct destroy_work; + struct llist_node free_list; }; /** |