| author | Johannes Weiner <hannes@cmpxchg.org> | 2026-05-27 23:45:16 +0300 |
|---|---|---|
| committer | Andrew Morton <akpm@linux-foundation.org> | 2026-06-09 04:21:25 +0300 |
| commit | fafaeceb89a5e2e856ff04c2cacb6cae4a2ecb67 (patch) | |
| tree | 0920f260bf1dfb4b8551e321e7d9cdcc6e73947b /include/linux/workqueue_api.h | |
| parent | 65180e9663c782e45ed1c76276dc64d96615da9d (diff) | |
mm: switch deferred split shrinker to list_lru
The deferred split queue handles cgroups in a suboptimal fashion. The
queue is per-NUMA node or per-cgroup, not the intersection. That means on
a cgrouped system, a node-restricted allocation entering reclaim can end
up splitting large pages on other nodes:

	alloc/unmap
	  deferred_split_folio()
	    list_add_tail(memcg->split_queue)
	    set_shrinker_bit(memcg, node, deferred_shrinker_id)

	for_each_zone_zonelist_nodemask(restricted_nodes)
	  mem_cgroup_iter()
	    shrink_slab(node, memcg)
	      shrink_slab_memcg(node, memcg)
	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)
	          deferred_split_scan()
	            walks memcg->split_queue

The shrinker bit adds an imperfect guard rail. As soon as the cgroup has
a single large page on the node of interest, all large pages owned by that
memcg, including those on other nodes, will be split.
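
The effect can be illustrated with a small userspace model. This is only a sketch, not kernel code: the model_* names, the fixed node count and the array-backed queue are assumptions made for illustration. It keeps one queue per cgroup and one shrinker bit per node, as described above, and counts how many folios on other nodes get split by a node-restricted walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_NODES   2	/* hypothetical node count for the model */
#define MODEL_MAX_FOLIOS 8	/* hypothetical queue capacity */

/* Userspace model only: one queue per cgroup, one "has work" bit per node. */
struct model_memcg {
	int queued_nid[MODEL_MAX_FOLIOS];	/* node of each queued large folio */
	size_t nr_queued;
	bool shrinker_bit[MODEL_NR_NODES];
};

/* Models the queueing step: add to the cgroup queue, flag the folio's node. */
static void model_defer_split(struct model_memcg *memcg, int nid)
{
	assert(nid >= 0 && nid < MODEL_NR_NODES);
	assert(memcg->nr_queued < MODEL_MAX_FOLIOS);
	memcg->queued_nid[memcg->nr_queued++] = nid;
	memcg->shrinker_bit[nid] = true;
}

/*
 * Models reclaim restricted to one node. The bit is tested per node, but
 * the walk covers the whole per-cgroup queue. Returns how many folios on
 * OTHER nodes were split as a side effect.
 */
static size_t model_shrink_node(struct model_memcg *memcg, int nid)
{
	size_t i, off_node = 0;

	assert(nid >= 0 && nid < MODEL_NR_NODES);
	if (!memcg->shrinker_bit[nid])
		return 0;

	for (i = 0; i < memcg->nr_queued; i++)
		if (memcg->queued_nid[i] != nid)
			off_node++;

	memcg->nr_queued = 0;	/* the model splits everything on the queue */
	return off_node;
}
```

In this model, queueing one folio on node 0 and two on node 1 and then shrinking node 0 splits both node-1 folios as well. A cgroup with folios only on node 1 is left alone, which is the extent of the guard rail.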
list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
streamlines a lot of the list operations and reclaim walks. It's used
widely by other major shrinkers already. Convert the deferred split queue
as well.
The list_lru per-memcg heads are instantiated on demand when the first
object of interest is allocated for a cgroup, by calling
folio_memcg_alloc_deferred(). Add calls to where splittable pages are
created: anon faults, swapin faults, khugepaged collapse.
These calls create all possible node heads for the cgroup at once, so the
migration code (between nodes) doesn't need any special care.
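
For comparison, a per-node, per-cgroup layout with on-demand setup might be modelled as below. Again this is a userspace sketch under stated assumptions, not the list_lru implementation: the model_* names are hypothetical, and each list is reduced to a counter. It shows the two properties described above: a node-restricted walk touches only that node's list, and because all node heads are created in one step, moving an item between nodes needs no further allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_NODES 2	/* hypothetical node count for the model */

/* Userspace model only: per-cgroup state holding one list per node. */
struct model_memcg_lru {
	size_t *nr_items;	/* NULL until first use, then one counter per node */
};

/*
 * Stands in for the on-demand setup step: create the heads for all nodes
 * at once. Returns 0 on success, -1 if the allocation fails.
 */
static int model_alloc_heads(struct model_memcg_lru *lru)
{
	if (lru->nr_items)
		return 0;	/* already set up */
	lru->nr_items = calloc(MODEL_NR_NODES, sizeof(*lru->nr_items));
	return lru->nr_items ? 0 : -1;
}

/* Queue one item on the list for (this cgroup, nid). */
static void model_lru_add(struct model_memcg_lru *lru, int nid)
{
	assert(lru->nr_items);	/* heads must exist before queueing */
	assert(nid >= 0 && nid < MODEL_NR_NODES);
	lru->nr_items[nid]++;
}

/* Move one item between nodes; the target head already exists. */
static void model_lru_migrate(struct model_memcg_lru *lru, int from, int to)
{
	assert(lru->nr_items);
	assert(from >= 0 && from < MODEL_NR_NODES);
	assert(to >= 0 && to < MODEL_NR_NODES);
	assert(lru->nr_items[from] > 0);
	lru->nr_items[from]--;
	lru->nr_items[to]++;
}

/* Node-restricted walk: drains only this node's list, returns its length. */
static size_t model_lru_shrink_node(struct model_memcg_lru *lru, int nid)
{
	size_t n;

	assert(lru->nr_items);
	assert(nid >= 0 && nid < MODEL_NR_NODES);
	n = lru->nr_items[nid];
	lru->nr_items[nid] = 0;
	return n;
}
```

With one item on node 0 and two on node 1, shrinking node 0 in this model drains a single item and leaves the node-1 list untouched.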
[akpm@linux-foundation.org: fix build with CONFIG_TRANSPARENT_HUGEPAGE=n]
Link: https://lore.kernel.org/202605281620.lc3rtkBm-lkp@intel.com
[hannes@cmpxchg.org: fix cgroup.memory=nokmem handling]
Link: https://lore.kernel.org/ah9PGv12mqai84ES@cmpxchg.org
Link: https://lore.kernel.org/20260527204757.2544958-10-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Usama Arif <usama.arif@linux.dev>
Reviewed-by: Kairui Song <kasong@tencent.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nico Pache <npache@redhat.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
