| author | Kairui Song <kasong@tencent.com> | 2026-05-17 18:39:40 +0300 |
|---|---|---|
| committer | Andrew Morton <akpm@linux-foundation.org> | 2026-06-03 01:22:20 +0300 |
| commit | a2e61ffb47493ff009b24105792318b3b62e18e2 (patch) | |
| tree | 3330f88f8e044670337f831ba37812c5c4f268fc /include/linux | |
| parent | 63b02a9409cb5180398491b093e48bcb5315f5fb (diff) | |
mm, swap: simplify swap cache allocation helper
Patch series "mm, swap: swap table phase IV: unify allocation", v5.
This series unifies the allocation and charging of anon and shmem swap in
folios, provides better synchronization, and consolidates metadata
management so that the static array and map can be dropped. The static
metadata overhead is now close to zero, and workload performance is
slightly improved.
For example, mounting a 1TB swap device saves about 512MB of memory:
Before:
    free -m
                   total        used        free      shared  buff/cache   available
    Mem:            1464         805         346           1         382         658
    Swap:        1048575           0     1048575
After:
    free -m
                   total        used        free      shared  buff/cache   available
    Mem:            1464         277         899           1         356        1187
    Swap:        1048575           0     1048575
Memory usage is ~512M lower, and the static overhead is now close to zero:
it was about 2 bytes per slot before and is now roughly 0.09375 bytes per
slot (48 bytes of ci info per cluster, and a cluster is 512 slots).
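The per-slot figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the 4 KiB slot size is an assumption (the cover letter does not state the page size), and all function and macro names are hypothetical, not kernel symbols.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so one swap slot covers 4096 bytes. */
#define SLOT_BYTES         4096ULL
/* From the cover letter: a cluster is 512 slots with 48 bytes of ci info. */
#define SLOTS_PER_CLUSTER  512ULL
#define CI_INFO_BYTES      48ULL
/* From the cover letter: about 2 bytes of static metadata per slot before. */
#define OLD_BYTES_PER_SLOT 2ULL

/* Number of swap slots on a device of the given size. */
static uint64_t swap_slots(uint64_t device_bytes)
{
    return device_bytes / SLOT_BYTES;
}

/* Static metadata before the series: a fixed cost for every slot. */
static uint64_t old_static_bytes(uint64_t device_bytes)
{
    return swap_slots(device_bytes) * OLD_BYTES_PER_SLOT;
}

/* Cluster info after the series: one ci record per 512-slot cluster. */
static uint64_t new_cluster_bytes(uint64_t device_bytes)
{
    return swap_slots(device_bytes) / SLOTS_PER_CLUSTER * CI_INFO_BYTES;
}

/* Per-slot cost of the cluster info: 48 / 512. */
static double new_bytes_per_slot(void)
{
    return (double)CI_INFO_BYTES / (double)SLOTS_PER_CLUSTER;
}
```

Under these assumptions, a 1 TiB device has 2^28 slots, so `old_static_bytes(1ULL << 40)` is 512 MiB, which matches the saving seen in `free -m`, while `new_cluster_bytes(1ULL << 40)` is 24 MiB.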
Performance tests also look good. Testing Redis in a 2G VM using 6G ZRAM
as swap:
valkey-server --maxmemory 2560M
redis-benchmark -r 3000000 -n 3000000 -d 1024 -c 12 -P 32 -t get
Before: 3385017.283654 RPS
After: 3433309.307292 RPS (1.42% better)
Testing a kernel build under global memory pressure on a 48c96t system,
limiting the total memory to 8G, using 12G ZRAM, with THP enabled, 24 test
runs:
make -j96, using defconfig
Before: user time 2904.59s system time 4773.99s
After: user time 2909.38s system time 4641.55s (2.77% better)
Testing with usemem on a 32c machine using 48G brd ramdisk and 16G RAM, 12
test runs:
usemem --init-time -O -y -x -n 48 1G
Before: Throughput (Sum): 6482.58 MB/s Free Latency: 371371.67us
After: Throughput (Sum): 6539.28 MB/s Free Latency: 363059.88us
Seems similar, or slightly better.
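For readers who want to reproduce the quoted percentages, here is a minimal sketch of the arithmetic. It assumes the "Before" value is the baseline, which the cover letter does not state, and the function names are hypothetical.

```c
/* Relative gain in percent for a metric where higher is better
 * (for example RPS or MB/s). Assumes `before` is the baseline. */
static double gain_higher_is_better(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Relative gain in percent for a metric where lower is better
 * (for example system time or free latency). */
static double gain_lower_is_better(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

With the kernel build numbers, `gain_lower_is_better(4773.99, 4641.55)` lands between 2.77 and 2.78, in line with the reported system time improvement.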
This series also reduces memory thrashing. I no longer see any "Huh
VM_FAULT_OOM leaked out to the #PF handler. Retrying PF" messages, which
showed up several times during stress testing under heavy pressure before
this series:
Before: grep -Ri VM_FAULT_OOM <test logs> | wc -l => 18
After: grep -Ri VM_FAULT_OOM <test logs> | wc -l => 0
This patch (of 12):
Instead of having swap_cache_alloc_folio try to return the existing folio
when the entry is already cached, simply return an error pointer if the
allocation fails, and drop the output argument that indicates what kind
of folio is actually returned.
Add a proper wrapper, swap_cache_read_folio, that decouples and handles the
actual requirement: read in the folio, or return the folio that is already
in the cache. This is what async swapin and readahead actually require.
As for zswap swap out, the caller just needs to abort if the allocation
fails because the entry is gone or already cached, so removing the output
argument simplifies the interface and makes it cleaner.
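The split described above can be illustrated with a small userspace sketch. This is not the kernel code: every `toy_` name is hypothetical, the helpers only mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom, the choice of -EEXIST for "already cached" is an assumption, and locking and the actual swap read are left out.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define TOY_MAX_ERRNO 4095

static inline void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int toy_is_err(const void *p)
{
    /* Error codes live in the top 4095 values of the address space. */
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

#define TOY_SLOTS 8

struct toy_folio { int slot; int uptodate; };
struct toy_cache { struct toy_folio *slots[TOY_SLOTS]; };

/* Allocation only: on failure return an error pointer, never the
 * existing folio, and use no output argument. */
static struct toy_folio *toy_cache_alloc_folio(struct toy_cache *c, int slot)
{
    struct toy_folio *f;

    if (slot < 0 || slot >= TOY_SLOTS)
        return toy_err_ptr(-EINVAL);
    if (c->slots[slot])
        return toy_err_ptr(-EEXIST);    /* assumed errno for "cached" */
    f = calloc(1, sizeof(*f));
    if (!f)
        return toy_err_ptr(-ENOMEM);
    f->slot = slot;
    c->slots[slot] = f;
    return f;
}

/* Read wrapper: read in a new folio, or return the one already cached. */
static struct toy_folio *toy_cache_read_folio(struct toy_cache *c, int slot)
{
    struct toy_folio *f = toy_cache_alloc_folio(c, slot);

    if (!toy_is_err(f)) {
        f->uptodate = 1;                /* stands in for the real read */
        return f;
    }
    if (toy_ptr_err(f) == -EEXIST)
        return c->slots[slot];
    return f;                           /* propagate other errors */
}

/* Writeback-style caller: any allocation failure means abort. */
static int toy_writeback(struct toy_cache *c, int slot)
{
    struct toy_folio *f = toy_cache_alloc_folio(c, slot);

    if (toy_is_err(f))
        return (int)toy_ptr_err(f);
    return 0;
}
```

In this sketch, only the read wrapper cares about the already-cached case. The writeback-style caller treats every error pointer the same way, which is why it no longer needs an output argument describing the returned folio.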
No feature change.
Link: https://lore.kernel.org/20260517-swap-table-p4-v5-0-88ae43e064c7@tencent.com
Link: https://lore.kernel.org/20260517-swap-table-p4-v5-1-88ae43e064c7@tencent.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Youngjun Park <youngjun.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
