<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/swap.h, branch v7.0-rc7</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-31T22:22:57+00:00</updated>
<entry>
<title>mm, swap: drop the SWAP_HAS_CACHE flag</title>
<updated>2026-01-31T22:22:57+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d3852f9692b8a6af7566f92f7432ee5067c6be15'/>
<id>urn:sha1:d3852f9692b8a6af7566f92f7432ee5067c6be15</id>
<content type='text'>
Now, the swap cache is managed by the swap table.  All swap cache users
are checking the swap table directly to check the swap cache state. 
SWAP_HAS_CACHE is now just a temporary pin before the first increase from
0 to 1 of a slot's swap count (swap_dup_entries) after swap allocation
(folio_alloc_swap), or before the final free of slots pinned by folio in
swap cache (put_swap_folio).

Drop these two usages.  For the first dup, SWAP_HAS_CACHE pinning was hard
to kill because it used to have multiple meanings, more than just "a slot
is cached".  We have just simplified that and defined that the first dup
is always done with folio locked in swap cache (folio_dup_swap), so stop
checking the SWAP_HAS_CACHE bit and just check the swap cache (swap table)
directly, and add a WARN if a swap entry's count is being increased for
the first time while the folio is not in swap cache.

As for freeing, just let the swap cache free all swap entries of a folio
that have a swap count of zero directly upon folio removal.  We have also
just cleaned up batch freeing to check the swap cache usage using the swap
table: a slot with swap cache in the swap table will not be freed until
its cache is gone, and no SWAP_HAS_CACHE bit is involved anymore.  And
besides, the removal of a folio and freeing of the slots are being done in
the same critical section now, which should improve the performance.

After these two changes, SWAP_HAS_CACHE no longer has any users.  Swap
cache synchronization is also done by the swap table directly, so using
SWAP_HAS_CACHE to pin a slot before adding the cache is also no longer
needed.  Remove all related logic and helpers.  swap_map is now only used
for tracking the count, so all swap_map users can just read it directly,
ignoring the swap_count helper, which was previously used to filter out
the SWAP_HAS_CACHE bit.

The idea of dropping SWAP_HAS_CACHE and using the swap table directly was
initially from Chris's idea of merging all the metadata usage of all swaps
into one place.

Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-18-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Suggested-by: Chris Li &lt;chrisl@kernel.org&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: add folio to swap cache directly on allocation</title>
<updated>2026-01-31T22:22:57+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=270f095179ff15b7c72f25dd6720dcab3d15cc9b'/>
<id>urn:sha1:270f095179ff15b7c72f25dd6720dcab3d15cc9b</id>
<content type='text'>
The allocator uses SWAP_HAS_CACHE to pin a swap slot upon allocation. 
SWAP_HAS_CACHE is being deprecated as it caused a lot of confusion.  This
pinning usage here can be dropped by adding the folio to swap cache
directly on allocation.

All swap allocations are folio-based now (except for hibernation), so the
swap allocator can always take the folio as the parameter.  And now both
swap cache (swap table) and swap map are protected by the cluster lock,
scanning the map and inserting the folio can be done in the same critical
section.  This eliminates the time window that a slot is pinned by
SWAP_HAS_CACHE, but it has no cache, and avoids touching the lock multiple
times.

This is both a cleanup and an optimization.

Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-15-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: cleanup swap entry management workflow</title>
<updated>2026-01-31T22:22:56+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=36976159140bc288c3752a9b799090a49f1a8b62'/>
<id>urn:sha1:36976159140bc288c3752a9b799090a49f1a8b62</id>
<content type='text'>
The current swap entry allocation/freeing workflow has never had a clear
definition.  This makes it hard to debug or add new optimizations.

This commit introduces a proper definition of how swap entries would be
allocated and freed.  Now, most operations are folio based, so they will
never exceed one swap cluster, and we now have a cleaner border between
swap and the rest of mm, making it much easier to follow and debug,
especially with new added sanity checks.  Also making more optimization
possible.

Swap entry will be mostly freed and free with a folio bound.  The folio
lock will be useful for resolving many swap related races.

Now swap allocation (except hibernation) always starts with a folio in the
swap cache, and gets duped/freed protected by the folio lock:

- folio_alloc_swap() - The only allocation entry point now.
  Context: The folio must be locked.
  This allocates one or a set of continuous swap slots for a folio and
  binds them to the folio by adding the folio to the swap cache. The
  swap slots' swap count start with zero value.

- folio_dup_swap() - Increase the swap count of one or more entries.
  Context: The folio must be locked and in the swap cache. For now, the
  caller still has to lock the new swap entry owner (e.g., PTL).
  This increases the ref count of swap entries allocated to a folio.
  Newly allocated swap slots' count has to be increased by this helper
  as the folio got unmapped (and swap entries got installed).

- folio_put_swap() - Decrease the swap count of one or more entries.
  Context: The folio must be locked and in the swap cache. For now, the
  caller still has to lock the new swap entry owner (e.g., PTL).
  This decreases the ref count of swap entries allocated to a folio.
  Typically, swapin will decrease the swap count as the folio got
  installed back and the swap entry got uninstalled

  This won't remove the folio from the swap cache and free the
  slot. Lazy freeing of swap cache is helpful for reducing IO.
  There is already a folio_free_swap() for immediate cache reclaim.
  This part could be further optimized later.

The above locking constraints could be further relaxed when the swap table
is fully implemented.  Currently dup still needs the caller to lock the
swap entry container (e.g.  PTL), or a concurrent zap may underflow the
swap count.

Some swap users need to interact with swap count without involving folio
(e.g.  forking/zapping the page table or mapping truncate without swapin).
In such cases, the caller has to ensure there is no race condition on
whatever owns the swap count and use the below helpers:

- swap_put_entries_direct() - Decrease the swap count directly.
  Context: The caller must lock whatever is referencing the slots to
  avoid a race.

  Typically the page table zapping or shmem mapping truncate will need
  to free swap slots directly. If a slot is cached (has a folio bound),
  this will also try to release the swap cache.

- swap_dup_entry_direct() - Increase the swap count directly.
  Context: The caller must lock whatever is referencing the entries to
  avoid race, and the entries must already have a swap count &gt; 1.

  Typically, forking will need to copy the page table and hence needs to
  increase the swap count of the entries in the table. The page table is
  locked while referencing the swap entries, so the entries all have a
  swap count &gt; 1 and can't be freed.

Hibernation subsystem is a bit different, so two special wrappers are here:

- swap_alloc_hibernation_slot() - Allocate one entry from one device.
- swap_free_hibernation_slot() - Free one entry allocated by the above
  helper.

All hibernation entries are exclusive to the hibernation subsystem and
should not interact with ordinary swap routines.

By separating the workflows, it will be possible to bind folio more
tightly with swap cache and get rid of the SWAP_HAS_CACHE as a temporary
pin.

This commit should not introduce any behavior change

[kasong@tencent.com: fix leak, per Chris Mason.  Remove WARN_ON, per Lai Yi]
  Link: https://lkml.kernel.org/r/CAMgjq7AUz10uETVm8ozDWcB3XohkOqf0i33KGrAquvEVvfp5cg@mail.gmail.com
[ryncsn@gmail.com: fix KSM copy pages for swapoff, per Chris]
  Link: https://lkml.kernel.org/r/aXxkANcET3l2Xu6J@KASONG-MC4
Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-14-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Signed-off-by: Kairui Song &lt;ryncsn@gmail.com&gt;
Acked-by: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: Lai Yi &lt;yi1.lai@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: use swap cache as the swap in synchronize layer</title>
<updated>2026-01-31T22:22:56+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2732acda82c93475c5986e1a5640004a5d4f9c3e'/>
<id>urn:sha1:2732acda82c93475c5986e1a5640004a5d4f9c3e</id>
<content type='text'>
Current swap in synchronization mostly uses the swap_map's SWAP_HAS_CACHE
bit.  Whoever sets the bit first does the actual work to swap in a folio.

This has been causing many issues as it's just a poor implementation of a
bit lock.  Raced users have no idea what is pinning a slot, so it has to
loop with a schedule_timeout_uninterruptible(1), which is ugly and causes
long-tailing or other performance issues.  Besides, the abuse of
SWAP_HAS_CACHE has been causing many other troubles for synchronization or
maintenance.

This is the first step to remove this bit completely.

Now all swap in paths are using the swap cache, and both the swap cache
and swap map are protected by the cluster lock.  So we can just resolve
the swap synchronization with the swap cache layer directly using the
cluster lock and folio lock.  Whoever inserts a folio in the swap cache
first does the swap in work.  And because folios are locked during swap
operations, other raced swap operations will just wait on the folio lock.

The SWAP_HAS_CACHE will be removed in later commit.  For now, we still set
it for some remaining users.  But now we do the bit setting and swap cache
folio adding in the same critical section, after swap cache is ready.  No
one will have to spin on the SWAP_HAS_CACHE bit anymore.

This both simplifies the logic and should improve the performance,
eliminating issues like the one solved in commit 01626a1823024 ("mm: avoid
unconditional one-tick sleep when swapcache_prepare fails"), or the
"skip_if_exists" from commit a65b0e7607ccb ("zswap: make shrinking
memcg-aware"), which will be removed very soon.

[kasong@tencent.com: fix cgroup v1 accounting issue]
 Link: https://lkml.kernel.org/r/CAMgjq7CGUnzOVG7uSaYjzw9wD7w2dSKOHprJfaEp4CcGLgE3iw@mail.gmail.com
Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-12-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/shmem, swap: remove SWAP_MAP_SHMEM</title>
<updated>2026-01-31T22:22:55+00:00</updated>
<author>
<name>Nhat Pham</name>
<email>nphamcs@gmail.com</email>
</author>
<published>2025-12-19T19:43:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bc617c990eae4259cd5014d596477cbe0d596417'/>
<id>urn:sha1:bc617c990eae4259cd5014d596477cbe0d596417</id>
<content type='text'>
The SWAP_MAP_SHMEM state was introduced in the commit aaa468653b4a
("swap_info: note SWAP_MAP_SHMEM"), to quickly determine if a swap entry
belongs to shmem during swapoff.

However, swapoff has since been rewritten in the commit b56a2d8af914 ("mm:
rid swapoff of quadratic complexity").  Now having swap count ==
SWAP_MAP_SHMEM value is basically the same as having swap count == 1, and
swap_shmem_alloc() behaves analogously to swap_duplicate().  The only
difference of note is that swap_shmem_alloc() does not check for -ENOMEM
returned from __swap_duplicate(), but it is OK because shmem never
re-duplicates any swap entry it owns.  This will stil be safe if we use
(batched) swap_duplicate() instead.

This commit adds swap_duplicate_nr(), the batched variant of
swap_duplicate(), and removes the SWAP_MAP_SHMEM state and the associated
swap_shmem_alloc() helper to simplify the state machine (both mentally and
in terms of actual code).  We will also have an extra state/special value
that can be repurposed (for swap entries that never gets re-duplicated).

Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-8-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Signed-off-by: Nhat Pham &lt;nphamcs@gmail.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Tested-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/swap: do not choose swap device according to numa node</title>
<updated>2025-11-17T01:28:27+00:00</updated>
<author>
<name>Baoquan He</name>
<email>bhe@redhat.com</email>
</author>
<published>2025-10-28T03:43:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8e689f8ea45ffdae20350246dd37d124d7092c92'/>
<id>urn:sha1:8e689f8ea45ffdae20350246dd37d124d7092c92</id>
<content type='text'>
Patch series "mm/swapfile.c: select swap devices of default priority round
robin", v5.

Currently, on system with multiple swap devices, swap allocation will
select one swap device according to priority.  The swap device with the
highest priority will be chosen to allocate firstly.

People can specify a priority from 0 to 32767 when swapon a swap device,
or the system will set it from -2 then downwards by default.  Meanwhile,
on NUMA system, the swap device with node_id will be considered first on
that NUMA node of the node_id.

In the current code, an array of plist, swap_avail_heads[nid], is used to
organize swap devices on each NUMA node.  For each NUMA node, there is a
plist organizing all swap devices.  The 'prio' value in the plist is the
negated value of the device's priority due to plist being sorted from low
to high.  The swap device owning one node_id will be promoted to the front
position on that NUMA node, then other swap devices are put in order of
their default priority.

E.g I got a system with 8 NUMA nodes, and I setup 4 zram partition as
swap devices.

Current behaviour:
their priorities will be(note that -1 is skipped):
NAME       TYPE      SIZE USED PRIO
/dev/zram0 partition  16G   0B   -2
/dev/zram1 partition  16G   0B   -3
/dev/zram2 partition  16G   0B   -4
/dev/zram3 partition  16G   0B   -5

And their positions in the 8 swap_avail_lists[nid] will be:
swap_avail_lists[0]: /* node 0's available swap device list */
zram0   -&gt; zram1   -&gt; zram2   -&gt; zram3
prio:1     prio:3     prio:4     prio:5
swap_avali_lists[1]: /* node 1's available swap device list */
zram1   -&gt; zram0   -&gt; zram2   -&gt; zram3
prio:1     prio:2     prio:4     prio:5
swap_avail_lists[2]: /* node 2's available swap device list */
zram2   -&gt; zram0   -&gt; zram1   -&gt; zram3
prio:1     prio:2     prio:3     prio:5
swap_avail_lists[3]: /* node 3's available swap device list */
zram3   -&gt; zram0   -&gt; zram1   -&gt; zram2
prio:1     prio:2     prio:3     prio:4
swap_avail_lists[4-7]: /* node 4,5,6,7's available swap device list */
zram0   -&gt; zram1   -&gt; zram2   -&gt; zram3
prio:2     prio:3     prio:4     prio:5

The adjustment for swap device with node_id intended to decrease the
pressure of lock contention for one swap device by taking different swap
device on different node.  The adjustment was introduced in commit
a2468cc9bfdf ("swap: choose swap device according to numa node"). 
However, the adjustment is a little coarse-grained.  On the node, the swap
device sharing the node's id will always be selected firstly by node's
CPUs until exhausted, then next one.  And on other nodes where no swap
device shares its node id, swap device with priority '-2' will be selected
firstly until exhausted, then next with priority '-3'.

This is the swapon output during the process high pressure vm-scability
test is being taken.  It's clearly showing zram0 is heavily exploited
until exhausted.

===================================
[root@hp-dl385g10-03 ~]# swapon
NAME       TYPE      SIZE  USED PRIO
/dev/zram0 partition  16G 15.7G   -2
/dev/zram1 partition  16G  3.4G   -3
/dev/zram2 partition  16G  3.4G   -4
/dev/zram3 partition  16G  2.6G   -5

The node based strategy on selecting swap device is much better then the
old way one by one selecting swap device.  However it is still
unreasonable because swap devices are assumed to have similar accessing
speed if no priority is specified when swapon.  It's unfair and doesn't
make sense just because one swap device is swapped on firstly, its
priority will be higher than the one swapped on later.

So in this patchset, change is made to select the swap device round robin
if default priority.  In code, the plist array swap_avail_heads[nid] is
replaced with a plist swap_avail_head which reverts commit a2468cc9bfdf. 
Meanwhile, on top of the revert, further change is taken to make any
device w/o specified priority get the same default priority '-1'.  Surely,
swap device with specified priority are always put foremost, this is not
impacted.  If you care about their different accessing speed, then use
'swapon -p xx' to deploy priority for your swap devices.

New behaviour:

swap_avail_list: /* one global available swap device list */
zram0   -&gt; zram1   -&gt; zram2   -&gt; zram3
prio:1     prio:1     prio:1     prio:1

This is the swapon output during the process high pressure vm-scability
being taken, all is selected round robin:
=======================================
[root@hp-dl385g10-03 linux]# swapon
NAME       TYPE      SIZE  USED PRIO
/dev/zram0 partition  16G 12.6G   -1
/dev/zram1 partition  16G 12.6G   -1
/dev/zram2 partition  16G 12.6G   -1
/dev/zram3 partition  16G 12.6G   -1

With the change, we can see about 18% efficiency promotion as below:

vm-scability test:
==================
Test with:
usemem --init-time -O -y -x -n 31 2G (4G memcg, zram as swap)
                           Before:          After:
System time:               637.92 s         526.74 s      (lower is better)
Sum Throughput:            3546.56 MB/s     4207.56 MB/s  (higher is better)
Single process Throughput: 114.40 MB/s      135.72 MB/s   (higher is better)
free latency:              10138455.99 us   6810119.01 us (low is better)


This patch (of 2):

This reverts commit a2468cc9bfdf ("swap: choose swap device according to
numa node").

After this patch, the behaviour will change back to pre-commit
a2468cc9bfdf.  Means the priority will be set from -1 then downwards by
default, and when swapping, it will exhault swap device one by one
according to priority from high to low.  This is preparation work for
later change.

[root@hp-dl385g10-03 ~]# swapon
NAME       TYPE      SIZE   USED PRIO
/dev/zram0 partition  16G    16G   -1
/dev/zram1 partition  16G 966.2M   -2
/dev/zram2 partition  16G     0B   -3
/dev/zram3 partition  16G     0B   -4

Link: https://lkml.kernel.org/r/20251028034308.929550-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20251028034308.929550-2-bhe@redhat.com
Signed-off-by: Baoquan He &lt;bhe@redhat.com&gt;
Suggested-by: Chris Li &lt;chrisl@kernel.org&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Acked-by: Nhat Pham &lt;nphamcs@gmail.com&gt;
Reviewed-by: Kairui Song &lt;kasong@tencent.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: cleanup swap entry allocation parameter</title>
<updated>2025-11-17T01:28:20+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-10-23T18:00:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a983471cfc454afeba23526ee5d17fd8cdc7876f'/>
<id>urn:sha1:a983471cfc454afeba23526ee5d17fd8cdc7876f</id>
<content type='text'>
We no longer need this GFP parameter after commit 8578e0c00dcf ("mm, swap:
use the swap table for the swap cache and switch API").  Before that
commit the GFP parameter is already almost identical for all callers, so
nothing changed by that commit.  Swap table just moved the GFP to lower
layer and make it more defined and changes depend on atomic or sleep
allocation.

Now this parameter is no longer used, just remove it.  No behavior change.

Link: https://lkml.kernel.org/r/20251024-swap-clean-after-swap-table-p1-v2-3-a709469052e7@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Acked-by: Nhat Pham &lt;nphamcs@gmail.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: "Huang, Ying" &lt;ying.huang@linux.alibaba.com&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: use the swap table for the swap cache and switch API</title>
<updated>2025-09-21T21:22:24+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-09-16T16:00:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8578e0c00dcf0c58fbc32d4904ecaf8e802a6590'/>
<id>urn:sha1:8578e0c00dcf0c58fbc32d4904ecaf8e802a6590</id>
<content type='text'>
Introduce basic swap table infrastructures, which are now just a
fixed-sized flat array inside each swap cluster, with access wrappers.

Each cluster contains a swap table of 512 entries.  Each table entry is an
opaque atomic long.  It could be in 3 types: a shadow type (XA_VALUE), a
folio type (pointer), or NULL.

In this first step, it only supports storing a folio or shadow, and it is
a drop-in replacement for the current swap cache.  Convert all swap cache
users to use the new sets of APIs.  Chris Li has been suggesting using a
new infrastructure for swap cache for better performance, and that idea
combined well with the swap table as the new backing structure.  Now the
lock contention range is reduced to 2M clusters, which is much smaller
than the 64M address_space.  And we can also drop the multiple
address_space design.

All the internal works are done with swap_cache_get_* helpers.  Swap cache
lookup is still lock-less like before, and the helper's contexts are same
with original swap cache helpers.  They still require a pin on the swap
device to prevent the backing data from being freed.

Swap cache updates are now protected by the swap cluster lock instead of
the XArray lock.  This is mostly handled internally, but new
__swap_cache_* helpers require the caller to lock the cluster.  So, a few
new cluster access and locking helpers are also introduced.

A fully cluster-based unified swap table can be implemented on top of this
to take care of all count tracking and synchronization work, with dynamic
allocation.  It should reduce the memory usage while making the
performance even better.

Link: https://lkml.kernel.org/r/20250916160100.31545-12-ryncsn@gmail.com
Co-developed-by: Chris Li &lt;chrisl@kernel.org&gt;
Signed-off-by: Chris Li &lt;chrisl@kernel.org&gt;
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Suggested-by: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@linux.alibaba.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: kernel test robot &lt;oliver.sang@intel.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: tidy up swap device and cluster info helpers</title>
<updated>2025-09-21T21:22:23+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-09-16T16:00:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0fcf8ef4fdab8e5c91d1bce39c7fe6565974ffad'/>
<id>urn:sha1:0fcf8ef4fdab8e5c91d1bce39c7fe6565974ffad</id>
<content type='text'>
swp_swap_info is the most commonly used helper for retrieving swap info. 
It has an internal check that may lead to a NULL return value, but almost
none of its caller checks the return value, making the internal check
pointless.  In fact, most of these callers already ensured the entry is
valid and never expect a NULL value.

Tidy this up and improve the function names.  If the caller can make sure
the swap entry/type is valid and the device is pinned, use the new
introduced __swap_entry_to_info/__swap_type_to_info instead.  They have
more debug sanity checks and lower overhead as they are inlined.

Callers that may expect a NULL value should use
swap_entry_to_info/swap_type_to_info instead.

No feature change.  The rearranged codes should have had no effect, or
they should have been hitting NULL de-ref bugs already.  Only some new
sanity checks are added so potential issues may show up in debug build.

The new helpers will be frequently used with swap table later when working
with swap cache folios.  A locked swap cache folio ensures the entries are
valid and stable so these helpers are very helpful.

Link: https://lkml.kernel.org/r/20250916160100.31545-8-ryncsn@gmail.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Reviewed-by: Barry Song &lt;baohua@kernel.org&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Suggested-by: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@linux.alibaba.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: kernel test robot &lt;oliver.sang@intel.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: rename and move some swap cluster definition and helpers</title>
<updated>2025-09-21T21:22:23+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-09-16T16:00:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4522aed4fffbbd18ab3581d733d0572d45780d07'/>
<id>urn:sha1:4522aed4fffbbd18ab3581d733d0572d45780d07</id>
<content type='text'>
No feature change, move cluster related definitions and helpers to
mm/swap.h, also tidy up and add a "swap_" prefix for cluster lock/unlock
helpers, so they can be used outside of swap files.  And while at it, add
kerneldoc.

Link: https://lkml.kernel.org/r/20250916160100.31545-7-ryncsn@gmail.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Reviewed-by: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Reviewed-by: Barry Song &lt;baohua@kernel.org&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Suggested-by: Chris Li &lt;chrisl@kernel.org&gt;
Acked-by: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: "Huang, Ying" &lt;ying.huang@linux.alibaba.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: kernel test robot &lt;oliver.sang@intel.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Yosry Ahmed &lt;yosryahmed@google.com&gt;
Cc: Zi Yan &lt;ziy@nvidia.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
