<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/swap.h, branch linux-7.1.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-7.1.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-04-05T20:52:59+00:00</updated>
<entry>
<title>mm, swap: no need to clear the shadow explicitly</title>
<updated>2026-04-05T20:52:59+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2026-02-17T20:06:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1df1a1b950863e64c00d48df718ed7ed28db3ea3'/>
<id>urn:sha1:1df1a1b950863e64c00d48df718ed7ed28db3ea3</id>
<content type='text'>
Since we no longer bypass the swap cache, every swap-in will clear the
swap shadow by inserting the folio into the swap table.  The only place we
may seem to need to free the swap shadow is when the swap slots are freed
directly without a folio (swap_put_entries_direct).  But with the swap
table, that is not needed either.  Freeing a slot in the swap table will
set the table entry to NULL, which erases the shadow just fine.

So delete all explicit shadow clearing, as it is no longer needed.  Also,
rearrange the freeing.

Link: https://lkml.kernel.org/r/20260218-swap-table-p3-v3-12-f4e34be021a7@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: simplify checking if a folio is swapped</title>
<updated>2026-04-05T20:52:59+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2026-02-17T20:06:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a0f79916e125f75cf665f5b3ff6ccc1ff60b1a10'/>
<id>urn:sha1:a0f79916e125f75cf665f5b3ff6ccc1ff60b1a10</id>
<content type='text'>
Clean up and simplify how we check if a folio is swapped.  The helper
already requires the folio to be in swap cache and locked.  That's enough
to pin the swap cluster from being freed, so there is no need to lock
anything else to avoid UAF.

And besides, we have cleaned up and defined the swap operation to be
mostly folio based, and now the only place a folio will have any of its
swap slots' count increased from 0 to 1 is folio_dup_swap, which also
requires the folio lock.  So as we are holding the folio lock here, a
folio can't change its swap status from not swapped (all swap slots have a
count of 0) to swapped (any slot has a swap count larger than 0).

So there won't be any false negatives of this helper if we simply depend
on the folio lock to stabilize the cluster.

We are only using this helper to determine if we can and should release
the swap cache.  So false positives are completely harmless, and they
already existed before.  Depending on the timing, previously, it was also
possible that a racing thread releases the swap count right after
releasing the ci lock and before this helper returns.  In any case, the
worst that could happen is we leave a clean swap cache.  It will still be
reclaimed when under pressure just fine.

So, in conclusion, we can make the check much simpler and lockless.
Also, rename it to folio_maybe_swapped to reflect the design.

Link: https://lkml.kernel.org/r/20260218-swap-table-p3-v3-11-f4e34be021a7@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: no need to truncate the scan border</title>
<updated>2026-04-05T20:52:59+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2026-02-17T20:06:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=45711d446b743da099b7a795ce91ca581d5981a3'/>
<id>urn:sha1:45711d446b743da099b7a795ce91ca581d5981a3</id>
<content type='text'>
swap_map had a static flexible size, so the last cluster might not be fully
covered, and the allocator had to check the scan border to avoid OOB access.
But with the swap table, each cluster has a fixed-size table, and the
slots beyond the device size are marked as bad slots.  The allocator can
simply scan all slots as usual, and any bad slots will be skipped.
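
The effect of pre-marked bad slots can be pictured with a toy model.  This
is only a sketch, not the kernel code: the FREE/BAD/USED markers, the
cluster size, and the names make_last_cluster and alloc_slot are all
hypothetical.

```python
# Toy model only; the markers, sizes and function names are hypothetical.
FREE, BAD, USED = "free", "bad", "used"
CLUSTER_SLOTS = 8  # slots per cluster in this toy model


def make_last_cluster(valid_slots):
    # The table always has CLUSTER_SLOTS entries; slots past the end of
    # the device are pre-marked as bad.
    return [FREE] * valid_slots + [BAD] * (CLUSTER_SLOTS - valid_slots)


def alloc_slot(table):
    # Scan the whole fixed-size table. No scan border is needed because
    # a bad slot never looks free.
    for offset, state in enumerate(table):
        if state == FREE:
            table[offset] = USED
            return offset
    return None  # nothing usable in this cluster
```

With a last cluster that has only two valid slots, the scan hands out
offsets 0 and 1 and then reports that nothing is left, without ever
consulting the device size.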

Link: https://lkml.kernel.org/r/20260218-swap-table-p3-v3-10-f4e34be021a7@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: use the swap table to track the swap count</title>
<updated>2026-04-05T20:52:59+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2026-02-17T20:06:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0d6af9bcf383bcdf601e670bb605861b01e318e7'/>
<id>urn:sha1:0d6af9bcf383bcdf601e670bb605861b01e318e7</id>
<content type='text'>
Now that all the infrastructure is ready, switch to using the swap table
only.  This is unfortunately a large patch because the whole old counting
mechanism, especially SWP_CONTINUED, has to be removed and replaced by the
new mechanism in one go, with no intermediate steps available.

The swap table is capable of holding up to SWP_TB_COUNT_MAX - 1 counts in
the higher bits of each table entry, so using that, the swap_map can be
completely dropped.

swap_map also had a limit of SWAP_CONT_MAX.  Any value beyond that limit
will require a COUNT_CONTINUED page.  COUNT_CONTINUED is a bit complex to
maintain, so for the swap table, a simpler approach is used: when the
count goes beyond SWP_TB_COUNT_MAX - 1, the cluster will have an
extend_table allocated, which is a swap cluster-sized array of unsigned
int.  The counting is basically offloaded there until the count drops
below SWP_TB_COUNT_MAX again.

Both the swap table and the extend table are cluster-based, so they
exhibit good performance and sparsity.
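
The overflow scheme described above can be pictured with a toy model.  This
is only a sketch under stated assumptions, not the kernel code: COUNT_MAX
stands in for SWP_TB_COUNT_MAX with an arbitrary small value, the Cluster
class is hypothetical, and using the inline value COUNT_MAX as the "look in
the extend table" marker is an assumption made for illustration.

```python
# Toy model only; the names and the value of COUNT_MAX are hypothetical.
COUNT_MAX = 4      # stands in for SWP_TB_COUNT_MAX
CLUSTER_SLOTS = 8  # slots per cluster in this toy model


class Cluster:
    def __init__(self):
        self.table = [0] * CLUSTER_SLOTS  # inline counts, 0..COUNT_MAX
        self.extend_table = None          # allocated on first overflow

    def dup(self, slot):
        inline = self.table[slot]
        if inline != COUNT_MAX and inline + 1 != COUNT_MAX:
            self.table[slot] = inline + 1  # still fits inline
            return
        if self.extend_table is None:
            self.extend_table = [0] * CLUSTER_SLOTS
        if inline != COUNT_MAX:
            # Crossing the inline limit: offload the count.
            self.table[slot] = COUNT_MAX
            self.extend_table[slot] = COUNT_MAX
        else:
            self.extend_table[slot] += 1

    def put(self, slot):
        if self.table[slot] != COUNT_MAX:
            assert self.table[slot] != 0, "count underflow"
            self.table[slot] -= 1
            return
        self.extend_table[slot] -= 1
        if self.extend_table[slot] == COUNT_MAX - 1:
            # Back below the limit: the inline field holds the count again.
            self.table[slot] = COUNT_MAX - 1
            self.extend_table[slot] = 0
            if not any(self.extend_table):
                self.extend_table = None  # no slot needs it anymore

    def count(self, slot):
        if self.table[slot] == COUNT_MAX:
            return self.extend_table[slot]
        return self.table[slot]
```

In this model the extend table exists only while at least one slot in the
cluster has overflowed, which is one way to read the sparsity claim.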

To make the switch from swap_map to swap table clean, this commit cleans
up and introduces a new set of functions based on the swap table design,
for manipulating swap counts:

- __swap_cluster_dup_entry, __swap_cluster_put_entry,
  __swap_cluster_alloc_entry, __swap_cluster_free_entry:

  Increase/decrease the count of a swap slot, or alloc / free a swap
  slot. This is the internal routine that does the counting work based
  on the swap table and handles all the complexities. The caller will
  need to lock the cluster before calling them.

  All swap count-related update operations are wrapped by these four
  helpers.

- swap_dup_entries_cluster, swap_put_entries_cluster:

  Increase/decrease the swap count of one or a set of swap slots in the
  same cluster range. These two helpers serve as the common routines for
  folio_dup_swap &amp; swap_dup_entry_direct, or
  folio_put_swap &amp; swap_put_entries_direct.

And use these helpers to replace all existing callers. This helps to
simplify the count tracking by a lot, and the swap_map is gone.

[ryncsn@gmail.com: fix build]
  Link: https://lkml.kernel.org/r/aZWuLZi-vYi3vAWe@KASONG-MC4
Link: https://lkml.kernel.org/r/20260218-swap-table-p3-v3-9-f4e34be021a7@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Suggested-by: Chris Li &lt;chrisl@kernel.org&gt;
Acked-by: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: David Hildenbrand &lt;david@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Cc: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: drop the SWAP_HAS_CACHE flag</title>
<updated>2026-01-31T22:22:57+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d3852f9692b8a6af7566f92f7432ee5067c6be15'/>
<id>urn:sha1:d3852f9692b8a6af7566f92f7432ee5067c6be15</id>
<content type='text'>
Now, the swap cache is managed by the swap table.  All swap cache users
are checking the swap table directly to check the swap cache state. 
SWAP_HAS_CACHE is now just a temporary pin before the first increase from
0 to 1 of a slot's swap count (swap_dup_entries) after swap allocation
(folio_alloc_swap), or before the final free of slots pinned by folio in
swap cache (put_swap_folio).

Drop these two usages.  For the first dup, SWAP_HAS_CACHE pinning was hard
to kill because it used to have multiple meanings, more than just "a slot
is cached".  We have just simplified that and defined that the first dup
is always done with folio locked in swap cache (folio_dup_swap), so stop
checking the SWAP_HAS_CACHE bit and just check the swap cache (swap table)
directly, and add a WARN if a swap entry's count is being increased for
the first time while the folio is not in swap cache.

As for freeing, just let the swap cache free all swap entries of a folio
that have a swap count of zero directly upon folio removal.  We have also
just cleaned up batch freeing to check the swap cache usage using the swap
table: a slot with swap cache in the swap table will not be freed until
its cache is gone, and no SWAP_HAS_CACHE bit is involved anymore.  And
besides, the removal of a folio and freeing of the slots are being done in
the same critical section now, which should improve the performance.

After these two changes, SWAP_HAS_CACHE no longer has any users.  Swap
cache synchronization is also done by the swap table directly, so using
SWAP_HAS_CACHE to pin a slot before adding the cache is also no longer
needed.  Remove all related logic and helpers.  swap_map is now only used
for tracking the count, so all swap_map users can just read it directly,
ignoring the swap_count helper, which was previously used to filter out
the SWAP_HAS_CACHE bit.

The idea of dropping SWAP_HAS_CACHE and using the swap table directly
originally came from Chris's idea of merging all the metadata usage of all
swaps into one place.

Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-18-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Suggested-by: Chris Li &lt;chrisl@kernel.org&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: check swap table directly for checking cache</title>
<updated>2026-01-31T22:22:57+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4984d746c80e888a89342d03e2b1ef20f804dff0'/>
<id>urn:sha1:4984d746c80e888a89342d03e2b1ef20f804dff0</id>
<content type='text'>
Instead of looking at the swap map, check swap table directly to tell if a
swap slot is cached.  Prepares for the removal of SWAP_HAS_CACHE.

Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-16-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: add folio to swap cache directly on allocation</title>
<updated>2026-01-31T22:22:57+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=270f095179ff15b7c72f25dd6720dcab3d15cc9b'/>
<id>urn:sha1:270f095179ff15b7c72f25dd6720dcab3d15cc9b</id>
<content type='text'>
The allocator uses SWAP_HAS_CACHE to pin a swap slot upon allocation. 
SWAP_HAS_CACHE is being deprecated as it caused a lot of confusion.  This
pinning usage here can be dropped by adding the folio to swap cache
directly on allocation.

All swap allocations are folio-based now (except for hibernation), so the
swap allocator can always take the folio as the parameter.  And since both
swap cache (swap table) and swap map are now protected by the cluster lock,
scanning the map and inserting the folio can be done in the same critical
section.  This eliminates the time window in which a slot is pinned by
SWAP_HAS_CACHE but has no cache, and avoids touching the lock multiple
times.

This is both a cleanup and an optimization.

Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-15-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: cleanup swap entry management workflow</title>
<updated>2026-01-31T22:22:56+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=36976159140bc288c3752a9b799090a49f1a8b62'/>
<id>urn:sha1:36976159140bc288c3752a9b799090a49f1a8b62</id>
<content type='text'>
The current swap entry allocation/freeing workflow has never had a clear
definition.  This makes it hard to debug or add new optimizations.

This commit introduces a proper definition of how swap entries would be
allocated and freed.  Now, most operations are folio based, so they will
never exceed one swap cluster, and we now have a cleaner border between
swap and the rest of mm, making it much easier to follow and debug,
especially with the newly added sanity checks.  It also makes more
optimization possible.

Swap entries will mostly be allocated and freed with a folio bound.  The
folio lock will be useful for resolving many swap related races.

Now swap allocation (except hibernation) always starts with a folio in the
swap cache, and gets duped/freed protected by the folio lock:

- folio_alloc_swap() - The only allocation entry point now.
  Context: The folio must be locked.
  This allocates one or a set of contiguous swap slots for a folio and
  binds them to the folio by adding the folio to the swap cache. The
  swap slots' swap count starts at zero.

- folio_dup_swap() - Increase the swap count of one or more entries.
  Context: The folio must be locked and in the swap cache. For now, the
  caller still has to lock the new swap entry owner (e.g., PTL).
  This increases the ref count of swap entries allocated to a folio.
  Newly allocated swap slots' count has to be increased by this helper
  as the folio got unmapped (and swap entries got installed).

- folio_put_swap() - Decrease the swap count of one or more entries.
  Context: The folio must be locked and in the swap cache. For now, the
  caller still has to lock the new swap entry owner (e.g., PTL).
  This decreases the ref count of swap entries allocated to a folio.
  Typically, swapin will decrease the swap count as the folio got
  installed back and the swap entry got uninstalled.

  This won't remove the folio from the swap cache and free the
  slot. Lazy freeing of swap cache is helpful for reducing IO.
  There is already a folio_free_swap() for immediate cache reclaim.
  This part could be further optimized later.

The above locking constraints could be further relaxed when the swap table
is fully implemented.  Currently dup still needs the caller to lock the
swap entry container (e.g.  PTL), or a concurrent zap may underflow the
swap count.

Some swap users need to interact with swap count without involving folio
(e.g.  forking/zapping the page table or mapping truncate without swapin).
In such cases, the caller has to ensure there is no race condition on
whatever owns the swap count and use the below helpers:

- swap_put_entries_direct() - Decrease the swap count directly.
  Context: The caller must lock whatever is referencing the slots to
  avoid a race.

  Typically the page table zapping or shmem mapping truncate will need
  to free swap slots directly. If a slot is cached (has a folio bound),
  this will also try to release the swap cache.

- swap_dup_entry_direct() - Increase the swap count directly.
  Context: The caller must lock whatever is referencing the entries to
  avoid a race, and the entries must already have a swap count &gt;= 1.

  Typically, forking will need to copy the page table and hence needs to
  increase the swap count of the entries in the table. The page table is
  locked while referencing the swap entries, so the entries all have a
  swap count &gt;= 1 and can't be freed.

Hibernation subsystem is a bit different, so two special wrappers are here:

- swap_alloc_hibernation_slot() - Allocate one entry from one device.
- swap_free_hibernation_slot() - Free one entry allocated by the above
  helper.

All hibernation entries are exclusive to the hibernation subsystem and
should not interact with ordinary swap routines.

By separating the workflows, it will be possible to bind folio more
tightly with swap cache and get rid of the SWAP_HAS_CACHE as a temporary
pin.

This commit should not introduce any behavior change.

[kasong@tencent.com: fix leak, per Chris Mason.  Remove WARN_ON, per Lai Yi]
  Link: https://lkml.kernel.org/r/CAMgjq7AUz10uETVm8ozDWcB3XohkOqf0i33KGrAquvEVvfp5cg@mail.gmail.com
[ryncsn@gmail.com: fix KSM copy pages for swapoff, per Chris]
  Link: https://lkml.kernel.org/r/aXxkANcET3l2Xu6J@KASONG-MC4
Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-14-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Signed-off-by: Kairui Song &lt;ryncsn@gmail.com&gt;
Acked-by: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Chris Mason &lt;clm@meta.com&gt;
Cc: Lai Yi &lt;yi1.lai@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: remove workaround for unsynchronized swap map cache state</title>
<updated>2026-01-31T22:22:56+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=de85024b34839e9c476b6f93c3104e920bd9d270'/>
<id>urn:sha1:de85024b34839e9c476b6f93c3104e920bd9d270</id>
<content type='text'>
Remove the "skip if exists" check from commit a65b0e7607ccb ("zswap: make
shrinking memcg-aware").  It was needed because there is a tiny time
window between setting the SWAP_HAS_CACHE bit and actually adding the
folio to the swap cache.  If a user tried to add the folio to the swap
cache while another user had been interrupted after setting SWAP_HAS_CACHE
but before adding the folio to the swap cache, it could lead to a
deadlock.

We have moved the bit setting to the same critical section as adding the
folio, so this is no longer needed.  Remove it and clean it up.

Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-13-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, swap: use swap cache as the swap in synchronize layer</title>
<updated>2026-01-31T22:22:56+00:00</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2025-12-19T19:43:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2732acda82c93475c5986e1a5640004a5d4f9c3e'/>
<id>urn:sha1:2732acda82c93475c5986e1a5640004a5d4f9c3e</id>
<content type='text'>
Current swap in synchronization mostly uses the swap_map's SWAP_HAS_CACHE
bit.  Whoever sets the bit first does the actual work to swap in a folio.

This has been causing many issues as it's just a poor implementation of a
bit lock.  Raced users have no idea what is pinning a slot, so they have to
loop with a schedule_timeout_uninterruptible(1), which is ugly and causes
long tail latency or other performance issues.  Besides, the abuse of
SWAP_HAS_CACHE has been causing many other troubles for synchronization or
maintenance.

This is the first step to remove this bit completely.

Now all swap in paths are using the swap cache, and both the swap cache
and swap map are protected by the cluster lock.  So we can just resolve
the swap synchronization with the swap cache layer directly using the
cluster lock and folio lock.  Whoever inserts a folio in the swap cache
first does the swap in work.  And because folios are locked during swap
operations, other raced swap operations will just wait on the folio lock.
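
The "first inserter does the work, everyone else waits on the folio lock"
rule can be pictured with a userspace toy model.  This is only a sketch,
not the kernel code: the Folio class, the swap_cache dict, the swap_in
function, and the reads list are all hypothetical, and ordinary thread
locks stand in for the cluster lock and the folio lock.

```python
import threading

# Toy model only; every name here is hypothetical.
cluster_lock = threading.Lock()
swap_cache = {}  # maps a swap entry to its folio
reads = []       # records who performed the (simulated) device read


class Folio:
    def __init__(self):
        self.lock = threading.Lock()
        self.data = None


def swap_in(entry, who):
    # Lookup and insertion happen in one critical section, so exactly
    # one caller can become the winner for a given entry.
    with cluster_lock:
        folio = swap_cache.get(entry)
        winner = folio is None
        if winner:
            folio = Folio()
            folio.lock.acquire()  # the folio is inserted locked
            swap_cache[entry] = folio
    if winner:
        reads.append(who)  # simulated read from the swap device
        folio.data = "contents of " + str(entry)
        folio.lock.release()
    else:
        with folio.lock:  # sleep until the winner is done, no polling
            pass
    return folio.data
```

Running several threads against the same entry should leave exactly one
element in reads, while every caller still gets the same data back.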

The SWAP_HAS_CACHE will be removed in a later commit.  For now, we still set
it for some remaining users.  But now we do the bit setting and swap cache
folio adding in the same critical section, after swap cache is ready.  No
one will have to spin on the SWAP_HAS_CACHE bit anymore.

This both simplifies the logic and should improve the performance,
eliminating issues like the one solved in commit 01626a1823024 ("mm: avoid
unconditional one-tick sleep when swapcache_prepare fails"), or the
"skip_if_exists" from commit a65b0e7607ccb ("zswap: make shrinking
memcg-aware"), which will be removed very soon.

[kasong@tencent.com: fix cgroup v1 accounting issue]
 Link: https://lkml.kernel.org/r/CAMgjq7CGUnzOVG7uSaYjzw9wD7w2dSKOHprJfaEp4CcGLgE3iw@mail.gmail.com
Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-12-8862a265a033@tencent.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Baolin Wang &lt;baolin.wang@linux.alibaba.com&gt;
Cc: Barry Song &lt;baohua@kernel.org&gt;
Cc: Chris Li &lt;chrisl@kernel.org&gt;
Cc: Nhat Pham &lt;nphamcs@gmail.com&gt;
Cc: Rafael J. Wysocki (Intel) &lt;rafael@kernel.org&gt;
Cc: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Cc: Deepanshu Kartikey &lt;kartikey406@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kairui Song &lt;ryncsn@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
