<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/zsmalloc.c, branch v4.14.265</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v4.14.265</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v4.14.265'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-11-26T10:40:36+00:00</updated>
<entry>
<title>mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration()</title>
<updated>2021-11-26T10:40:36+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2021-11-05T20:45:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=643f20a3c893141e7d2aeedfd6c85871d5796a82'/>
<id>urn:sha1:643f20a3c893141e7d2aeedfd6c85871d5796a82</id>
<content type='text'>
[ Upstream commit afe8605ca45424629fdddfd85984b442c763dc47 ]

There is one possible race window between zs_pool_dec_isolated() and
zs_unregister_migration() because wait_for_isolated_drain() checks the
isolated count without holding class-&gt;lock and there is no order inside
zs_pool_dec_isolated().  Thus the below race window could be possible:

  zs_pool_dec_isolated		zs_unregister_migration
    check pool-&gt;destroying != 0
				  pool-&gt;destroying = true;
				  smp_mb();
				  wait_for_isolated_drain()
				    wait for pool-&gt;isolated_pages == 0
    atomic_long_dec(&amp;pool-&gt;isolated_pages);
    atomic_long_read(&amp;pool-&gt;isolated_pages) == 0

Since we observe the pool-&gt;destroying (false) before atomic_long_dec()
for pool-&gt;isolated_pages, waking pool-&gt;migration_wait up is missed.

Fix this by ensure checking pool-&gt;destroying happens after the
atomic_long_dec(&amp;pool-&gt;isolated_pages).

Link: https://lkml.kernel.org/r/20210708115027.7557-1-linmiaohe@huawei.com
Fixes: 701d678599d0 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool")
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Sergey Senozhatsky &lt;senozhatsky@chromium.org&gt;
Cc: Henry Burns &lt;henryburns@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS</title>
<updated>2021-11-12T13:28:22+00:00</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2021-11-03T20:57:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4307f706a2be9008e4565d14352c771bb429f836'/>
<id>urn:sha1:4307f706a2be9008e4565d14352c771bb429f836</id>
<content type='text'>
commit 02390b87a9459937cdb299e6b34ff33992512ec7 upstream

With boot-time switching between paging mode we will have variable
MAX_PHYSMEM_BITS.

Let's use the maximum variable possible for CONFIG_X86_5LEVEL=y
configuration to define zsmalloc data structures.

The patch introduces MAX_POSSIBLE_PHYSMEM_BITS to cover such case.
It also suits well to handle PAE special case.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Reviewed-by: Nitin Gupta &lt;ngupta@vflare.org&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky.work@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214111656.88514-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>zsmalloc: account the number of compacted pages correctly</title>
<updated>2021-03-07T10:27:46+00:00</updated>
<author>
<name>Rokudo Yan</name>
<email>wu-yan@tcl.com</email>
</author>
<published>2021-02-26T01:18:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=09beeb4694414e6dd40f947096bc07ab3fa169d4'/>
<id>urn:sha1:09beeb4694414e6dd40f947096bc07ab3fa169d4</id>
<content type='text'>
commit 2395928158059b8f9858365fce7713ce7fef62e4 upstream.

There exists multiple path may do zram compaction concurrently.
1. auto-compaction triggered during memory reclaim
2. userspace utils write zram&lt;id&gt;/compaction node

So, multiple threads may call zs_shrinker_scan/zs_compact concurrently.
But pages_compacted is a per zsmalloc pool variable and modification
of the variable is not serialized(through under class-&gt;lock).
There are two issues here:
1. the pages_compacted may not equal to total number of pages
freed(due to concurrently add).
2. zs_shrinker_scan may not return the correct number of pages
freed(issued by current shrinker).

The fix is simple:
1. account the number of pages freed in zs_compact locally.
2. use actomic variable pages_compacted to accumulate total number.

Link: https://lkml.kernel.org/r/20210202122235.26885-1-wu-yan@tcl.com
Fixes: 860c707dca155a56 ("zsmalloc: account the number of compacted pages")
Signed-off-by: Rokudo Yan &lt;wu-yan@tcl.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/zsmalloc.c: fix the migrated zspage statistics.</title>
<updated>2020-01-09T09:17:54+00:00</updated>
<author>
<name>Chanho Min</name>
<email>chanho.min@lge.com</email>
</author>
<published>2020-01-04T20:59:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1501713aee057266a0774f757678e482a1dd047b'/>
<id>urn:sha1:1501713aee057266a0774f757678e482a1dd047b</id>
<content type='text'>
commit ac8f05da5174c560de122c499ce5dfb5d0dfbee5 upstream.

When zspage is migrated to the other zone, the zone page state should be
updated as well, otherwise the NR_ZSPAGE for each zone shows wrong
counts including proc/zoneinfo in practice.

Link: http://lkml.kernel.org/r/1575434841-48009-1-git-send-email-chanho.min@lge.com
Fixes: 91537fee0013 ("mm: add NR_ZSMALLOC to vmstat")
Signed-off-by: Chanho Min &lt;chanho.min@lge.com&gt;
Signed-off-by: Jinsuk Choi &lt;jjinsuk.choi@lge.com&gt;
Reviewed-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;        [4.9+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/zsmalloc.c: fix build when CONFIG_COMPACTION=n</title>
<updated>2019-09-06T08:20:51+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2019-08-30T23:04:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bf147673368d1e60d3e94a7df517c848cda0ba33'/>
<id>urn:sha1:bf147673368d1e60d3e94a7df517c848cda0ba33</id>
<content type='text'>
commit 441e254cd40dc03beec3c650ce6ce6074bc6517f upstream.

Fixes: 701d678599d0c1 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool")
Link: http://lkml.kernel.org/r/201908251039.5oSbEEUT%25lkp@intel.com
Reported-by: kbuild test robot &lt;lkp@intel.com&gt;
Cc: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Cc: Henry Burns &lt;henrywolfeburns@gmail.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Jonathan Adams &lt;jwadams@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/zsmalloc.c: fix race condition in zs_destroy_pool</title>
<updated>2019-08-29T06:26:45+00:00</updated>
<author>
<name>Henry Burns</name>
<email>henryburns@google.com</email>
</author>
<published>2019-08-25T00:55:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1a439560bb31f8422404e9a394d4c3d1fac19764'/>
<id>urn:sha1:1a439560bb31f8422404e9a394d4c3d1fac19764</id>
<content type='text'>
commit 701d678599d0c1623aaf4139c03eea260a75b027 upstream.

In zs_destroy_pool() we call flush_work(&amp;pool-&gt;free_work).  However, we
have no guarantee that migration isn't happening in the background at
that time.

Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages.  But there's nothing preventing an
in-progress migrate from queuing the work *after*
zs_unregister_migration() has called flush_work().  Which would mean
pages still pointing at the inode when we free it.

Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages).  This means it is sufficient to track a "# isolated zspages"
count by class, and have the destroy logic ensure all such pages have
drained before proceeding.  Keeping that state under the class spinlock
keeps the logic straightforward.

In this case a memory leak could lead to an eventual crash if compaction
hits the leaked page.  This crash would only occur if people are
changing their zswap backend at runtime (which eventually starts
destruction).

Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Henry Burns &lt;henryburns@google.com&gt;
Reviewed-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Cc: Henry Burns &lt;henrywolfeburns@gmail.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Jonathan Adams &lt;jwadams@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely</title>
<updated>2019-08-29T06:26:45+00:00</updated>
<author>
<name>Henry Burns</name>
<email>henryburns@google.com</email>
</author>
<published>2019-08-25T00:55:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eaa98c18d47ed74ea7eb7358a9e4577f6e7abac9'/>
<id>urn:sha1:eaa98c18d47ed74ea7eb7358a9e4577f6e7abac9</id>
<content type='text'>
commit 1a87aa03597efa9641e92875b883c94c7f872ccb upstream.

In zs_page_migrate() we call putback_zspage() after we have finished
migrating all pages in this zspage.  However, the return value is
ignored.  If a zs_free() races in between zs_page_isolate() and
zs_page_migrate(), freeing the last object in the zspage,
putback_zspage() will leave the page in ZS_EMPTY for potentially an
unbounded amount of time.

To fix this, we need to do the same thing as zs_page_putback() does:
schedule free_work to occur.

To avoid duplicated code, move the sequence to a new
putback_zspage_deferred() function which both zs_page_migrate() and
zs_page_putback() call.

Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Henry Burns &lt;henryburns@google.com&gt;
Reviewed-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Cc: Henry Burns &lt;henrywolfeburns@gmail.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Jonathan Adams &lt;jwadams@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>zsmalloc: calling zs_map_object() from irq is a bug</title>
<updated>2017-12-14T08:53:10+00:00</updated>
<author>
<name>Sergey Senozhatsky</name>
<email>sergey.senozhatsky.work@gmail.com</email>
</author>
<published>2017-11-16T01:34:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1238334082ea47738c79c2acb4546767593e1a75'/>
<id>urn:sha1:1238334082ea47738c79c2acb4546767593e1a75</id>
<content type='text'>
[ Upstream commit 1aedcafbf32b3f232c159b14cd0d423fcfe2b861 ]

Use BUG_ON(in_interrupt()) in zs_map_object().  This is not a new
BUG_ON(), it's always been there, but was recently changed to
VM_BUG_ON().  There are several problems there.  First, we use use
per-CPU mappings both in zsmalloc and in zram, and interrupt may easily
corrupt those buffers.  Second, and more importantly, we believe it's
possible to start leaking sensitive information.  Consider the following
case:

-&gt; process P
	swap out
	 zram
	  per-cpu mapping CPU1
	   compress page A
-&gt; IRQ

	swap out
	 zram
	  per-cpu mapping CPU1
	   compress page B
	    write page from per-cpu mapping CPU1 to zsmalloc pool
	iret

-&gt; process P
	    write page from per-cpu mapping CPU1 to zsmalloc pool  [*]
	return

* so we store overwritten data that actually belongs to another
  page (task) and potentially contains sensitive data. And when
  process P will page fault it's going to read (swap in) that
  other task's data.

Link: http://lkml.kernel.org/r/20170929045140.4055-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/zsmalloc.c: change stat type parameter to int</title>
<updated>2017-09-09T01:26:47+00:00</updated>
<author>
<name>Matthias Kaehlcke</name>
<email>mka@chromium.org</email>
</author>
<published>2017-09-08T23:13:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3eb95feac113d8ebad5b7b5189a65efcbd95a749'/>
<id>urn:sha1:3eb95feac113d8ebad5b7b5189a65efcbd95a749</id>
<content type='text'>
zs_stat_inc/dec/get() uses enum zs_stat_type for the stat type, however
some callers pass an enum fullness_group value.  Change the type to int to
reflect the actual use of the functions and get rid of 'enum-conversion'
warnings

Link: http://lkml.kernel.org/r/20170731175000.56538-1-mka@chromium.org
Signed-off-by: Matthias Kaehlcke &lt;mka@chromium.org&gt;
Reviewed-by: Sergey Senozhatsky &lt;sergey.senozhatsky@gmail.com&gt;
Acked-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Doug Anderson &lt;dianders@chromium.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY</title>
<updated>2017-09-09T01:26:46+00:00</updated>
<author>
<name>Jérôme Glisse</name>
<email>jglisse@redhat.com</email>
</author>
<published>2017-09-08T23:12:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2916ecc0f9d435d849c98f4da50e453124c87531'/>
<id>urn:sha1:2916ecc0f9d435d849c98f4da50e453124c87531</id>
<content type='text'>
Introduce a new migration mode that allow to offload the copy to a device
DMA engine.  This changes the workflow of migration and not all
address_space migratepage callback can support this.

This is intended to be use by migrate_vma() which itself is use for thing
like HMM (see include/linux/hmm.h).

No additional per-filesystem migratepage testing is needed.  I disables
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
added comment in those to explain why (part of this patch).  The commit
message is unclear it should say that any callback that wish to support
this new mode need to be aware of the difference in the migration flow
from other mode.

Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operations.  But in the problematic case you can not
easily hold the extra lock accross multiple call to this callback.

Usual flow is:

For each page {
 1 - lock page
 2 - call migratepage() callback
 3 - (extra locking in some migratepage() callback)
 4 - migrate page state (freeze refcount, update page cache, buffer
     head, ...)
 5 - copy page
 6 - (unlock any extra lock of migratepage() callback)
 7 - return from migratepage() callback
 8 - unlock page
}

The new mode MIGRATE_SYNC_NO_COPY:
 1 - lock multiple pages
For each page {
 2 - call migratepage() callback
 3 - abort in all problematic migratepage() callback
 4 - migrate page state (freeze refcount, update page cache, buffer
     head, ...)
} // finished all calls to migratepage() callback
 5 - DMA copy multiple pages
 6 - unlock all the pages

To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.

Because the problematic cases are not important for current usage I did
not wanted to complexify this patchset even more for no good reason.

Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Cc: Aneesh Kumar &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: David Nellans &lt;dnellans@nvidia.com&gt;
Cc: Evgeny Baskakov &lt;ebaskakov@nvidia.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Mark Hairgrove &lt;mhairgrove@nvidia.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Sherry Cheung &lt;SCheung@nvidia.com&gt;
Cc: Subhash Gutti &lt;sgutti@nvidia.com&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Bob Liu &lt;liubo95@huawei.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
