<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/lib/stackdepot.c, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-12-14T19:03:32+00:00</updated>
<entry>
<title>stackdepot: fix stack_depot_save_flags() in NMI context</title>
<updated>2024-12-14T19:03:32+00:00</updated>
<author>
<name>Marco Elver</name>
<email>elver@google.com</email>
</author>
<published>2024-11-22T15:39:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9bfeeeff2c92b9dd261198b601b45bde4c529841'/>
<id>urn:sha1:9bfeeeff2c92b9dd261198b601b45bde4c529841</id>
<content type='text'>
commit 031e04bdc834cda3b054ef6b698503b2b97e8186 upstream.

Per documentation, stack_depot_save_flags() was meant to be usable from
NMI context if STACK_DEPOT_FLAG_CAN_ALLOC is unset.  However, it still
would try to take the pool_lock in an attempt to save a stack trace in the
current pool (if space is available).

This could result in deadlock if an NMI is handled while pool_lock is
already held.  To avoid deadlock, only try to take the lock in NMI context
and give up if unsuccessful.

The documentation is fixed to clearly convey this.

Link: https://lkml.kernel.org/r/Z0CcyfbPqmxJ9uJH@elver.google.com
Link: https://lkml.kernel.org/r/20241122154051.3914732-1-elver@google.com
Fixes: 4434a56ec209 ("stackdepot: make fast paths lock-less again")
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Reported-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>stackdepot: use gfp_nested_mask() instead of open coded masking</title>
<updated>2024-05-19T21:40:44+00:00</updated>
<author>
<name>Dave Chinner</name>
<email>dchinner@redhat.com</email>
</author>
<published>2024-04-30T05:28:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=70c435ca8dcb64e3d7983a30a14484aa163bb2d2'/>
<id>urn:sha1:70c435ca8dcb64e3d7983a30a14484aa163bb2d2</id>
<content type='text'>
The stackdepot code is used by KASAN and lockdep for recoding stack
traces.  Both of these track allocation context information, and so their
internal allocations must obey the caller allocation contexts to avoid
generating their own false positive warnings that have nothing to do with
the code they are instrumenting/tracking.

We also don't want recording stack traces to deplete emergency memory
reserves - debug code is useless if it creates new issues that can't be
replicated when the debug code is disabled.

Switch the stackdepot allocation masking to use gfp_nested_mask() to
address these issues.  gfp_nested_mask() also strips GFP_ZONEMASK
naturally, so that greatly simplifies this code.

Link: https://lkml.kernel.org/r/20240430054604.4169568-3-david@fromorbit.com
Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>stackdepot: respect __GFP_NOLOCKDEP allocation flag</title>
<updated>2024-04-25T02:34:26+00:00</updated>
<author>
<name>Andrey Ryabinin</name>
<email>ryabinin.a.a@gmail.com</email>
</author>
<published>2024-04-18T14:11:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6fe60465e1d53ea321ee909be26d97529e8f746c'/>
<id>urn:sha1:6fe60465e1d53ea321ee909be26d97529e8f746c</id>
<content type='text'>
If stack_depot_save_flags() allocates memory it always drops
__GFP_NOLOCKDEP flag.  So when KASAN tries to track __GFP_NOLOCKDEP
allocation we may end up with lockdep splat like bellow:

======================================================
 WARNING: possible circular locking dependency detected
 6.9.0-rc3+ #49 Not tainted
 ------------------------------------------------------
 kswapd0/149 is trying to acquire lock:
 ffff88811346a920
(&amp;xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590
[xfs]

 but task is already holding lock:
 ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at:
balance_pgdat+0x5d9/0xad0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:
 -&gt; #1 (fs_reclaim){+.+.}-{0:0}:
        __lock_acquire+0x7da/0x1030
        lock_acquire+0x15d/0x400
        fs_reclaim_acquire+0xb5/0x100
 prepare_alloc_pages.constprop.0+0xc5/0x230
        __alloc_pages+0x12a/0x3f0
        alloc_pages_mpol+0x175/0x340
        stack_depot_save_flags+0x4c5/0x510
        kasan_save_stack+0x30/0x40
        kasan_save_track+0x10/0x30
        __kasan_slab_alloc+0x83/0x90
        kmem_cache_alloc+0x15e/0x4a0
        __alloc_object+0x35/0x370
        __create_object+0x22/0x90
 __kmalloc_node_track_caller+0x477/0x5b0
        krealloc+0x5f/0x110
        xfs_iext_insert_raw+0x4b2/0x6e0 [xfs]
        xfs_iext_insert+0x2e/0x130 [xfs]
        xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs]
        xfs_btree_visit_block+0xfb/0x290 [xfs]
        xfs_btree_visit_blocks+0x215/0x2c0 [xfs]
        xfs_iread_extents+0x1a2/0x2e0 [xfs]
 xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs]
        iomap_iter+0x1d1/0x2d0
 iomap_file_buffered_write+0x120/0x1a0
        xfs_file_buffered_write+0x128/0x4b0 [xfs]
        vfs_write+0x675/0x890
        ksys_write+0xc3/0x160
        do_syscall_64+0x94/0x170
 entry_SYSCALL_64_after_hwframe+0x71/0x79

Always preserve __GFP_NOLOCKDEP to fix this.

Link: https://lkml.kernel.org/r/20240418141133.22950-1-ryabinin.a.a@gmail.com
Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Reported-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Closes: https://lore.kernel.org/all/a0caa289-ca02-48eb-9bf2-d86fd47b71f4@redhat.com/
Reported-by: Damien Le Moal &lt;damien.lemoal@opensource.wdc.com&gt;
Closes: https://lore.kernel.org/all/f9ff999a-e170-b66b-7caf-293f2b147ac2@opensource.wdc.com/
Suggested-by: Dave Chinner &lt;david@fromorbit.com&gt;
Tested-by: Xiubo Li &lt;xiubli@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>stackdepot: rename pool_index to pool_index_plus_1</title>
<updated>2024-04-05T18:21:31+00:00</updated>
<author>
<name>Peter Collingbourne</name>
<email>pcc@google.com</email>
</author>
<published>2024-04-02T00:14:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a6c1d9cb9a68bfa4512248419c4f4d880d19fe90'/>
<id>urn:sha1:a6c1d9cb9a68bfa4512248419c4f4d880d19fe90</id>
<content type='text'>
Commit 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle")
changed the meaning of the pool_index field to mean "the pool index plus
1".  This made the code accessing this field less self-documenting, as
well as causing debuggers such as drgn to not be able to easily remain
compatible with both old and new kernels, because they typically do that
by testing for presence of the new field.  Because stackdepot is a
debugging tool, we should make sure that it is debugger friendly. 
Therefore, give the field a different name to improve readability as well
as enabling debugger backwards compatibility.

This is needed in 6.9, which would otherwise become an odd release with
the new semantics and old name so debuggers wouldn't recognize the new
semantics there.

Fixes: 3ee34eabac2a ("lib/stackdepot: fix first entry having a 0-handle")
Link: https://lkml.kernel.org/r/20240402001500.53533-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/Ib3e70c36c1d230dd0a118dc22649b33e768b9f88
Signed-off-by: Peter Collingbourne &lt;pcc@google.com&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Alexander Potapenko &lt;glider@google.com&gt;
Acked-by: Marco Elver &lt;elver@google.com&gt;
Acked-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Omar Sandoval &lt;osandov@fb.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib/stackdepot: off by one in depot_fetch_stack()</title>
<updated>2024-03-05T01:01:17+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@linaro.org</email>
</author>
<published>2024-02-23T14:20:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=dc24559472a682eb124e869cb110e7a2fd857322'/>
<id>urn:sha1:dc24559472a682eb124e869cb110e7a2fd857322</id>
<content type='text'>
The stack_pools[] array has DEPOT_MAX_POOLS.  The "pools_num" tracks the
number of pools which are initialized.  See depot_init_pool() for more
details.

If pool_index == pools_num_cached, this will read one element beyond what
we want.  If not all the pools are initialized, then the pool will be
NULL, triggering a WARN(), and if they are all initialized it will read
one element beyond the end of the array.

Link: https://lkml.kernel.org/r/361ac881-60b7-471f-91e5-5bf8fe8042b2@moroto.mountain
Fixes: b29d31885814 ("lib/stackdepot: store free stack records in a freelist")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm,page_owner: maintain own list of stack_records structs</title>
<updated>2024-02-24T01:48:17+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2024-02-15T21:59:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4bedfb314bdd85c1662ecc46fa25b33b998f994d'/>
<id>urn:sha1:4bedfb314bdd85c1662ecc46fa25b33b998f994d</id>
<content type='text'>
page_owner needs to increment a stack_record refcount when a new
allocation occurs, and decrement it on a free operation.  In order to do
that, we need to have a way to get a stack_record from a handle. 
Implement __stack_depot_get_stack_record() which just does that, and make
it public so page_owner can use it.

Also, traversing all stackdepot buckets comes with its own complexity,
plus we would have to implement a way to mark only those stack_records
that were originated from page_owner, as those are the ones we are
interested in.  For that reason, page_owner maintains its own list of
stack_records, because traversing that list is faster than traversing all
buckets while keeping at the same time a low complexity.

For now, add to stack_list only the stack_records of dummy_handle and
failure_handle, and set their refcount of 1.

Further patches will add code to increment or decrement stack_records
count on allocation and free operation.

Link: https://lkml.kernel.org/r/20240215215907.20121-4-osalvador@suse.de
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Acked-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib/stackdepot: move stack_record struct definition into the header</title>
<updated>2024-02-24T01:48:16+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2024-02-15T21:59:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8151c7a35d8bd8a12e93538ef7963ea209b6ab41'/>
<id>urn:sha1:8151c7a35d8bd8a12e93538ef7963ea209b6ab41</id>
<content type='text'>
In order to move the heavy lifting into page_owner code, this one needs to
have access to the stack_record structure, which right now sits in
lib/stackdepot.c.  Move it to the stackdepot.h header so page_owner can
access stack_record's struct fields.

Link: https://lkml.kernel.org/r/20240215215907.20121-3-osalvador@suse.de
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>lib/stackdepot: fix first entry having a 0-handle</title>
<updated>2024-02-24T01:48:16+00:00</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2024-02-15T21:59:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3ee34eabac2abb6b1b6fcdebffe18870719ad000'/>
<id>urn:sha1:3ee34eabac2abb6b1b6fcdebffe18870719ad000</id>
<content type='text'>
Patch series "page_owner: print stacks and their outstanding allocations",
v10.

page_owner is a great debug functionality tool that lets us know about all
pages that have been allocated/freed and their specific stacktrace.  This
comes very handy when debugging memory leaks, since with some scripting we
can see the outstanding allocations, which might point to a memory leak.

In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages and try to reconstruct the
stack &lt;-&gt; allocated/freed relationship, becoming most of the time a
daunting and slow process when we have tons of allocation/free operations.
 

This patchset aims to ease that by adding a new functionality into
page_owner.  This functionality creates a new directory called
'page_owner_stacks' under 'sys/kernel//debug' with a read-only file called
'show_stacks', which prints out all the stacks followed by their
outstanding number of allocations (being that the times the stacktrace has
allocated but not freed yet).  This gives us a clear and a quick overview
of stacks &lt;-&gt; allocated/free.

We take advantage of the new refcount_f field that stack_record struct
gained, and increment/decrement the stack refcount on every
__set_page_owner() (alloc operation) and __reset_page_owner (free
operation) call.

Unfortunately, we cannot use the new stackdepot api STACK_DEPOT_FLAG_GET
because it does not fulfill page_owner needs, meaning we would have to
special case things, at which point makes more sense for page_owner to do
its own {dec,inc}rementing of the stacks.  E.g: Using
STACK_DEPOT_FLAG_PUT, once the refcount reaches 0, such stack gets
evicted, so page_owner would lose information.

This patchset also creates a new file called 'set_threshold' within
'page_owner_stacks' directory, and by writing a value to it, the stacks
which refcount is below such value will be filtered out.

A PoC can be found below:

 # cat /sys/kernel/debug/page_owner_stacks/show_stacks &gt; page_owner_full_stacks.txt
 # head -40 page_owner_full_stacks.txt 
  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  page_cache_ra_unbounded+0x96/0x180
  filemap_get_pages+0xfd/0x590
  filemap_read+0xcc/0x330
  blkdev_read_iter+0xb8/0x150
  vfs_read+0x285/0x320
  ksys_read+0xa5/0xe0
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 521



  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  __filemap_get_folio+0x14a/0x490
  ext4_write_begin+0xbd/0x4b0 [ext4]
  generic_perform_write+0xc1/0x1e0
  ext4_buffered_write_iter+0x68/0xe0 [ext4]
  ext4_file_write_iter+0x70/0x740 [ext4]
  vfs_write+0x33d/0x420
  ksys_write+0xa5/0xe0
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 4609
...
...

 # echo 5000 &gt; /sys/kernel/debug/page_owner_stacks/set_threshold 
 # cat /sys/kernel/debug/page_owner_stacks/show_stacks &gt; page_owner_full_stacks_5000.txt
 # head -40 page_owner_full_stacks_5000.txt 
  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  __filemap_get_folio+0x14a/0x490
  ext4_write_begin+0xbd/0x4b0 [ext4]
  generic_perform_write+0xc1/0x1e0
  ext4_buffered_write_iter+0x68/0xe0 [ext4]
  ext4_file_write_iter+0x70/0x740 [ext4]
  vfs_write+0x33d/0x420
  ksys_pwrite64+0x75/0x90
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 6781



  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  pcpu_populate_chunk+0xec/0x350
  pcpu_balance_workfn+0x2d1/0x4a0
  process_scheduled_works+0x84/0x380
  worker_thread+0x12a/0x2a0
  kthread+0xe3/0x110
  ret_from_fork+0x30/0x50
  ret_from_fork_asm+0x1b/0x30
 stack_count: 8641


This patch (of 7):

The very first entry of stack_record gets a handle of 0, but this is wrong
because stackdepot treats a 0-handle as a non-valid one.  E.g: See the
check in stack_depot_fetch()

Fix this by adding and offset of 1.

This bug has been lurking since the very beginning of stackdepot, but no
one really cared as it seems.  Because of that I am not adding a Fixes
tag.

Link: https://lkml.kernel.org/r/20240215215907.20121-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20240215215907.20121-2-osalvador@suse.de
Co-developed-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Acked-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>stackdepot: use variable size records for non-evictable entries</title>
<updated>2024-02-24T01:27:12+00:00</updated>
<author>
<name>Marco Elver</name>
<email>elver@google.com</email>
</author>
<published>2024-01-29T10:07:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=31639fd6cebd4fc3687cceda14814f140c9fd95b'/>
<id>urn:sha1:31639fd6cebd4fc3687cceda14814f140c9fd95b</id>
<content type='text'>
With the introduction of stack depot evictions, each stack record is now
fixed size, so that future reuse after an eviction can safely store
differently sized stack traces.  In all cases that do not make use of
evictions, this wastes lots of space.

Fix it by re-introducing variable size stack records (up to the max
allowed size) for entries that will never be evicted.  We know if an entry
will never be evicted if the flag STACK_DEPOT_FLAG_GET is not provided,
since a later stack_depot_put() attempt is undefined behavior.

With my current kernel config that enables KASAN and also SLUB owner
tracking, I observe (after a kernel boot) a whopping reduction of 296
stack depot pools, which translates into 4736 KiB saved.  The savings here
are from SLUB owner tracking only, because KASAN generic mode still uses
refcounting.

Before:

  pools: 893
  allocations: 29841
  frees: 6524
  in_use: 23317
  freelist_size: 3454

After:

  pools: 597
  refcounted_allocations: 17547
  refcounted_frees: 6477
  refcounted_in_use: 11070
  freelist_size: 3497
  persistent_count: 12163
  persistent_bytes: 1717008

[elver@google.com: fix -Wstringop-overflow warning]
  Link: https://lore.kernel.org/all/20240201135747.18eca98e@canb.auug.org.au/
  Link: https://lkml.kernel.org/r/20240201090434.1762340-1-elver@google.com
  Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/
Link: https://lkml.kernel.org/r/20240129100708.39460-1-elver@google.com
Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/
Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces")
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Reviewed-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Tested-by: Mikhail Gavrilov &lt;mikhail.v.gavrilov@gmail.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>stackdepot: make fast paths lock-less again</title>
<updated>2024-01-26T07:52:21+00:00</updated>
<author>
<name>Marco Elver</name>
<email>elver@google.com</email>
</author>
<published>2024-01-18T11:01:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4434a56ec20925333d6cf4d4093641d063abd35b'/>
<id>urn:sha1:4434a56ec20925333d6cf4d4093641d063abd35b</id>
<content type='text'>
With the introduction of the pool_rwlock (reader-writer lock), several
fast paths end up taking the pool_rwlock as readers.  Furthermore,
stack_depot_put() unconditionally takes the pool_rwlock as a writer.

Despite allowing readers to make forward-progress concurrently,
reader-writer locks have inherent cache contention issues, which does not
scale well on systems with large CPU counts.

Rework the synchronization story of stack depot to again avoid taking any
locks in the fast paths.  This is done by relying on RCU-protected list
traversal, and the NMI-safe subset of RCU to delay reuse of freed stack
records.  See code comments for more details.

Along with the performance issues, this also fixes incorrect nesting of
rwlock within a raw_spinlock, given that stack depot should still be
usable from anywhere:

 | [ BUG: Invalid wait context ]
 | -----------------------------
 | swapper/0/1 is trying to lock:
 | ffffffff89869be8 (pool_rwlock){..--}-{3:3}, at: stack_depot_save_flags
 | other info that might help us debug this:
 | context-{5:5}
 | 2 locks held by swapper/0/1:
 |  #0: ffffffff89632440 (rcu_read_lock){....}-{1:3}, at: __queue_work
 |  #1: ffff888100092018 (&amp;pool-&gt;lock){-.-.}-{2:2}, at: __queue_work  &lt;-- raw_spin_lock

Stack depot usage stats are similar to the previous version after a KASAN
kernel boot:

 $ cat /sys/kernel/debug/stackdepot/stats
 pools: 838
 allocations: 29865
 frees: 6604
 in_use: 23261
 freelist_size: 1879

The number of pools is the same as previously.  The freelist size is
minimally larger, but this may also be due to variance across system
boots.  This shows that even though we do not eagerly wait for the next
RCU grace period (such as with synchronize_rcu() or call_rcu()) after
freeing a stack record - requiring depot_pop_free() to "poll" if an entry
may be used - new allocations are very likely to happen in later RCU grace
periods.

Link: https://lkml.kernel.org/r/20240118110216.2539519-2-elver@google.com
Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces")
Reported-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Reviewed-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
