<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/trace/events/kmem.h, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-25T10:05:50+00:00</updated>
<entry>
<title>mm/tracing: rss_stat: ensure curr is false from kthread context</title>
<updated>2026-03-25T10:05:50+00:00</updated>
<author>
<name>Kalesh Singh</name>
<email>kaleshsingh@google.com</email>
</author>
<published>2026-02-19T23:36:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b88ce81232bbebdc68c525eddd182c50c981cee1'/>
<id>urn:sha1:b88ce81232bbebdc68c525eddd182c50c981cee1</id>
<content type='text'>
commit 079c24d5690262e83ee476e2a548e416f3237511 upstream.

The rss_stat trace event allows userspace tools, like Perfetto [1], to
inspect per-process RSS metric changes over time.

The curr field was introduced to rss_stat in commit e4dcad204d3a
("rss_stat: add support to detect RSS updates of external mm").  Its
intent is to indicate whether the RSS update is for the mm_struct of the
current execution context; it is set to false when operating on a remote
mm_struct (e.g., via kswapd or a direct reclaimer).

However, an issue arises when a kernel thread temporarily adopts a user
process's mm_struct.  Kernel threads do not have their own mm_struct and
normally have current-&gt;mm set to NULL.  To operate on user memory, they
can "borrow" a memory context using kthread_use_mm(), which sets
current-&gt;mm to the user process's mm.

This can be observed, for example, in the USB Function Filesystem (FFS)
driver.  The ffs_user_copy_worker() handles AIO completions and uses
kthread_use_mm() to copy data to a user-space buffer.  If a page fault
occurs during this copy, the fault handler executes in the kthread's
context.

At this point, current is the kthread, but current-&gt;mm points to the user
process's mm.  Since the rss_stat event (from the page fault) is for that
same mm, the condition current-&gt;mm == mm becomes true, causing curr to be
incorrectly set to true when the trace event is emitted.

This is misleading because it suggests the mm belongs to the kthread,
confusing userspace tools that track per-process RSS changes and
corrupting their mm_id-to-process association.

Fix this by ensuring curr is always false when the trace event is emitted
from a kthread context by checking for the PF_KTHREAD flag.
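
The condition change described above can be modelled outside the kernel.
The following is only a sketch in Python, not the kernel code: the Task
class, the TaskFlag enum and both helper functions are hypothetical names
introduced for illustration, and the numeric flag value is illustrative.

```python
from dataclasses import dataclass
from enum import Flag


class TaskFlag(Flag):
    """Stand-in for task flags; only PF_KTHREAD matters for this check."""
    NONE = 0
    PF_KTHREAD = 0x00200000  # illustrative value


@dataclass
class Task:
    """Hypothetical model of the two task fields the check looks at."""
    flags: TaskFlag
    mm: object  # identity of the mm_struct the task currently uses


def curr_before_fix(current, mm):
    # Old condition: true whenever the task's mm matches the updated mm.
    return current.mm is mm


def curr_after_fix(current, mm):
    # New condition: a kthread never reports curr=true, even when it has
    # borrowed the mm being updated.
    return TaskFlag.PF_KTHREAD not in current.flags and current.mm is mm


user_mm = object()
user_task = Task(flags=TaskFlag.NONE, mm=user_mm)
# A kthread that has borrowed user_mm, as after kthread_use_mm().
worker = Task(flags=TaskFlag.PF_KTHREAD, mm=user_mm)

print(curr_before_fix(worker, user_mm))    # True (the misleading case)
print(curr_after_fix(worker, user_mm))     # False
print(curr_after_fix(user_task, user_mm))  # True
```

Only the kthread case changes; an ordinary task updating its own mm still
reports curr as true.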

Link: https://lkml.kernel.org/r/20260219233708.1971199-1-kaleshsingh@google.com
Link: https://perfetto.dev/ [1]
Fixes: e4dcad204d3a ("rss_stat: add support to detect RSS updates of external mm")
Signed-off-by: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Acked-by: Zi Yan &lt;ziy@nvidia.com&gt;
Acked-by: SeongJae Park &lt;sj@kernel.org&gt;
Reviewed-by: Pedro Falcato &lt;pfalcato@suse.de&gt;
Cc: "David Hildenbrand (Arm)" &lt;david@kernel.org&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[5.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: convert mm's rss stats into percpu_counter</title>
<updated>2022-11-30T23:58:40+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeelb@google.com</email>
</author>
<published>2022-10-24T05:28:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f1a7941243c102a44e8847e3b94ff4ff3ec56f25'/>
<id>urn:sha1:f1a7941243c102a44e8847e3b94ff4ff3ec56f25</id>
<content type='text'>
Currently mm_struct maintains rss_stats which are updated on the page
fault and unmapping codepaths.  For the page fault codepath the updates
are cached per thread, in batches of TASK_RSS_EVENTS_THRESH, which is 64.
The caching is there for the performance of multithreaded applications;
without it the rss_stats updates may become a hotspot for such
applications.

However this optimization comes at the cost of an error margin in the rss
stats.  The rss_stats for applications with a large number of threads can
be very skewed.  At worst the error margin is (nr_threads * 64) and we have
a lot of applications with 100s of threads, so the error margin can be very
high.  Internally we had to reduce TASK_RSS_EVENTS_THRESH to 32.

Recently we started seeing unbounded errors in rss_stats for specific
applications which use TCP rx0cp.  It seems like the vm_insert_pages()
codepath does not sync rss_stats at all.

This patch converts the rss_stats into percpu_counter to change the error
margin from (nr_threads * 64) to approximately (nr_cpus ^ 2).  However
this conversion enables us to get accurate stats for situations where
accuracy is more important than the cpu cost.
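
To make the two error margins concrete, here is a small arithmetic sketch
in Python.  The formulas are the ones quoted above; the thread and CPU
counts are made-up example values and the function names are hypothetical.

```python
# Batch size named in the commit message.
TASK_RSS_EVENTS_THRESH = 64


def per_thread_cache_error(nr_threads, thresh=TASK_RSS_EVENTS_THRESH):
    """Worst-case drift when every thread caches up to `thresh` events."""
    return nr_threads * thresh


def percpu_counter_error(nr_cpus):
    """Approximate worst-case drift quoted in the message: nr_cpus ^ 2."""
    return nr_cpus ** 2


# Example values only: 500 threads on a 16-CPU machine.
nr_threads, nr_cpus = 500, 16
print(per_thread_cache_error(nr_threads))  # 32000
print(percpu_counter_error(nr_cpus))       # 256
```

The first bound grows with the thread count of the application, while the
second depends only on the machine.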

This patch does not make such tradeoffs - we can just use
percpu_counter_add_local() for the updates and percpu_counter_sum() (or
percpu_counter_sync() + percpu_counter_read) for the readers.  At the
moment the readers are the procfs interface, oom_killer and memory
reclaim, which I think are not performance critical and should be ok with
slow reads.  However I think we can make that change in a separate patch.

Link: https://lkml.kernel.org/r/20221024052841.3291983-1-shakeelb@google.com
Signed-off-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/slab_common: drop kmem_alloc &amp; avoid dereferencing fields when not using</title>
<updated>2022-09-01T09:44:26+00:00</updated>
<author>
<name>Hyeonggon Yoo</name>
<email>42.hyeyoo@gmail.com</email>
</author>
<published>2022-08-17T10:18:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2c1d697fb8ba6d2d44f914d4268ae1ccdf025f1b'/>
<id>urn:sha1:2c1d697fb8ba6d2d44f914d4268ae1ccdf025f1b</id>
<content type='text'>
Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
using TRACE_EVENT() macro.

And then this patch does:
   - Do not pass pointer to struct kmem_cache to trace_kmalloc.
     gfp flag is enough to know if it's accounted or not.
   - Avoid dereferencing s-&gt;object_size and s-&gt;size when not using kmem_cache_alloc event.
   - Avoid dereferencing s-&gt;name when not using kmem_cache_free event.
   - Adjust s-&gt;size to SLOB_UNITS(s-&gt;size) * SLOB_UNIT in SLOB

Cc: Vasily Averin &lt;vasily.averin@linux.dev&gt;
Suggested-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Hyeonggon Yoo &lt;42.hyeyoo@gmail.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
</content>
</entry>
<entry>
<title>mm/slab_common: unify NUMA and UMA version of tracepoints</title>
<updated>2022-09-01T08:40:27+00:00</updated>
<author>
<name>Hyeonggon Yoo</name>
<email>42.hyeyoo@gmail.com</email>
</author>
<published>2022-08-17T10:18:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=11e9734bcb6a7361943f993eba4e97f5812120d8'/>
<id>urn:sha1:11e9734bcb6a7361943f993eba4e97f5812120d8</id>
<content type='text'>
Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
remove _node postfix for NUMA version of tracepoints.

This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node,
but at this point maintaining both kmem_alloc and kmem_alloc_node
event classes does not make sense at all.

Signed-off-by: Hyeonggon Yoo &lt;42.hyeyoo@gmail.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
</content>
</entry>
<entry>
<title>mm/tracing: add 'accounted' entry into output of allocation tracepoints</title>
<updated>2022-07-04T15:11:27+00:00</updated>
<author>
<name>Vasily Averin</name>
<email>vvs@openvz.org</email>
</author>
<published>2022-06-03T03:21:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b347aa7b57477f71c740e2bbc6d1078a7109ba23'/>
<id>urn:sha1:b347aa7b57477f71c740e2bbc6d1078a7109ba23</id>
<content type='text'>
Slab caches marked with SLAB_ACCOUNT force accounting for every
allocation from this cache even if __GFP_ACCOUNT flag is not passed.
Unfortunately, at the moment this flag is not visible in ftrace output,
and this makes it difficult to analyze the accounted allocations.

This patch adds a boolean "accounted" entry to the trace output,
and sets it to 'true' for calls that use the __GFP_ACCOUNT flag and
for allocations from caches marked with SLAB_ACCOUNT.
It is set to 'false' if accounting is disabled in configs.
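
The rule for the new field can be sketched as a truth function.  This is a
Python model under stated assumptions, not the tracepoint code: the Gfp and
Slab enums, their values, and the accounting_enabled parameter are
hypothetical stand-ins for the kernel flags and the config option.

```python
from enum import Flag


class Gfp(Flag):
    NONE = 0
    ACCOUNT = 1  # stands in for __GFP_ACCOUNT; value is illustrative


class Slab(Flag):
    NONE = 0
    ACCOUNT = 1  # stands in for SLAB_ACCOUNT; value is illustrative


def accounted(gfp_flags, cache_flags, accounting_enabled=True):
    """Model of the 'accounted' trace field described above."""
    if not accounting_enabled:
        # Accounting disabled in the config: always reported as false.
        return False
    # Either the call asked for accounting or the cache forces it.
    return Gfp.ACCOUNT in gfp_flags or Slab.ACCOUNT in cache_flags


print(accounted(Gfp.ACCOUNT, Slab.NONE))     # True
print(accounted(Gfp.NONE, Slab.ACCOUNT))     # True
print(accounted(Gfp.NONE, Slab.NONE))        # False
print(accounted(Gfp.ACCOUNT, Slab.ACCOUNT,
                accounting_enabled=False))   # False
```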

Signed-off-by: Vasily Averin &lt;vvs@openvz.org&gt;
Acked-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Acked-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Reviewed-by: Hyeonggon Yoo &lt;42.hyeyoo@gmail.com&gt;
Link: https://lore.kernel.org/r/c418ed25-65fe-f623-fbf8-1676528859ed@openvz.org
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc: fix tracepoint mm_page_alloc_zone_locked()</title>
<updated>2022-05-19T21:08:54+00:00</updated>
<author>
<name>Wonhyuk Yang</name>
<email>vvghjk1234@gmail.com</email>
</author>
<published>2022-05-19T21:08:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=10e0f7530205799e7e971aba699a7cb3a47456de'/>
<id>urn:sha1:10e0f7530205799e7e971aba699a7cb3a47456de</id>
<content type='text'>
Currently, trace point mm_page_alloc_zone_locked() doesn't show correct
information.

First, when alloc_flag has ALLOC_HARDER/ALLOC_CMA, a page can be allocated
from MIGRATE_HIGHATOMIC/MIGRATE_CMA.  Nevertheless, the tracepoint reports
the requested migration type, not MIGRATE_HIGHATOMIC or MIGRATE_CMA.

Second, after commit 44042b4498728 ("mm/page_alloc: allow high-order pages
to be stored on the per-cpu lists") the percpu-list can store high-order
pages.  But the tracepoint determines whether it is a refill of the
percpu-list by comparing the requested order with 0.

To handle these problems, make mm_page_alloc_zone_locked() only be called
by __rmqueue_smallest with correct migration type.  With a new argument
called percpu_refill, it can show roughly whether it is a refill of
percpu-list.

Link: https://lkml.kernel.org/r/20220512025307.57924-1-vvghjk1234@gmail.com
Signed-off-by: Wonhyuk Yang &lt;vvghjk1234@gmail.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Baik Song An &lt;bsahn@etri.re.kr&gt;
Cc: Hong Yeon Kim &lt;kimhy@etri.re.kr&gt;
Cc: Taeung Song &lt;taeung@reallinux.co.kr&gt;
Cc: &lt;linuxgeek@linuxgeek.io&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>tracing: incorrect gfp_t conversion</title>
<updated>2022-05-13T14:20:18+00:00</updated>
<author>
<name>Vasily Averin</name>
<email>vvs@openvz.org</email>
</author>
<published>2022-05-13T03:23:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fe573327ffb1deba802c91dd1d4ff41dafa97a0e'/>
<id>urn:sha1:fe573327ffb1deba802c91dd1d4ff41dafa97a0e</id>
<content type='text'>
Fixes the following sparse warnings:

include/trace/events/*: sparse: cast to restricted gfp_t
include/trace/events/*: sparse: restricted gfp_t degrades to integer

gfp_t type is bitwise and requires __force attributes for any casts.

Link: https://lkml.kernel.org/r/331d88fe-f4f7-657c-02a2-d977f15fbff6@openvz.org
Signed-off-by: Vasily Averin &lt;vvs@openvz.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, tracing: unify PFN format strings</title>
<updated>2021-06-29T17:53:52+00:00</updated>
<author>
<name>Vincent Whitchurch</name>
<email>vincent.whitchurch@axis.com</email>
</author>
<published>2021-06-29T02:40:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=53d884a6675b0fd7bc8c7b4afd6ead6f17bc4c61'/>
<id>urn:sha1:53d884a6675b0fd7bc8c7b4afd6ead6f17bc4c61</id>
<content type='text'>
Some trace event formats print PFNs as hex while others print them as
decimal.  This is rather annoying when attempting to grep through traces
to understand what's going on with a particular page.

 $ git grep -ho 'pfn=[0x%lu]\+' include/trace/events/ | sort | uniq -c
      11 pfn=0x%lx
      12 pfn=%lu
       2 pfn=%lx

Printing as hex is in the majority in the trace events, and all the normal
printks in mm/ also print PFNs as hex, so change all the PFN formats in
the trace events to use 0x%lx.
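
As an illustration of why the mixed formats are awkward to grep, the sketch
below prints one PFN with each of the three format strings counted above.
It uses Python's printf-style formatting, which accepts and ignores the "l"
length modifier; the PFN value and the function names are made up.

```python
def pfn_hex_prefixed(pfn):
    # The format all events are converted to.
    return "pfn=0x%lx" % pfn


def pfn_decimal(pfn):
    return "pfn=%lu" % pfn


def pfn_hex_bare(pfn):
    # Hex without a prefix can be mistaken for a decimal number.
    return "pfn=%lx" % pfn


pfn = 1048576  # example value, 0x100000 in hex
print(pfn_hex_prefixed(pfn))  # pfn=0x100000
print(pfn_decimal(pfn))       # pfn=1048576
print(pfn_hex_bare(pfn))      # pfn=100000
```

The three strings describe the same page, yet no single grep pattern
matches all of them.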

Link: https://lkml.kernel.org/r/20210602092608.1493-1-vincent.whitchurch@axis.com
Signed-off-by: Vincent Whitchurch &lt;vincent.whitchurch@axis.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Cc: Ilias Apalodimas &lt;ilias.apalodimas@linaro.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, tracing: improve rss_stat tracepoint message</title>
<updated>2021-04-30T18:20:39+00:00</updated>
<author>
<name>Ovidiu Panait</name>
<email>ovidiu.panait@windriver.com</email>
</author>
<published>2021-04-30T05:57:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f9001107820c647f65b57fb9c1ca2c0908b5fede'/>
<id>urn:sha1:f9001107820c647f65b57fb9c1ca2c0908b5fede</id>
<content type='text'>
Adjust the rss_stat tracepoint to print the name of the resident page type
that got updated (e.g. MM_ANONPAGES/MM_FILEPAGES), rather than the numeric
index corresponding to it (the __entry-&gt;member value):

Before this patch:
------------------
  rss_stat: mm_id=1216113068 curr=0 member=1 size=28672B
  rss_stat: mm_id=1216113068 curr=0 member=1 size=0B
  rss_stat: mm_id=534402304 curr=1 member=0 size=188416B
  rss_stat: mm_id=534402304 curr=1 member=1 size=40960B

After this patch:
-----------------
  rss_stat: mm_id=1726253524 curr=1 type=MM_ANONPAGES size=40960B
  rss_stat: mm_id=1726253524 curr=1 type=MM_FILEPAGES size=663552B
  rss_stat: mm_id=1726253524 curr=1 type=MM_ANONPAGES size=65536B
  rss_stat: mm_id=1726253524 curr=1 type=MM_FILEPAGES size=647168B

Use TRACE_DEFINE_ENUM()/__print_symbolic() logic to map the enum values to
the strings they represent, so that userspace tools can also parse the raw
data correctly.
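
The value-to-name mapping can be pictured with a short Python sketch.  It
only imitates the output format shown above; the table assumes the usual
ordering of the mm counter enum, and format_rss_stat is a hypothetical
helper, not a kernel or tracing API.

```python
# Assumed enum ordering; check it against the kernel headers in use.
MM_COUNTER_NAMES = {
    0: "MM_FILEPAGES",
    1: "MM_ANONPAGES",
    2: "MM_SWAPENTS",
    3: "MM_SHMEMPAGES",
}


def format_rss_stat(mm_id, curr, member, size):
    """Render an rss_stat line with a symbolic type name."""
    # Unknown values fall back to a hex number in this sketch.
    name = MM_COUNTER_NAMES.get(member, hex(member))
    return "rss_stat: mm_id=%d curr=%d type=%s size=%dB" % (
        mm_id, curr, name, size)


print(format_rss_stat(1726253524, 1, 1, 40960))
# rss_stat: mm_id=1726253524 curr=1 type=MM_ANONPAGES size=40960B
```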

Link: https://lkml.kernel.org/r/20210310162305.4862-1-ovidiu.panait@windriver.com
Signed-off-by: Ovidiu Panait &lt;ovidiu.panait@windriver.com&gt;
Suggested-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Reviewed-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm, tracing: Fix kmem_cache_free trace event to not print stale pointers</title>
<updated>2021-02-25T17:49:58+00:00</updated>
<author>
<name>Steven Rostedt (VMware)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2021-02-25T17:49:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d814567942ff6ac73869052bdb8ca911364e5eb0'/>
<id>urn:sha1:d814567942ff6ac73869052bdb8ca911364e5eb0</id>
<content type='text'>
The update to kmem_cache_free trace event added printing of the slab name in
the trace event. But it only stores the pointer of the name which will be
printed as a string when the event is read some time in the future. This is
dangerous because the name could be freed in the meantime and when reading
the trace event it would try to dereference the string name by the pointer
to the name that has been freed.

Instead, use the trace event helper macros __string(), __assign_str(), and
__get_str() that are for this very case.
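
The difference between recording a pointer and recording a copy can be
modelled loosely in Python.  This is an analogy under stated assumptions,
not the trace macros themselves: a bytearray stands in for the slab name
buffer, overwriting it stands in for the name being freed and reused, and
both record functions are hypothetical.

```python
def record_by_pointer(name_buf):
    # Keeps a reference to the caller's buffer, like storing the pointer.
    return {"name": name_buf}


def record_by_copy(name_buf):
    # Snapshots the bytes at record time, like copying the string into
    # the event.
    return {"name": bytes(name_buf)}


name = bytearray(b"kmalloc-64")
by_pointer = record_by_pointer(name)
by_copy = record_by_copy(name)

# Simulate the buffer being freed and reused before the event is read.
name[:] = b"\x00" * len(name)

print(by_pointer["name"] == b"kmalloc-64")  # False
print(by_copy["name"] == b"kmalloc-64")     # True
```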

Cc: Jacob Wen &lt;jian.w.wen@oracle.com&gt;
Fixes: 3544de8ee6e4 ("mm, tracing: record slab name for kmem_cache_free()")
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
</content>
</entry>
</feed>
