<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/trace/events/kmem.h, branch v6.12.91</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.91</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.91'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-25T10:08:32+00:00</updated>
<entry>
<title>mm/tracing: rss_stat: ensure curr is false from kthread context</title>
<updated>2026-03-25T10:08:32+00:00</updated>
<author>
<name>Kalesh Singh</name>
<email>kaleshsingh@google.com</email>
</author>
<published>2026-02-19T23:36:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b07fc09a226c7f059e27e7263a59bba47682d095'/>
<id>urn:sha1:b07fc09a226c7f059e27e7263a59bba47682d095</id>
<content type='text'>
commit 079c24d5690262e83ee476e2a548e416f3237511 upstream.

The rss_stat trace event allows userspace tools, like Perfetto [1], to
inspect per-process RSS metric changes over time.

The curr field was introduced to rss_stat in commit e4dcad204d3a
("rss_stat: add support to detect RSS updates of external mm").  Its
intent is to indicate whether the RSS update is for the mm_struct of the
current execution context; and is set to false when operating on a remote
mm_struct (e.g., via kswapd or a direct reclaimer).

However, an issue arises when a kernel thread temporarily adopts a user
process's mm_struct.  Kernel threads do not have their own mm_struct and
normally have current-&gt;mm set to NULL.  To operate on user memory, they
can "borrow" a memory context using kthread_use_mm(), which sets
current-&gt;mm to the user process's mm.

This can be observed, for example, in the USB Function Filesystem (FFS)
driver.  The ffs_user_copy_worker() handles AIO completions and uses
kthread_use_mm() to copy data to a user-space buffer.  If a page fault
occurs during this copy, the fault handler executes in the kthread's
context.

At this point, current is the kthread, but current-&gt;mm points to the user
process's mm.  Since the rss_stat event (from the page fault) is for that
same mm, the condition current-&gt;mm == mm becomes true, causing curr to be
incorrectly set to true when the trace event is emitted.

This is misleading because it suggests the mm belongs to the kthread,
confusing userspace tools that track per-process RSS changes and
corrupting their mm_id-to-process association.

Fix this by ensuring curr is always false when the trace event is emitted
from a kthread context by checking for the PF_KTHREAD flag.

Link: https://lkml.kernel.org/r/20260219233708.1971199-1-kaleshsingh@google.com
Link: https://perfetto.dev/ [1]
Fixes: e4dcad204d3a ("rss_stat: add support to detect RSS updates of external mm")
Signed-off-by: Kalesh Singh &lt;kaleshsingh@google.com&gt;
Acked-by: Zi Yan &lt;ziy@nvidia.com&gt;
Acked-by: SeongJae Park &lt;sj@kernel.org&gt;
Reviewed-by: Pedro Falcato &lt;pfalcato@suse.de&gt;
Cc: "David Hildenbrand (Arm)" &lt;david@kernel.org&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[5.10+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove CONFIG_MEMCG_KMEM</title>
<updated>2024-07-10T19:14:54+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2024-07-01T15:31:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3a3b7fec3974f954600844e41d773c00857ef48a'/>
<id>urn:sha1:3a3b7fec3974f954600844e41d773c00857ef48a</id>
<content type='text'>
CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab
tracking is enabled.  It has been default-enabled and equivalent to
CONFIG_MEMCG for almost a decade.  We've only grown more kernel memory
accounting sites since, and there is no imaginable cgroup usecase going
forward that wants to track user pages but not the multitude of
user-drivable kernel allocations.

Link: https://lkml.kernel.org/r/20240701153148.452230-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>tracing/treewide: Remove second parameter of __assign_str()</title>
<updated>2024-05-23T00:14:47+00:00</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-05-16T17:34:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2c92ca849fcc6ee7d0c358e9959abc9f58661aea'/>
<id>urn:sha1:2c92ca849fcc6ee7d0c358e9959abc9f58661aea</id>
<content type='text'>
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.

This means that with:

  __string(field, mystring)

Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.

There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:

  git grep -l __assign_str | while read a ; do
      sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a &gt; /tmp/test-file;
      mv /tmp/test-file $a;
  done

I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.

Note, the same updates will need to be done for:

  __assign_str_len()
  __assign_rel_str()
  __assign_rel_str_len()

I tested this with both an allmodconfig and an allyesconfig (build only for both).

[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/

Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Julia Lawall &lt;Julia.Lawall@inria.fr&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Acked-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt; for the amdgpu parts.
Acked-by: Thomas Hellström &lt;thomas.hellstrom@linux.intel.com&gt; #for
Acked-by: Rafael J. Wysocki &lt;rafael@kernel.org&gt; # for thermal
Acked-by: Takashi Iwai &lt;tiwai@suse.de&gt;
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;	# xfs
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
</content>
</entry>
<entry>
<title>mm: add alloc_contig_migrate_range allocation statistics</title>
<updated>2024-03-05T01:01:27+00:00</updated>
<author>
<name>Richard Chang</name>
<email>richardycc@google.com</email>
</author>
<published>2024-02-28T05:11:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c8b36003121834cb77fcaf8a1ce0a454d7a97891'/>
<id>urn:sha1:c8b36003121834cb77fcaf8a1ce0a454d7a97891</id>
<content type='text'>
alloc_contig_migrate_range has every information to be able to understand
big contiguous allocation latency.  For example, how many pages are
migrated, how many times they were needed to unmap from page tables.

This patch adds the trace event to collect the allocation statistics.  In
the field, it was quite useful to understand CMA allocation latency.

[akpm@linux-foundation.org: a/trace_mm_alloc_config_migrate_range_info_enabled/trace_mm_alloc_contig_migrate_range_info_enabled]
Link: https://lkml.kernel.org/r/20240228051127.2859472-1-richardycc@google.com
Signed-off-by: Richard Chang &lt;richardycc@google.com&gt;
Reviewed-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org.
Cc: Martin Liu &lt;liumartin@google.com&gt;
Cc: "Masami Hiramatsu (Google)" &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: convert mm's rss stats into percpu_counter</title>
<updated>2022-11-30T23:58:40+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeelb@google.com</email>
</author>
<published>2022-10-24T05:28:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f1a7941243c102a44e8847e3b94ff4ff3ec56f25'/>
<id>urn:sha1:f1a7941243c102a44e8847e3b94ff4ff3ec56f25</id>
<content type='text'>
Currently mm_struct maintains rss_stats which are updated on page fault
and the unmapping codepaths.  For page fault codepath the updates are
cached per thread with the batch of TASK_RSS_EVENTS_THRESH which is 64. 
The reason for caching is performance for multithreaded applications
otherwise the rss_stats updates may become hotspot for such applications.

However this optimization comes with the cost of error margin in the rss
stats.  The rss_stats for applications with large number of threads can be
very skewed.  At worst the error margin is (nr_threads * 64) and we have a
lot of applications with 100s of threads, so the error margin can be very
high.  Internally we had to reduce TASK_RSS_EVENTS_THRESH to 32.

Recently we started seeing the unbounded errors for rss_stats for specific
applications which use TCP rx0cp.  It seems like vm_insert_pages()
codepath does not sync rss_stats at all.

This patch converts the rss_stats into percpu_counter to convert the error
margin from (nr_threads * 64) to approximately (nr_cpus ^ 2).  However
this conversion enable us to get the accurate stats for situations where
accuracy is more important than the cpu cost.

This patch does not make such tradeoffs - we can just use
percpu_counter_add_local() for the updates and percpu_counter_sum() (or
percpu_counter_sync() + percpu_counter_read) for the readers.  At the
moment the readers are either procfs interface, oom_killer and memory
reclaim which I think are not performance critical and should be ok with
slow read.  However I think we can make that change in a separate patch.

Link: https://lkml.kernel.org/r/20221024052841.3291983-1-shakeelb@google.com
Signed-off-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/slab_common: drop kmem_alloc &amp; avoid dereferencing fields when not using</title>
<updated>2022-09-01T09:44:26+00:00</updated>
<author>
<name>Hyeonggon Yoo</name>
<email>42.hyeyoo@gmail.com</email>
</author>
<published>2022-08-17T10:18:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2c1d697fb8ba6d2d44f914d4268ae1ccdf025f1b'/>
<id>urn:sha1:2c1d697fb8ba6d2d44f914d4268ae1ccdf025f1b</id>
<content type='text'>
Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
using TRACE_EVENT() macro.

And then this patch does:
   - Do not pass pointer to struct kmem_cache to trace_kmalloc.
     gfp flag is enough to know if it's accounted or not.
   - Avoid dereferencing s-&gt;object_size and s-&gt;size when not using kmem_cache_alloc event.
   - Avoid dereferencing s-&gt;name in when not using kmem_cache_free event.
   - Adjust s-&gt;size to SLOB_UNITS(s-&gt;size) * SLOB_UNIT in SLOB

Cc: Vasily Averin &lt;vasily.averin@linux.dev&gt;
Suggested-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Hyeonggon Yoo &lt;42.hyeyoo@gmail.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
</content>
</entry>
<entry>
<title>mm/slab_common: unify NUMA and UMA version of tracepoints</title>
<updated>2022-09-01T08:40:27+00:00</updated>
<author>
<name>Hyeonggon Yoo</name>
<email>42.hyeyoo@gmail.com</email>
</author>
<published>2022-08-17T10:18:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=11e9734bcb6a7361943f993eba4e97f5812120d8'/>
<id>urn:sha1:11e9734bcb6a7361943f993eba4e97f5812120d8</id>
<content type='text'>
Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and
remove _node postfix for NUMA version of tracepoints.

This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node,
but at this point maintaining both kmem_alloc and kmem_alloc_node
event classes does not makes sense at all.

Signed-off-by: Hyeonggon Yoo &lt;42.hyeyoo@gmail.com&gt;
Reviewed-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
</content>
</entry>
<entry>
<title>mm/tracing: add 'accounted' entry into output of allocation tracepoints</title>
<updated>2022-07-04T15:11:27+00:00</updated>
<author>
<name>Vasily Averin</name>
<email>vvs@openvz.org</email>
</author>
<published>2022-06-03T03:21:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b347aa7b57477f71c740e2bbc6d1078a7109ba23'/>
<id>urn:sha1:b347aa7b57477f71c740e2bbc6d1078a7109ba23</id>
<content type='text'>
Slab caches marked with SLAB_ACCOUNT force accounting for every
allocation from this cache even if __GFP_ACCOUNT flag is not passed.
Unfortunately, at the moment this flag is not visible in ftrace output,
and this makes it difficult to analyze the accounted allocations.

This patch adds boolean "accounted" entry into trace output,
and set it to 'true' for calls used __GFP_ACCOUNT flag and
for allocations from caches marked with SLAB_ACCOUNT.
Set it to 'false' if accounting is disabled in configs.

Signed-off-by: Vasily Averin &lt;vvs@openvz.org&gt;
Acked-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Acked-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Reviewed-by: Hyeonggon Yoo &lt;42.hyeyoo@gmail.com&gt;
Link: https://lore.kernel.org/r/c418ed25-65fe-f623-fbf8-1676528859ed@openvz.org
Signed-off-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
</content>
</entry>
<entry>
<title>mm/page_alloc: fix tracepoint mm_page_alloc_zone_locked()</title>
<updated>2022-05-19T21:08:54+00:00</updated>
<author>
<name>Wonhyuk Yang</name>
<email>vvghjk1234@gmail.com</email>
</author>
<published>2022-05-19T21:08:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=10e0f7530205799e7e971aba699a7cb3a47456de'/>
<id>urn:sha1:10e0f7530205799e7e971aba699a7cb3a47456de</id>
<content type='text'>
Currently, trace point mm_page_alloc_zone_locked() doesn't show correct
information.

First, when alloc_flag has ALLOC_HARDER/ALLOC_CMA, page can be allocated
from MIGRATE_HIGHATOMIC/MIGRATE_CMA.  Nevertheless, tracepoint use
requested migration type not MIGRATE_HIGHATOMIC and MIGRATE_CMA.

Second, after commit 44042b4498728 ("mm/page_alloc: allow high-order pages
to be stored on the per-cpu lists") percpu-list can store high order
pages.  But trace point determine whether it is a refiil of percpu-list by
comparing requested order and 0.

To handle these problems, make mm_page_alloc_zone_locked() only be called
by __rmqueue_smallest with correct migration type.  With a new argument
called percpu_refill, it can show roughly whether it is a refill of
percpu-list.

Link: https://lkml.kernel.org/r/20220512025307.57924-1-vvghjk1234@gmail.com
Signed-off-by: Wonhyuk Yang &lt;vvghjk1234@gmail.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Baik Song An &lt;bsahn@etri.re.kr&gt;
Cc: Hong Yeon Kim &lt;kimhy@etri.re.kr&gt;
Cc: Taeung Song &lt;taeung@reallinux.co.kr&gt;
Cc: &lt;linuxgeek@linuxgeek.io&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>tracing: incorrect gfp_t conversion</title>
<updated>2022-05-13T14:20:18+00:00</updated>
<author>
<name>Vasily Averin</name>
<email>vvs@openvz.org</email>
</author>
<published>2022-05-13T03:23:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fe573327ffb1deba802c91dd1d4ff41dafa97a0e'/>
<id>urn:sha1:fe573327ffb1deba802c91dd1d4ff41dafa97a0e</id>
<content type='text'>
Fixes the following sparse warnings:

include/trace/events/*: sparse: cast to restricted gfp_t
include/trace/events/*: sparse: restricted gfp_t degrades to integer

gfp_t type is bitwise and requires __force attributes for any casts.

Link: https://lkml.kernel.org/r/331d88fe-f4f7-657c-02a2-d977f15fbff6@openvz.org
Signed-off-by: Vasily Averin &lt;vvs@openvz.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
