<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/perf_event.h, branch v6.19.11</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.19.11'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-11T16:09:11+00:00</updated>
<entry>
<title>treewide: Update email address</title>
<updated>2026-01-11T16:09:11+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-01-11T15:53:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2e4b28c48f88ce9e263957b1d944cf5349952f88'/>
<id>urn:sha1:2e4b28c48f88ce9e263957b1d944cf5349952f88</id>
<content type='text'>
In a vain attempt to consolidate the email zoo, switch everything to the
kernel.org account.

Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>perf: Support deferred user unwind</title>
<updated>2025-10-29T09:29:58+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2025-10-23T13:17:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c69993ecdd4dfde2b7da08b022052a33b203da07'/>
<id>urn:sha1:c69993ecdd4dfde2b7da08b022052a33b203da07</id>
<content type='text'>
Add support for deferred userspace unwind to perf.

Perf currently relies on in-place stack unwinding, typically from NMI
context. This moves the userspace part of the unwind to right before
the return to userspace.

This has two distinct benefits. The biggest is that it moves the
unwind to a faultable context, which makes it possible to fault in
debug info (.eh_frame, SFrame, etc.) that might not otherwise be
readily available. Secondly, it de-duplicates the user callchain when
multiple samples happen during the same kernel entry.

To facilitate this, the perf interface is extended with a new record
type:

  PERF_RECORD_CALLCHAIN_DEFERRED

and two new attribute flags:

  perf_event_attr::defer_callchain - to request the user unwind be deferred
  perf_event_attr::defer_output    - to request PERF_RECORD_CALLCHAIN_DEFERRED records

The existing PERF_RECORD_SAMPLE callchain section gets a new
context type:

  PERF_CONTEXT_USER_DEFERRED

After it comes a single entry denoting the 'cookie' of the deferred
callchain that should be attached at that point, matching the 'cookie'
field of the above-mentioned PERF_RECORD_CALLCHAIN_DEFERRED record.

The 'defer_callchain' flag is expected on all events with
PERF_SAMPLE_CALLCHAIN. The 'defer_output' flag is expected on the
event responsible for collecting side-band events (like mmap, comm,
etc.). Setting 'defer_output' on multiple events will get you
duplicate PERF_RECORD_CALLCHAIN_DEFERRED records.
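
As a rough userspace sketch of how these bits fit together (not code
from this patch; the new attr bits only exist with this change, the
rest is standard perf_event_open() boilerplate):

	#include &lt;linux/perf_event.h&gt;
	#include &lt;sys/syscall.h&gt;
	#include &lt;unistd.h&gt;

	static int open_deferred_sampler(void)
	{
		struct perf_event_attr attr = {
			.type            = PERF_TYPE_HARDWARE,
			.config          = PERF_COUNT_HW_CPU_CYCLES,
			.size            = sizeof(attr),
			.sample_period   = 100000,
			.sample_type     = PERF_SAMPLE_CALLCHAIN,
			.defer_callchain = 1,	/* defer the user unwind */
			.defer_output    = 1,	/* on ONE event only, see above */
		};

		/* measure the calling thread on any CPU */
		return syscall(__NR_perf_event_open, &amp;attr, 0, -1, -1, 0);
	}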

Based on earlier patches by Josh and Steven.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://patch.msgid.link/20251023150002.GR4067720@noisy.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>perf: Remove get_perf_callchain() init_nr argument</title>
<updated>2025-08-26T07:51:12+00:00</updated>
<author>
<name>Josh Poimboeuf</name>
<email>jpoimboe@kernel.org</email>
</author>
<published>2025-08-20T18:03:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e649bcda25b5ae1a30a182cc450f928a0b282c93'/>
<id>urn:sha1:e649bcda25b5ae1a30a182cc450f928a0b282c93</id>
<content type='text'>
The 'init_nr' argument does double duty: it's used to initialize both
the number of contexts and the number of stack entries. That's
confusing, and the callers always pass zero anyway. Hard-code the zero.
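
For reference, a sketch of the resulting interface change (the
parameter list here is assumed from the header of that era):

	/* before: 'init_nr' was always 0 in practice */
	extern struct perf_callchain_entry *
	get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel,
			   bool user, u32 max_stack, bool crosstask, bool add_mark);

	/* after: the zero is hard-coded inside get_perf_callchain() */
	extern struct perf_callchain_entry *
	get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
			   u32 max_stack, bool crosstask, bool add_mark);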

Signed-off-by: Josh Poimboeuf &lt;jpoimboe@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Namhyung Kim &lt;Namhyung@kernel.org&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/r/20250820180428.259565081@kernel.org
</content>
</entry>
<entry>
<title>perf: Convert mmap() refcounts to refcount_t</title>
<updated>2025-08-15T11:13:02+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2025-08-12T10:39:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=448f97fba9013ffa13f5dd82febd18836b189499'/>
<id>urn:sha1:448f97fba9013ffa13f5dd82febd18836b189499</id>
<content type='text'>
The recently fixed reference count leaks could have been detected by
using refcount_t, which would at least have mitigated the potential
overflow.

Now that the code is properly structured, convert the mmap() related
mmap_count variants over to refcount_t.

No functional change intended.
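
The conversion follows the usual atomic_t to refcount_t pattern; a
minimal sketch (the struct below is an illustrative stand-in, not the
actual perf code):

	#include &lt;linux/refcount.h&gt;

	struct rb_like {			/* stand-in for the ring buffer */
		refcount_t mmap_count;		/* was: atomic_t mmap_count */
	};

	static void rb_get(struct rb_like *rb)
	{
		/* was atomic_inc(); refcount_inc() WARNs and saturates
		 * on overflow instead of silently wrapping */
		refcount_inc(&amp;rb-&gt;mmap_count);
	}

	static bool rb_put(struct rb_like *rb)
	{
		/* was atomic_dec_and_test() */
		return refcount_dec_and_test(&amp;rb-&gt;mmap_count);
	}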

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Link: https://lore.kernel.org/r/20250812104020.071507932@infradead.org
</content>
</entry>
<entry>
<title>perf: Add comment to enum perf_event_state</title>
<updated>2025-06-05T12:37:53+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2025-06-04T08:21:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=49b393af3130c7712c7e8f215f4126c9a8060fa6'/>
<id>urn:sha1:49b393af3130c7712c7e8f215f4126c9a8060fa6</id>
<content type='text'>
Better describe the event states.
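
For context, the states in question (values as in the header; the
comments below paraphrase their meaning and are not the exact text
added by this patch):

	enum perf_event_state {
		PERF_EVENT_STATE_DEAD		= -4,	/* being torn down */
		PERF_EVENT_STATE_EXIT		= -3,	/* task has exited */
		PERF_EVENT_STATE_ERROR		= -2,	/* scheduling error, stays off */
		PERF_EVENT_STATE_OFF		= -1,	/* disabled */
		PERF_EVENT_STATE_INACTIVE	=  0,	/* enabled, not on the PMU */
		PERF_EVENT_STATE_ACTIVE		=  1,	/* scheduled in and counting */
	};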

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Leo Yan &lt;leo.yan@arm.com&gt;
Link: https://lkml.kernel.org/r/20250604135801.GK38114@noisy.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>perf/headers: Clean up &lt;linux/perf_event.h&gt; a bit</title>
<updated>2025-05-25T07:51:56+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2025-05-24T09:23:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e7d952cc39fca34386ec9f15f68cb2eaac01b5ae'/>
<id>urn:sha1:e7d952cc39fca34386ec9f15f68cb2eaac01b5ae</id>
<content type='text'>
Do a bit of readability spring cleaning:

 - Fix misaligned structure member in perf_addr_filter: the new
   struct perf_addr_filter::action member was too long, but when
   it was added it was not aligned properly. Align all fields to
   the customary column 41 alignment of most of the rest of the
   header.

 - Adjust the vertical alignment of the definition of other
   structures and definitions as well, so that the 'most of' in
   the previous paragraph changes to 'all of'. ;-)

 - Prettify the assignments in perf_clear_branch_entry_bitfields()

 - Move comments from CPP definitions to outside the macro

 - Move perf_guest_info_callbacks and related defines from the front
   of the header closer to where they are used within the header.

 - Add more #endif markers for larger CPP blocks and standardize
   #if/#else/#endif blocks to the following nomenclature:

	#ifdef CONFIG_FOO
	...
	#else /* !CONFIG_FOO: */
	...
	#endif /* !CONFIG_FOO */

 - Standardize on consistently using the 'extern' storage class where
   appropriate; we had cases where function prototypes sometimes
   omitted the storage class:

	extern void perf_pmu_migrate_context(struct pmu *pmu,
					int src_cpu, int dst_cpu);
	int perf_event_read_local(struct perf_event *event, u64 *value,
				  u64 *enabled, u64 *running);
	extern u64 perf_event_read_value(struct perf_event *event,
					 u64 *enabled, u64 *running);

   Which is obviously a bit confusing and adds unnecessary noise.

 - s/__u64/u64 and similar cleanups: there's no point in using __u64
   in non-UAPI headers, and doing so only adds unnecessary visual noise.

 - Harmonize all multi-parameter function prototypes along the following
   style:

	extern struct perf_event *
	perf_event_create_kernel_counter(struct perf_event_attr *attr,
					 int cpu,
					 struct task_struct *task,
					 perf_overflow_handler_t callback,
					 void *context);
 - etc.

Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf/aux: Allocate non-contiguous AUX pages by default</title>
<updated>2025-05-15T16:07:19+00:00</updated>
<author>
<name>Yabin Cui</name>
<email>yabinc@google.com</email>
</author>
<published>2025-05-08T23:26:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=18049c8cff9cc89daadc4df6975f7d9069638926'/>
<id>urn:sha1:18049c8cff9cc89daadc4df6975f7d9069638926</id>
<content type='text'>
perf always allocates contiguous AUX pages based on aux_watermark.
However, this contiguous allocation doesn't benefit all PMUs. For
instance, ARM SPE and TRBE operate with virtual pages, and Coresight
ETR allocates a separate buffer. For these PMUs, allocating contiguous
AUX pages unnecessarily exacerbates memory fragmentation. This
fragmentation can prevent their use on long-running devices.

This patch modifies the perf driver to be memory-friendly by default,
by allocating non-contiguous AUX pages. For PMUs requiring contiguous
pages (Intel BTS and some Intel PT), the existing
PERF_PMU_CAP_AUX_NO_SG capability can be used. For PMUs that don't
require but can benefit from contiguous pages (some Intel PT), a new
capability, PERF_PMU_CAP_AUX_PREFER_LARGE, is added to maintain their
existing behavior.
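
A PMU driver opts in through its capabilities mask at registration
time; a hypothetical sketch (the pmu variable and names below are
illustrative, not from an actual driver):

	static struct pmu my_pmu;	/* hypothetical driver-private pmu */

	static int __init my_pmu_init(void)
	{
		/* prefer large contiguous AUX chunks without requiring them */
		my_pmu.capabilities |= PERF_PMU_CAP_AUX_PREFER_LARGE;
		/* a PMU that cannot scatter-gather at all would keep
		 * setting PERF_PMU_CAP_AUX_NO_SG, as before */
		return perf_pmu_register(&amp;my_pmu, "my_pmu", -1);
	}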

Signed-off-by: Yabin Cui &lt;yabinc@google.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: James Clark &lt;james.clark@linaro.org&gt;
Reviewed-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lore.kernel.org/r/20250508232642.148767-1-yabinc@google.com
</content>
</entry>
<entry>
<title>perf/x86/intel: Support auto counter reload</title>
<updated>2025-04-08T18:55:49+00:00</updated>
<author>
<name>Kan Liang</name>
<email>kan.liang@linux.intel.com</email>
</author>
<published>2025-03-27T19:52:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ec980e4facef8110f6fce27e5b6344660117f01f'/>
<id>urn:sha1:ec980e4facef8110f6fce27e5b6344660117f01f</id>
<content type='text'>
The relative rates among two or more events are useful for performance
analysis, e.g., a high branch miss rate may indicate a performance
issue. Usually, the samples whose relative rate exceeds some threshold
are the most useful. However, traditional sampling takes samples of
each event separately. To get the relative rates among two or more
events, a high sample rate is required, which brings high overhead,
and many of the samples taken in non-hotspot areas are dropped as
useless in post-processing.

The auto counter reload (ACR) feature takes samples only when the
relative rate of two or more events exceeds some threshold, which
provides fine-grained information at a low cost.
To support the feature, two sets of MSRs are introduced. For a given
counter IA32_PMC_GPn_CTR/IA32_PMC_FXm_CTR, bit fields in the
IA32_PMC_GPn_CFG_B/IA32_PMC_FXm_CFG_B MSR indicate which counter(s)
can cause a reload of that counter. The reload value is stored in the
IA32_PMC_GPn_CFG_C/IA32_PMC_FXm_CFG_C MSR.
The details can be found in the Intel SDM (rev. 085), Volume 3,
Section 21.9.11 "Auto Counter Reload".

In hw_config(), an ACR event is specially configured, because the
cause/reloadable counter mask has to be applied to the dyn_constraint.
Besides the HW limits (e.g., no support for perf metrics, PDist,
etc.), a SW limit is applied as well: ACR events in a group must be
contiguous. That facilitates the later conversion from the event idx
to the counter idx; otherwise, intel_pmu_acr_late_setup() would have
to traverse the whole event list again to find the "cause" event.
Also, add a new flag, PERF_X86_EVENT_ACR, to indicate an ACR group; it
is set on the group leader.

The late setup() is also required for an ACR group: it converts the
event idx to the counter idx and saves the result in hw.config1.

The ACR configuration MSRs are only updated in enable_event();
disable_event() doesn't clear the ACR CFG registers. Add
acr_cfg_b/acr_cfg_c to struct cpu_hw_events to cache the MSR values,
which avoids an MSR write when the value has not changed.

Expose an acr_mask format in sysfs. The perf tool can use the new
format to configure the relation of the events in the group. The bit
sequence of the acr_mask follows the order in which the events of the
group are enabled.

Example:

Here is a snippet of mispredict.c. Since the array holds random
numbers, the jumps are random and often mispredicted.
The misprediction rate depends on the compared value.

For Loop 1, ~11% of all branches are mispredicted.
For Loop 2, ~21% of all branches are mispredicted.

main()
{
...
        for (i = 0; i &lt; N; i++)
                data[i] = rand() % 256;
...
        /* Loop 1 */
        for (k = 0; k &lt; 50; k++)
                for (i = 0; i &lt; N; i++)
                        if (data[i] &gt;= 64)
                                sum += data[i];
...

...
        /* Loop 2 */
        for (k = 0; k &lt; 50; k++)
                for (i = 0; i &lt; N; i++)
                        if (data[i] &gt;= 128)
                                sum += data[i];
...
}

Usually, code with a high branch miss rate performs badly. To
understand the branch miss rate of the code, the traditional method
samples both the branch-instructions and branch-misses events. E.g.,
perf record -e "{cpu_atom/branch-misses/ppu, cpu_atom/branch-instructions/u}"
               -c 1000000 -- ./mispredict

[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.925 MB perf.data (5106 samples) ]
The 5106 samples come from both events and are spread across both
loops. In the post-processing stage, a user can learn that Loop 2 has
a 21% branch miss rate, and can then focus on the samples of the
branch-misses event for Loop 2.

With this patch, the user can generate samples only when the branch
miss rate &gt; 20%. For example:
perf record -e "{cpu_atom/branch-misses,period=200000,acr_mask=0x2/ppu,
                 cpu_atom/branch-instructions,period=1000000,acr_mask=0x3/u}"
                -- ./mispredict

(Two different periods are applied to branch-misses and
branch-instructions; the ratio is set to 20%.
If branch-instructions overflows first, the branch-miss rate is
&lt; 20%: no samples should be generated, and all counters should be
automatically reloaded.
If branch-misses overflows first, the branch-miss rate is &gt; 20%: a
sample triggered by the branch-misses event should be generated, and
just the branch-instructions counter should be automatically
reloaded.

The branch-misses event should only be automatically reloaded when
branch-instructions overflows, so its "cause" event is the
branch-instructions event. Its acr_mask is set to 0x2, since the
index of branch-instructions in the group is 1.

The branch-instructions event is automatically reloaded no matter
which event overflows, so its "cause" events are both the
branch-misses and the branch-instructions events, and its acr_mask
should be set to 0x3.)

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.098 MB perf.data (2498 samples) ]

 $ perf report

Percent       │154:   movl    $0x0,-0x14(%rbp)
              │     ↓ jmp     1af
              │     for (i = j; i &lt; N; i++)
              │15d:   mov     -0x10(%rbp),%eax
              │       mov     %eax,-0x18(%rbp)
              │     ↓ jmp     1a2
              │     if (data[i] &gt;= 128)
              │165:   mov     -0x18(%rbp),%eax
              │       cltq
              │       lea     0x0(,%rax,4),%rdx
              │       mov     -0x8(%rbp),%rax
              │       add     %rdx,%rax
              │       mov     (%rax),%eax
              │    ┌──cmp     $0x7f,%eax
100.00   0.00 │    ├──jle     19e
              │    │sum += data[i];

The 2498 samples are all from the branch-misses event, in Loop 2.

The number of samples and the overhead are significantly reduced
without losing any information.

Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Thomas Falcon &lt;thomas.falcon@intel.com&gt;
Link: https://lkml.kernel.org/r/20250327195217.2683619-6-kan.liang@linux.intel.com
</content>
</entry>
<entry>
<title>perf: Extend the bit width of the arch-specific flag</title>
<updated>2025-04-08T18:55:49+00:00</updated>
<author>
<name>Kan Liang</name>
<email>kan.liang@linux.intel.com</email>
</author>
<published>2025-03-27T19:52:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c9449c8506a5df5052ef4d17867699517b10b55a'/>
<id>urn:sha1:c9449c8506a5df5052ef4d17867699517b10b55a</id>
<content type='text'>
The auto counter reload feature requires an event flag to indicate an
auto counter reload group, which can only be scheduled on the specific
counters enumerated in CPUID. However, hw_perf_event.flags has run out
of bits on x86.

Two solutions were considered to address the issue.
- Currently, 20 bits are reserved for the architecture-specific flags.
  Only bit 31 is used for a generic flag, so there is still plenty of
  space left. Reserve 8 more bits for the arch-specific flags.
- Add a new x86-specific hw_perf_event.flags1 to support more flags.

The former is implemented, since enough room is still left for the
generic flags.
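
Concretely, the resulting flag-space split looks roughly like this
(mask values follow the description above; the exact header text may
differ):

	/* bits 0-27: reserved for arch-specific flags (was bits 0-19) */
	#define PERF_EVENT_FLAG_ARCH		0x0fffffff
	/* bit 31: generic flag */
	#define PERF_EVENT_FLAG_USER_READ_CNT	0x80000000

	static_assert((PERF_EVENT_FLAG_ARCH &amp; PERF_EVENT_FLAG_USER_READ_CNT) == 0);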

Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Thomas Falcon &lt;thomas.falcon@intel.com&gt;
Link: https://lkml.kernel.org/r/20250327195217.2683619-4-kan.liang@linux.intel.com
</content>
</entry>
<entry>
<title>perf/x86: Add dynamic constraint</title>
<updated>2025-04-08T18:55:48+00:00</updated>
<author>
<name>Kan Liang</name>
<email>kan.liang@linux.intel.com</email>
</author>
<published>2025-03-27T19:52:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4dfe3232cc04325a09e96f6c7f9546ba6c0b132b'/>
<id>urn:sha1:4dfe3232cc04325a09e96f6c7f9546ba6c0b132b</id>
<content type='text'>
More and more features require a dynamic event constraint, e.g., branch
counter logging, auto counter reload, Arch PEBS, etc.

Add a generic flag, PMU_FL_DYN_CONSTRAINT, to indicate the case. It
avoids having to keep adding individual flags in intel_cpuc_prepare().

Add a dyn_constraint variable to struct hw_perf_event to track the
dynamic constraint of the event, and apply it whenever it has been
updated.
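
A simplified sketch of the mechanism (not the exact driver code): a
constraint is, at bottom, a bitmask of the counters an event may use,
and the dynamic constraint just intersects the static mask:

	#include &lt;linux/types.h&gt;

	struct constraint_like {		/* stand-in for event_constraint */
		u64 idxmsk64;			/* counters usable by the event */
	};

	static void apply_dyn_constraint(struct constraint_like *c,
					 u64 dyn_constraint)
	{
		if (dyn_constraint != ~0ULL)	/* ~0ULL: no restriction */
			c-&gt;idxmsk64 &amp;= dyn_constraint;
	}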

Apply the generic dynamic constraint for branch counter logging. Many
features on and after v6 require a dynamic constraint, so
unconditionally set the flag for v6+.

Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Thomas Falcon &lt;thomas.falcon@intel.com&gt;
Link: https://lkml.kernel.org/r/20250327195217.2683619-2-kan.liang@linux.intel.com
</content>
</entry>
</feed>
