summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)AuthorFilesLines
12 daysperf symbol-elf: Add support for the block argument for libbfdIan Rogers1-3/+7
James Clark caught that the BUILD_NONDISTRO=1 build with libbfd was broken due to an update to the read_build_id function adding a blocking argument. Add support for this argument by first opening the file blocking or non-blocking, then switching from bfd_openr to bfd_fdopenr and passing the opened fd. bfd_fdopenr closes the fd on error and when bfd_close are called. Reported-by: James Clark <james.clark@linaro.org> Closes: https://lore.kernel.org/lkml/20250903-james-perf-read-build-id-fix-v1-2-6a694d0a980f@linaro.org/ Fixes: 2c369d91d093 ("perf symbol: Add blocking argument to filename__read_build_id") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250904161731.1193729-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
12 daysperf test: Checking BPF metadata collection fails on version stringThomas Richter1-1/+1
commit edf2cadf01e8f ("perf test: add test for BPF metadata collection") fails consistently on the version string check. The perf version string on some of the constant integration test machines contains characters with special meaning in grep's extended regular expression matching algorithm. The output of perf version is: # perf version perf version 6.17.0-20250814.rc1.git20.24ea63ea3877.63.fc42.s390x+git # and the '+' character has special meaning in egrep command. Also the use of egrep is deprecated. Change the perf version string check to fixed character matching and get rid of egrep's warning being deprecated. Use grep -F instead. Output before: # perf test -F 102 Checking BPF metadata collection egrep: warning: egrep is obsolescent; using grep -E Basic BPF metadata test [Failed invalid output] 102: BPF metadata collection test : FAILED! # Output after: # perf test -F 102 Checking BPF metadata collection Basic BPF metadata test [Success] 102: BPF metadata collection test : Ok # Fixes: edf2cadf01e8f ("perf test: add test for BPF metadata collection") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Blake Jones <blakejones@google.com> Link: https://lore.kernel.org/r/20250822122540.4104658-1-tmricht@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
14 daysperf tests: Fix "PE file support" test buildJames Clark1-2/+2
filename__read_build_id() now takes a blocking/non-blocking argument. The original behavior of filename__read_build_id() was blocking so add block=true to fix the build. Fixes: 2c369d91d093 ("perf symbol: Add blocking argument to filename__read_build_id") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250903-james-perf-read-build-id-fix-v1-1-6a694d0a980f@linaro.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-09-03perf bpf-utils: Harden get_bpf_prog_info_linearIan Rogers1-10/+33
In get_bpf_prog_info_linear two calls to bpf_obj_get_info_by_fd are made, the first to compute memory requirements for a struct perf_bpil and the second to fill it in. Previously the code would warn when the second call didn't match the first. Such races can be common place in things like perf test, whose perf trace tests will frequently load BPF programs. Rather than a debug message, return actual errors for this case. Out of paranoia also validate the read bpf_prog_info array value. Change the type of ptr to avoid mismatched pointer type compiler warnings. Add some additional debug print outs and sanity asserts. Closes: https://lore.kernel.org/lkml/CAP-5=fWJQcmUOP7MuCA2ihKnDAHUCOBLkQFEkQES-1ZZTrgf8Q@mail.gmail.com/ Fixes: 6ac22d036f86 ("perf bpf: Pull in bpf_program__get_prog_info_linear()") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250902181713.309797-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-09-03perf bpf-utils: Constify bpil_array_descIan Rogers1-12/+6
The array's contents is a compile time constant. Constify to make the code more intention revealing and avoid unintended errors. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250902181713.309797-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-09-03perf bpf-event: Fix use-after-free in synthesisIan Rogers1-12/+27
Calls to perf_env__insert_bpf_prog_info may fail as a sideband thread may already have inserted the bpf_prog_info. Such failures may yield info_linear being freed which then causes use-after-free issues with the internal bpf_prog_info info struct. Make it so that perf_env__insert_bpf_prog_info trigger early non-error paths and fix the use-after-free in perf_event__synthesize_one_bpf_prog. Add proper return error handling to perf_env__add_bpf_info (that calls perf_env__insert_bpf_prog_info) and propagate the return value in its callers. Closes: https://lore.kernel.org/lkml/CAP-5=fWJQcmUOP7MuCA2ihKnDAHUCOBLkQFEkQES-1ZZTrgf8Q@mail.gmail.com/ Fixes: 03edb7020bb9 ("perf bpf: Fix two memory leakages when calling perf_env__insert_bpf_prog_info()") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250902181713.309797-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-08-26perf symbol: Add blocking argument to filename__read_build_idIan Rogers12-27/+32
When synthesizing build-ids, for build ID mmap2 events, they will be added for data mmaps if -d/--data is specified. The files opened for their build IDs may block on the open causing perf to hang during synthesis. There is some robustness in existing calls to filename__read_build_id by checking the file path is to a regular file, which unfortunately fails for symlinks. Rather than adding more is_regular_file calls, switch filename__read_build_id to take a "block" argument and specify O_NONBLOCK when this is false. The existing is_regular_file checking callers and the event synthesis callers are made to pass false and thereby avoiding the hang. Fixes: 53b00ff358dc ("perf record: Make --buildid-mmap the default") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250823000024.724394-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-08-26perf symbol-minimal: Fix ehdr reading in filename__read_build_idIan Rogers1-28/+27
The e_ident is part of the ehdr and so reading it a second time would mean the read ehdr was displaced by 16-bytes. Switch from stdio to open/read/lseek syscalls for similarity with the symbol-elf version of the function and so that later changes can alter then open flags. Fixes: fef8f648bb47 ("perf symbol: Fix use-after-free in filename__read_build_id") Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250823000024.724394-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-08-18tools headers: Sync uapi/linux/vhost.h with the kernel sourceNamhyung Kim1-0/+35
To pick up the changes in this cset: 7d9896e9f6d02d8a vhost: Reintroduce kthread API and add mode selection 333c515d189657c9 vhost-net: allow configuring extended features This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/vhost.h include/uapi/linux/vhost.h Please see tools/include/uapi/README for further details. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-08-18tools headers: Sync uapi/linux/prctl.h with the kernel sourceNamhyung Kim1-1/+8
To pick up the changes in this cset: b1fabef37bd504f3 prctl: Introduce PR_MTE_STORE_ONLY a2fc422ed75748ee syscall_user_dispatch: Add PR_SYS_DISPATCH_INCLUSIVE_ON This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/prctl.h include/uapi/linux/prctl.h Please see tools/include/uapi/README for further details. Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-08-18tools headers: Sync uapi/linux/fs.h with the kernel sourceNamhyung Kim1-0/+88
To pick up the changes in this cset: 76fdb7eb4e1c9108 uapi: export PROCFS_ROOT_INO ca115d7e754691c0 tree-wide: s/struct fileattr/struct file_kattr/g be7efb2d20d67f33 fs: introduce file_getattr and file_setattr syscalls 9eb22f7fedfc9eb1 fs: add ioctl to query metadata and protection info capabilities This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/fs.h include/uapi/linux/fs.h Please see tools/include/uapi/README for further details. Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-08-18tools headers: Sync uapi/linux/fcntl.h with the kernel sourceNamhyung Kim1-0/+18
To pick up the changes in this cset: 3941e37f62fe2c3c uapi/fcntl: add FD_PIDFS_ROOT cd5d2006327b6d84 uapi/fcntl: add FD_INVALID 67fcec2919e4ed31 fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP a4c746f06853f91d uapi/fcntl: mark range as reserved This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/perf/trace/beauty/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h Please see tools/include/uapi/README for further details. Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-08-18tools headers: Sync syscall tables with the kernel sourceNamhyung Kim9-0/+18
To pick up the changes in this cset: be7efb2d20d67f33 fs: introduce file_getattr and file_setattr syscalls This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/scripts/syscall.tbl scripts/syscall.tbl diff -u tools/perf/arch/x86/entry/syscalls/syscall_32.tbl arch/x86/entry/syscalls/syscall_32.tbl diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl diff -u tools/perf/arch/arm/entry/syscalls/syscall.tbl arch/arm/tools/syscall.tbl diff -u tools/perf/arch/sh/entry/syscalls/syscall.tbl arch/sh/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/sparc/entry/syscalls/syscall.tbl arch/sparc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/xtensa/entry/syscalls/syscall.tbl arch/xtensa/kernel/syscalls/syscall.tbl Please see tools/include/uapi/README for further details. Cc: Arnd Bergmann <arnd@arndb.de> CC: linux-api@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-08-18perf test: Fix a build error in x86 topdown testNamhyung Kim1-0/+1
There's an environment that caused the following build error. Include "debug.h" (under util directory) to fix it. arch/x86/tests/topdown.c: In function 'event_cb': arch/x86/tests/topdown.c:53:25: error: implicit declaration of function 'pr_debug' [-Werror=implicit-function-declaration] 53 | pr_debug("Broken topdown information for '%s'\n", evsel__name(evsel)); | ^~~~~~~~ cc1: all warnings being treated as errors Link: https://lore.kernel.org/r/20250815164122.289651-1-namhyung@kernel.org Fixes: 5b546de9cc177936 ("perf topdown: Use attribute to see an event is a topdown metic or slots") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-08-07perf bpf-filter: Enable events manuallyIlya Leoshkevich1-1/+4
On s390, and, in general, on all platforms where the respective event supports auxiliary data gathering, the command: # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.011 MB perf.data ] # ./perf report --stats | grep SAMPLE # does not generate samples in the perf.data file. On x86 the command: # sudo perf record -e intel_pt// -u 0 ls is broken too. Looking at the sequence of calls in 'perf record' reveals this behavior: 1. The event 'cycles' is created and enabled: record__open() +-> evlist__apply_filters() +-> perf_bpf_filter__prepare() +-> bpf_program.attach_perf_event() +-> bpf_program.attach_perf_event_opts() +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...) The event 'cycles' is enabled and active now. However the event's ring-buffer to store the samples generated by hardware is not allocated yet. 2. The event's fd is mmap()ed to create the ring buffer: record__open() +-> record__mmap() +-> record__mmap_evlist() +-> evlist__mmap_ex() +-> perf_evlist__mmap_ops() +-> mmap_per_cpu() +-> mmap_per_evsel() +-> mmap__mmap() +-> perf_mmap__mmap() +-> mmap() This allocates the ring buffer for the event 'cycles'. With mmap() the kernel creates the ring buffer: perf_mmap(): kernel function to create the event's ring | buffer to save the sampled data. | +-> ring_buffer_attach(): Allocates memory for ring buffer. | The PMU has auxiliary data setup function. The | has_aux(event) condition is true and the PMU's | stop() is called to stop sampling. It is not | restarted: | | if (has_aux(event)) | perf_event_stop(event, 0); | +-> cpumsf_pmu_stop(): Hardware sampling is stopped. No samples are generated and saved anymore. 3. After the event 'cycles' has been mapped, the event is enabled a second time in: __cmd_record() +-> evlist__enable() +-> __evlist__enable() +-> evsel__enable_cpu() +-> perf_evsel__enable_cpu() +-> perf_evsel__run_ioctl() +-> perf_evsel__ioctl() +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .) The second ioctl(fd, PERF_EVENT_IOC_ENABLE, 0); is just a NOP in this case. The first invocation in (1.) sets the event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions perf_ioctl() +-> _perf_ioctl() +-> _perf_event_enable() +-> __perf_event_enable() return immediately because event::state is already set to PERF_EVENT_STATE_ACTIVE. This happens on s390, because the event 'cycles' offers the possibility to save auxilary data. The PMU callbacks setup_aux() and free_aux() are defined. Without both callback functions, cpumsf_pmu_stop() is not invoked and sampling continues. To remedy this, remove the first invocation of ioctl(..., PERF_EVENT_IOC_ENABLE, ...). in step (1.) Create the event in step (1.) and enable it in step (3.) after the ring buffer has been mapped. Output after: # ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2 [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.876 MB perf.data ] # ./perf report --stats | grep SAMPLE SAMPLE events: 16200 (99.5%) SAMPLE events: 16200 # The software event succeeded both before and after the patch: # ./perf record -e cpu-clock -aB --synth=no -u 0 -- \ ./perf test -w thloop 2 [ perf record: Woken up 7 times to write data ] [ perf record: Captured and wrote 2.870 MB perf.data ] # ./perf report --stats | grep SAMPLE SAMPLE events: 53506 (99.8%) SAMPLE events: 53506 # Fixes: b4c658d4d63d61 ("perf target: Remove uid from target") Suggested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Co-developed-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250806162417.19666-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-02Merge tag 'perf-tools-for-v6.17-2025-08-01' of ↵Linus Torvalds348-3197/+10466
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "Build-ID processing goodies: Build-IDs are content based hashes to link regions of memory to ELF files in post processing. They have been available in distros for quite a while: $ file /bin/bash /bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=707a1c670cd72f8e55ffedfbe94ea98901b7ce3a, for GNU/Linux 3.2.0, stripped It is possible to ask the kernel to get it from mmap executable backing storage at time they are being put in place and send it as metadata at that moment to have in perf.data. Prefer that across the board to speed up 'record' time - it post processes the samples to find binaries touched by any samples and to save them with build-ID. It can skip reading build-ID in userspace if it comes from the kernel. perf record: * Make --buildid-mmap default. The kernel can generate MMAP2 events with a build-ID from ELF header. Use that by default instead of using inode and device ID to identify binaries. It also can be disabled with --no-buildid-mmap. * Use BPF for -u/--uid option to sample processes belong to a user. BPF can track user processes more accurately and the existing logic often fails to get the list of processes due to race with reading the /proc filesystem. * Generate PERF_RECORD_BPF_METADATA when it profiles BPF programs and they have variables starting with "bpf_metadata_". This will help to identify BPF objects used in the profile. This has been supported in bpftool for some time and allows the recording of metadata such as commit hashes, versions, etc, that now gets recorded in perf.data as well. * Collect list of DSOs touched in the sample callchains as well as in the sample itself. This would increase the processing time at the end of record, but can improve the data quality. perf stat: * Add a new 'drm' pseudo-PMU support like in 'hwmon'. It can collect DRM usage stats using fdinfo in /proc. On my Intel laptop, it shows like below: $ perf list drm ... drm: drm-active-stolen-system0 [Total memory active in one or more engines. Unit: drm_i915] drm-active-system0 [Total memory active in one or more engines. Unit: drm_i915] drm-engine-capacity-video [Engine capacity. Unit: drm_i915] drm-engine-copy [Utilization in ns. Unit: drm_i915] drm-engine-render [Utilization in ns. Unit: drm_i915] drm-engine-video [Utilization in ns. Unit: drm_i915] ... $ sudo perf stat -a -e drm-engine-render,drm-engine-video,drm-engine-capacity-video sleep 1 Performance counter stats for 'system wide': 48,137,316,988,873 ns drm-engine-render 34,452,696,746 ns drm-engine-video 20 capacity drm-engine-capacity-video 1.002086194 seconds time elapsed perf list * Add description for software events. The description is in JSON format and the event parser now can handle the software events like others (for example, it's case-insensitive and subject to wildcard matching). $ perf list software List of pre-defined events (to be used in -e or -M): software: alignment-faults [Number of kernel handled memory alignment faults. Unit: software] bpf-output [An event used by BPF programs to write to the perf ring buffer. Unit: software] cgroup-switches [Number of context switches to a task in a different cgroup. Unit: software] context-switches [Number of context switches [This event is an alias of cs]. Unit: software] cpu-clock [Per-CPU high-resolution timer based event. Unit: software] cpu-migrations [Number of times a process has migrated to a new CPU [This event is an alias of migrations]. Unit: software] cs [Number of context switches [This event is an alias of context-switches]. Unit: software] dummy [A placeholder event that doesn't count anything. Unit: software] emulation-faults [Number of kernel handled unimplemented instruction faults handled through emulation. Unit: software] faults [Number of page faults [This event is an alias of page-faults]. Unit: software] major-faults [Number of major page faults. Major faults require I/O to handle. Unit: software] migrations [Number of times a process has migrated to a new CPU [This event is an alias of cpu-migrations]. Unit: software] minor-faults [Number of minor page faults. Minor faults don't require I/O to handle. Unit: software] page-faults [Number of page faults [This event is an alias of faults]. Unit: software] task-clock [Per-task high-resolution timer based event. Unit: software] perf ftrace: * Add -e/--events option to perf ftrace latency to measure latency between the two events instead of a function. $ sudo perf ftrace latency -ab -e i915_request_wait_begin,i915_request_wait_end --hide-empty -- sleep 1 # DURATION | COUNT | GRAPH | 256 - 512 us | 4 | ###### | 2 - 4 ms | 2 | ### | 4 - 8 ms | 12 | ################### | 8 - 16 ms | 10 | ################ | # statistics (in usec) total time: 194915 avg time: 6961 max time: 12855 min time: 373 count: 28 * Add new function graph tracer options (--graph-opts) to display more info like arguments and return value. They will be passed to the kernel ftrace directly. $ sudo perf ftrace -G vfs_write --graph-opts retval,retaddr # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | ... 5) | mutex_unlock() { /* <-rb_simple_write+0xda/0x150 */ 5) 0.188 us | local_clock(); /* <-lock_release+0x2ad/0x440 ret=0x3bf2a3cf90e */ 5) | rt_mutex_slowunlock() { /* <-rb_simple_write+0xda/0x150 */ 5) | _raw_spin_lock_irqsave() { /* <-rt_mutex_slowunlock+0x4f/0x200 */ 5) 0.123 us | preempt_count_add(); /* <-_raw_spin_lock_irqsave+0x23/0x90 ret=0x0 */ 5) 0.128 us | local_clock(); /* <-__lock_acquire.isra.0+0x17a/0x740 ret=0x3bf2a3cfc8b */ 5) 0.086 us | do_raw_spin_trylock(); /* <-_raw_spin_lock_irqsave+0x4a/0x90 ret=0x1 */ 5) 0.845 us | } /* _raw_spin_lock_irqsave ret=0x292 */ ... Misc: * Add perf archive --exclude-buildids <FILE> option to skip some binaries. The format of the FILE should be same as an output of perf buildid-list. * Get rid of dependency of libcrypto. It was just to get SHA-1 hash so implement it directly like in the kernel. A side effect is that it needs -fno-strict-aliasing compiler option (again, like in the kernel). * Convert all shell script tests to use bash" * tag 'perf-tools-for-v6.17-2025-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (179 commits) perf record: Cache build-ID of hit DSOs only perf test: Ensure lock contention using pipe mode perf python: Stop using deprecated PyUnicode_AsString() perf list: Skip ABI PMUs when printing pmu values perf list: Remove tracepoint printing code perf tp_pmu: Add event APIs perf tp_pmu: Factor existing tracepoint logic to new file perf parse-events: Remove non-json software events perf jevents: Add common software event json perf tools: Remove libtraceevent in .gitignore perf test: Fix comment ordering perf sort: Use perf_env to set arch sort keys and header perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common test perf sample: Remove arch notion of sample parsing perf env: Remove global perf_env perf trace: Avoid global perf_env with evsel__env perf auxtrace: Pass perf_env from session through to mmap read perf machine: Explicitly pass in host perf_env perf bench synthesize: Avoid use of global perf_env perf top: Make perf_env locally scoped ...
2025-07-31perf record: Cache build-ID of hit DSOs onlyNamhyung Kim1-1/+1
It post-processes samples to find which DSO has samples. Based on that info, it can save used DSOs in the build-ID cache directory. But for some reason, it saves all DSOs without checking the hit mark. Skipping unused DSOs can give some speedup especially with --buildid-mmap being default. On my idle machine, `time perf record -a sleep 1` goes down from 3 sec to 1.5 sec with this change. Fixes: e29386c8f7d71fa5 ("perf record: Add --buildid-mmap option to enable PERF_RECORD_MMAP2's build id") Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250731070330.57116-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-38/+60
Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
2025-07-30perf test: Ensure lock contention using pipe modeJan Polensky1-13/+13
The 'kernel lock contention analysis test' requires reliable triggering of lock contention. On some systems, previous benchmark calls failed to generate sufficient contention due to low system activity or resource limits. This patch adds the -p (pipe) option to all calls of perf bench sched messaging, ensuring consistent lock contention without relying on socket-based communication. Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Jan Polensky <japo@linux.ibm.com> Link: https://lore.kernel.org/r/20250725170801.3176678-1-japo@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-30perf python: Stop using deprecated PyUnicode_AsString()Arnaldo Carvalho de Melo1-1/+11
As noticed while building for Fedora 43: GEN /tmp/build/perf/python/perf.cpython-314-x86_64-linux-gnu.so /git/perf-6.16.0-rc3/tools/perf/util/python.c: In function ‘get_tracepoint_field’: /git/perf-6.16.0-rc3/tools/perf/util/python.c:340:9: error: ‘_PyUnicode_AsString’ is deprecated [-Werror=deprecated-declarations] 340 | const char *str = _PyUnicode_AsString(PyObject_Str(attr_name)); | ^~~~~ In file included from /usr/include/python3.14/unicodeobject.h:1022, from /usr/include/python3.14/Python.h:89, from /git/perf-6.16.0-rc3/tools/perf/util/python.c:2: /usr/include/python3.14/cpython/unicodeobject.h:648:1: note: declared here 648 | _PyUnicode_AsString(PyObject *unicode) | ^~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors error: command '/usr/bin/gcc' failed with exit code 1 Use PyUnicode_AsUTF8() instead and also check if PyObject_Str() fails before doing so. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/aIofXNK8QLtLIaI3@x1 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-28RISC-V: perf/kvm: Add reporting of interrupt eventsQuan Zhou3-38/+60
For `perf kvm stat` on the RISC-V, in order to avoid the occurrence of `UNKNOWN` event names, interrupts should be reported in addition to exceptions. testing without patch: Event name Samples Sample% Time(ns) --------------------------- -------- -------- ------------ STORE_GUEST_PAGE_FAULT 1496461 53.00% 889612544 UNKNOWN 887514 31.00% 272857968 LOAD_GUEST_PAGE_FAULT 305164 10.00% 189186331 VIRTUAL_INST_FAULT 70625 2.00% 134114260 SUPERVISOR_SYSCALL 32014 1.00% 58577110 INST_GUEST_PAGE_FAULT 1 0.00% 2545 testing with patch: Event name Samples Sample% Time(ns) --------------------------- -------- -------- ------------ IRQ_S_TIMER 211271 58.00% 738298680600 EXC_STORE_GUEST_PAGE_FAULT 111279 30.00% 130725914800 EXC_LOAD_GUEST_PAGE_FAULT 22039 6.00% 25441480600 EXC_VIRTUAL_INST_FAULT 8913 2.00% 21015381600 IRQ_VS_EXT 4748 1.00% 10155464300 IRQ_S_EXT 2802 0.00% 13288775800 IRQ_S_SOFT 1998 0.00% 4254129300 Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/9693132df4d0f857b8be3a75750c36b40213fcc0.1726211632.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2025-07-27perf list: Skip ABI PMUs when printing pmu valuesIan Rogers5-5/+23
Avoid printing tracepoint, legacy and software events when listing for the pmu option. Add the PMU type to the print_event callbacks to ease detection. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250725185202.68671-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-27perf list: Remove tracepoint printing codeIan Rogers3-101/+23
Now that the tp_pmu can iterate and describe events remove the custom tracepoint printing logic, this avoids perf list showing the tracepoint events twice. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250725185202.68671-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-27perf tp_pmu: Add event APIsIan Rogers3-0/+129
Add event APIs for the tracepoint PMU allowing things like perf list to function using it. For perf list add the tracepoint format in the long description (shown with -v). $ sudo perf list -v tracepoint List of pre-defined events (to be used in -e or -M): alarmtimer:alarmtimer_cancel [Tracepoint event] [name: alarmtimer_cancel ID: 416 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:void * alarm; offset:8; size:8; signed:0; field:unsigned char alarm_type; offset:16; size:1; signed:0; field:s64 expires; offset:24; size:8; signed:1; field:s64 now; offset:32; size:8; signed:1; print fmt: "alarmtimer:%p type:%s expires:%llu now:%llu",REC->alarm,__print_flags((1 << REC->alarm_type)," | ",{ 1 << 0, "REALTIME" },{ 1 << 1,"BOOTTIME" },{ 1 << 3,"REALTIME Freezer" },{ 1 << 4,"BOOTTIME Freezer" }),REC->expires,REC->now . Unit: tracepoint] alarmtimer:alarmtimer_fired [Tracepoint event] [name: alarmtimer_fired ID: 418 ... Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250725185202.68671-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-27perf tp_pmu: Factor existing tracepoint logic to new fileIan Rogers5-106/+170
Start the creation of a tracepoint PMU abstraction. Tracepoint events don't follow the regular sysfs perf conventions. Eventually the new PMU abstraction will bridge the gap so tracepoint events look more like regular perf ones. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250725185202.68671-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-27perf parse-events: Remove non-json software eventsIan Rogers6-107/+33
Remove the hard coded encodings from parse-events. This has the consequence that software events are matched using the sysfs/json priority, will be case insensitive and will be wildcarded across PMUs. As there were software and hardware types in the parsing code, the removal means software vs hardware logic can be removed and hardware assumed. Now the perf json provides detailed descriptions of software events, remove the previous listing support that didn't contain event descriptions. When globbing is required for the "sw" option in perf list, use string PMU globbing as was done previously for the tool PMU. The output of `perf list sw` command changed like this. Before: List of pre-defined events (to be used in -e or -M): alignment-faults [Software event] bpf-output [Software event] cgroup-switches [Software event] context-switches OR cs [Software event] cpu-clock [Software event] cpu-migrations OR migrations [Software event] dummy [Software event] emulation-faults [Software event] major-faults [Software event] minor-faults [Software event] page-faults OR faults [Software event] task-clock [Software event] After: List of pre-defined events (to be used in -e or -M): software: alignment-faults [Number of kernel handled memory alignment faults. Unit: software] bpf-output [An event used by BPF programs to write to the perf ring buffer. Unit: software] cgroup-switches [Number of context switches to a task in a different cgroup. Unit: software] context-switches [Number of context switches [This event is an alias of cs]. Unit: software] cpu-clock [Per-CPU high-resolution timer based event. Unit: software] cpu-migrations [Number of times a process has migrated to a new CPU [This event is an alias of migrations]. Unit: software] cs [Number of context switches [This event is an alias of context-switches]. Unit: software] dummy [A placeholder event that doesn't count anything. Unit: software] ... Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250725185202.68671-4-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-27perf jevents: Add common software event jsonIan Rogers3-109/+264
Add json for software events so that in perf list the events can have a description. Common json exists for the tool PMU but it has no sysfs equivalent. Modify the map_for_pmu code to return the common map (rather than an architecture specific one) when a PMU with a common name is being looked for, this allows the events to be found. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20250725185202.68671-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-27perf tools: Remove libtraceevent in .gitignoreChen Pei1-2/+0
The libtraceevent has been removed from the source tree, and .gitignore needs to be updated as well. Fixes: 4171925aa9f3f7bf ("tools lib traceevent: Remove libtraceevent") Signed-off-by: Chen Pei <cp0613@linux.alibaba.com> Link: https://lore.kernel.org/r/20250726111532.8031-1-cp0613@linux.alibaba.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-27perf test: Fix comment orderingBlake Jones1-2/+2
The previous commit that introduced this test overlooked a behavior of "perf test list", causing it to print "SPDX-License-Identifier: GPL-2.0" as a description for that test. This reorders the comments to fix that issue. Fixes: edf2cadf01e8 ("perf test: add test for BPF metadata collection") Signed-off-by: Blake Jones <blakejones@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250726004023.3466563-1-blakejones@google.com [ update the commit message a little bit ] Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf sort: Use perf_env to set arch sort keys and headerIan Rogers15-131/+107
Previously arch_support_sort_key and arch_perf_header_entry used a weak symbol to compile as appropriate for x86 and powerpc. A limitation to this is that the handling of a data file could vary in cross-platform development. Change to using the perf_env of the current session to determine the architecture kind and set the sort key and header entries as appropriate. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-23-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common testIan Rogers5-129/+14
test__x86_sample_parsing is identical to test__sample_parsing except it explicitly tested PERF_SAMPLE_WEIGHT_STRUCT. Now the parsing code is common move the PERF_SAMPLE_WEIGHT_STRUCT to the common sample parsing test and remove the x86 version. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-22-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf sample: Remove arch notion of sample parsingIan Rogers14-80/+36
By definition arch sample parsing and synthesis will inhibit certain kinds of cross-platform record then analysis (report, script, etc.). Remove arch_perf_parse_sample_weight and arch_perf_synthesize_sample_weight replacing with a common implementation. Combine perf_sample p_stage_cyc and retire_lat as weight3 to capture the differing uses regardless of compiled for architecture. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf env: Remove global perf_envIan Rogers6-10/+4
The global perf_env was used for the host, but if a perf_env wasn't easy to come by it was used in a lot of places where potentially recorded and host data could be confused. Remove the global variable as now the majority of accesses retrieve the perf_env for the host from the session. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf trace: Avoid global perf_env with evsel__envIan Rogers1-9/+3
There is no session in perf trace unless in replay mode, so in host mode no session can be associated with the evlist. If the evsel__env call fails resort to the host_env that's part of the trace. Remove errno_to_name as it becomes a called once 1-line function once the argument is turned into a perf_env, just call perf_env__arch_strerrno directly. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-19-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf auxtrace: Pass perf_env from session through to mmap readIan Rogers3-10/+17
auxtrace_mmap__read and auxtrace_mmap__read_snapshot end up calling `evsel__env(NULL)` which returns the global perf_env variable for the host. Their only call is in perf record. Rather than use the global variable pass through the perf_env for `perf record`. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-18-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf machine: Explicitly pass in host perf_envIan Rogers12-35/+81
When creating a machine for the host explicitly pass in a scoped perf_env. This removes a use of the global perf_env. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-17-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf bench synthesize: Avoid use of global perf_envIan Rogers1-8/+19
The benchmark doesn't use a data file and so the header perf_env isn't used. Stack allocate a host perf_env for use to avoid the use of the global perf_env. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-16-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf top: Make perf_env locally scopedIan Rogers1-13/+28
The use of the global host perf_env variable is potentially inconsistent within the code. Switch perf top to using a locally scoped variable that is generally accessed through the session. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-15-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf session: Add host_env argument to perf_session__newIan Rogers3-5/+8
When creating a perf_session the host perf_env may or may not want to be used. For example, `perf top` uses a host perf_env while `perf inject` does not. Add a host_env argument to perf_session__new so that sessions requiring a host perf_env can pass it in. Currently if none is specified the global perf_env variable is used, but this will change in later patches. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-14-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf test: Avoid use perf_envIan Rogers3-23/+33
The perf_env global variable holds the host perf_env data but its use is hit and miss. Switch to using local perf_env variables and ensure scoped perf_env__init and perf_env__exit. This loses command line setting of the perf_env, but this doesn't matter for tests. So the perf_env is fully initialized, clear it with memset in perf_env__init. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-13-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf header: Clean up use of perf_envIan Rogers1-76/+98
Always use the perf_env from the feat_fd's perf_header. Cache the value on entry to a function in `env` and use `env->` consistently in the code. Ensure the header is initialized for use in perf_session__do_write_header. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf evlist: Change env variable to sessionIan Rogers17-22/+35
The session holds a perf_env pointer env. In UI code container_of is used to turn the env to a session, but this assumes the session header's env is in use. Rather than a dubious container_of, hold the session in the evlist and derive the env from the session with evsel__env, perf_session__env, etc. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf session: Add accessor for session->header.envIan Rogers25-107/+120
The perf_env from the header in the session is frequently accessed, add an accessor function rather than access directly. Cache the value to avoid repeated calls. No behavioral change. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf record: Make --buildid-mmap the defaultIan Rogers4-22/+34
Support for build IDs in mmap2 perf events has been present since Linux v5.12: https://lore.kernel.org/lkml/20210219194619.1780437-1-acme@kernel.org/ Build ID mmap events don't avoid the need to inject build IDs for DSO touched by samples as the build ID cache is populated by perf record. They can avoid some cases of symbol mis-resolution caused by the file system changing from when a sample occurred and when the DSO is sought. Unlike the --buildid-mmap option, this chnage doesn't disable the build ID cache but it does disable the processing of samples looking for DSOs to inject build IDs for. To disable the build ID cache the -B (--no-buildid) option should be used. Making this option the default was raised on the list in: https://lore.kernel.org/linux-perf-users/CAP-5=fXP7jN_QrGUcd55_QH5J-Y-FCaJ6=NaHVtyx0oyNh8_-Q@mail.gmail.com/ Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf jitdump: Directly mark the jitdump DSOIan Rogers1-4/+17
The DSO being generated was being accessed through a thread's maps, this is unnecessary as the dso can just be directly found. This avoids problems with passing a NULL evsel which may be inspected to determine properties of a callchain when using the buildid DSO marking code. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf dso: Move build_id to dso_idIan Rogers14-155/+197
The dso_id previously contained the major, minor, inode and inode generation information from a mmap2 event - the inode generation would be zero when reading from /proc/pid/maps. The build_id was in the dso. With build ID mmap2 events these fields wouldn't be initialized which would largely mean the special empty case where any dso would match for equality. This isn't desirable as if a dso is replaced we want the comparison to yield a difference. To support detecting the difference between DSOs based on build_id, move the build_id out of the DSO and into the dso_id. The dso_id is also stored in the DSO so nothing is lost. Capture in the dso_id what parts have been initialized and rename dso_id__inject to dso_id__improve_id so that it is clear the dso_id is being improved upon with additional information. With the build_id in the dso_id, use memcmp to compare for equality. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf build-id: Ensure struct build_id is empty before useIan Rogers11-17/+20
If a build ID is read then not all code paths may ensure it is empty before use. Initialize the build_id to be zero-ed unless there is clear initialization such as a call to build_id__init. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf build-id: Mark DSO in sample callchainsIan Rogers1-1/+16
Previously only the sample IP's map DSO would be marked hit for the purposes of populating the build ID cache. Walk the call chain to mark all IPs and DSOs. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf build-id: Change sprintf functions to snprintfIan Rogers16-50/+42
Pass in a size argument rather than implying all build id strings must be SBUILD_ID_SIZE. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-4-irogers@google.com [ fixed some build errors ] Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-24perf build-id: Truncate to avoid overflowing the build_id dataIan Rogers1-1/+4
Warning when the build_id data would be overflowed would lead to memory corruption, switch to truncation. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-3-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>