summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)AuthorFilesLines
2025-07-26selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failureYonghong Song1-2/+3
For arm64 64K page size, the bpf_dynptr_copy() in test dynptr/test_dynptr_copy_xdp will succeed, but the test will failure with 4K page size. This patch made a change so the test will fail expectedly for both 4K and 64K page sizes. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://patch.msgid.link/20250725043435.208974-1-yonghong.song@linux.dev
2025-07-26selftests/bpf: Increase xdp data size for arm64 64K page sizeYonghong Song1-2/+8
With arm64 64K page size, the following 4 subtests failed: #97/25 dynptr/test_probe_read_user_dynptr:FAIL #97/26 dynptr/test_probe_read_kernel_dynptr:FAIL #97/27 dynptr/test_probe_read_user_str_dynptr:FAIL #97/28 dynptr/test_probe_read_kernel_str_dynptr:FAIL These failures are due to function bpf_dynptr_check_off_len() in include/linux/bpf.h where there is a test if (len > size || offset > size - len) return -E2BIG; With 64K page size, the 'offset' is greater than 'size - len', which caused the test failure. For 64KB page size, this patch increased the xdp buffer size from 5000 to 90000. The above 4 test failures are fixed as 'size' value is increased. But it introduced two new failures: #97/4 dynptr/test_dynptr_copy_xdp:FAIL #97/12 dynptr/test_dynptr_memset_xdp_chunks:FAIL These two failures will be addressed in subsequent patches. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://patch.msgid.link/20250725043430.208469-1-yonghong.song@linux.dev
2025-07-26selftests: drv-net: Wait for bkg socat to startMohsin Bashir1-1/+3
Currently, UDP exchange is prone to failure when cmd attempt to send data while socat in bkg is not ready. Since, the behavior is probabilistic, this can result in flakiness for XDP tests. While testing test_xdp_native_tx_mb() on netdevsim, a failure rate of around 1% in 500 500 iterations was observed. Use wait_port_listen() to ensure that the bkg socat is started and ready to receive before cmd start sending. With proposed changes, a re-run of the same test passed 100% of time. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20250724235140.2645885-1-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-26Merge tag 'nf-next-25-07-25' of ↵Jakub Kicinski8-4/+31
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following series contains Netfilter/IPVS updates for net-next: 1) Display netns inode in conntrack table full log, from lvxiafei. 2) Autoload nf_log_syslog in case no logging backend is available, from Lance Yang. 3) Three patches to remove unused functions in x_tables, nf_tables and conntrack. From Yue Haibing. 4) Exclude LEGACY TABLES on PREEMPT_RT: Add NETFILTER_XTABLES_LEGACY to exclude xtables legacy infrastructure. 5) Restore selftests by toggling NETFILTER_XTABLES_LEGACY where needed. From Florian Westphal. 6) Use CONFIG_INET_SCTP_DIAG in tools/testing/selftests/net/netfilter/config, from Sebastian Andrzej Siewior. 7) Use timer_delete in comment in IPVS codebase, from WangYuli. 8) Dump flowtable information in nfnetlink_hook, this includes an initial patch to consolidate common code in helper function, from Phil Sutter. 9) Remove unused arguments in nft_pipapo set backend, from Florian Westphal. 10) Return nft_set_ext instead of boolean in set lookup function, from Florian Westphal. 11) Remove indirection in dynamic set infrastructure, also from Florian. 12) Consolidate pipapo_get/lookup, from Florian. 13) Use kvmalloc in nft_pipapop, from Florian Westphal. 14) syzbot reports slab-out-of-bounds in xt_nfacct log message, fix from Florian Westphal. 15) Ignored tainted kernels in selftest nft_interface_stress.sh, from Phil Sutter. 16) Fix IPVS selftest by disabling rp_filter with ipip tunnel device, from Yi Chen. * tag 'nf-next-25-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: selftests: netfilter: ipvs.sh: Explicity disable rp_filter on interface tunl0 selftests: netfilter: Ignore tainted kernels in interface stress test netfilter: xt_nfacct: don't assume acct name is null-terminated netfilter: nft_set_pipapo: prefer kvmalloc for scratch maps netfilter: nft_set_pipapo: merge pipapo_get/lookup netfilter: nft_set: remove indirection from update API call netfilter: nft_set: remove one argument from lookup and update functions netfilter: nft_set_pipapo: remove unused arguments netfilter: nfnetlink_hook: Dump flowtable info netfilter: nfnetlink: New NFNLA_HOOK_INFO_DESC helper ipvs: Rename del_timer in comment in ip_vs_conn_expire_now() selftests: netfilter: Enable CONFIG_INET_SCTP_DIAG selftests: net: Enable legacy netfilter legacy options. netfilter: Exclude LEGACY TABLES on PREEMPT_RT. netfilter: conntrack: Remove unused net in nf_conntrack_double_lock() netfilter: nf_tables: Remove unused nft_reduce_is_readonly() netfilter: x_tables: Remove unused functions xt_{in|out}name() netfilter: load nf_log_syslog on enabling nf_conntrack_log_invalid netfilter: conntrack: table full detailed log ==================== Link: https://patch.msgid.link/20250725170340.21327-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25rtla/tests: Limit duration to maximum of 10sTomas Glozar3-10/+10
Many of the original rtla tests included durations of 1 minute and 30 seconds. Experience has shown this is unnecessary, since 10 seconds as waiting time for samples to appear. Change duration of all rtla tests to at most 10 seconds. This speeds up testing significantly. Before: $ make check All tests successful. Files=3, Tests=54, 536 wallclock secs ( 0.03 usr 0.00 sys + 20.31 cusr 22.02 csys = 42.36 CPU) Result: PASS After: $ make check ... All tests successful. Files=3, Tests=54, 196 wallclock secs ( 0.03 usr 0.01 sys + 20.28 cusr 20.68 csys = 41.00 CPU) Result: PASS Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-9-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/tests: Add tests for actionsTomas Glozar1-0/+28
Add a bunch of tests covering most of both --on-threshold and --on-end. Parts sensitive to implementation of hist/top are tested for both. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-8-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/tests: Check rtla output with grepTomas Glozar1-4/+17
Add argument to the check command in the test suite that takes a regular expression that the output of rtla command is checked against. This allows testing for specific information in rtla output in addition to checking the return value. Two minor improvements are included: running rtla with "eval" so that arguments with spaces can be passed to it via shell quotations, and the stdout of pushd and popd is suppressed to clean up the test output. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-7-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat: Add action on end featureTomas Glozar3-29/+65
Implement actions on end next to actions on threshold. A new option, --on-end is added, parallel to --on-threshold. Instead of being executed whenever a latency threshold is reached, it is executed at the end of the measurement. For example: $ rtla timerlat hist -d 5s --on-end trace will save the trace output at the end. All actions supported by --on-threshold are also supported by --on-end, except for continue, which does nothing with --on-end. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-6-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat: Add continue actionTomas Glozar4-29/+100
Introduce option to resume tracing after a latency threshold overflow. The option is implemented as an action named "continue". Example: $ rtla timerlat top -q -T 200 -d 1s --on-threshold \ exec,command="echo Threshold" --on-threshold continue Threshold Threshold Threshold Timer Latency ... The feature is supported for both hist and top. After the continue action is executed, processing of the list of actions is stopped and tracing is resumed. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-5-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat_bpf: Allow resuming tracingTomas Glozar3-4/+25
Currently, rtla-timerlat BPF program uses a global variable stored in a .bss section to store whether tracing has been stopped. Move the information to a separate map, so that it is easily writable from userspace, and add a function that clears the value, resuming tracing after it has been stopped. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-4-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat: Add action on threshold featureTomas Glozar6-22/+341
Extend the functionality provided by the -t/--trace option, which triggers saving the contents of a tracefs buffer after tracing is stopped, to support implementing arbitrary actions. A new option, --on-threshold, is added, taking an argument that further specifies the action. Actions added in this patch are: - trace[,file=<filename>]: Saves tracefs buffer, optionally taking a filename. - signal,num=<sig>,pid=<pid>: Sends signal to process. "parent" might be specified instead of number to send signal to parent process. - shell,command=<command>: Execute shell command. Multiple actions may be specified and will be executed in order, including multiple actions of the same type. Trace output requested via -t and -a now adds a trace action to the end of the list. If an action fails, the following actions are not executed. For example, this command: $ rtla timerlat -T 20 --on-threshold trace \ --on-threshold shell,command="grep ipi_send timerlat_trace.txt" \ --on-threshold signal,num=2,pid=parent will send signal 2 (SIGINT) to parent process, but only if saved trace contains the text "ipi_send". This way, the feature can be used for flexible reactions on latency spikes, and allows combining rtla with other tooling like perf. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25rtla/timerlat: Introduce enum timerlat_tracing_modeTomas Glozar4-53/+97
After the introduction of BPF-based sample collection, rtla-timerlat effectively runs in one of three modes: - Pure BPF mode, with tracefs only being used to set up the timerlat tracer. Sample processing and stop on threshold are handled by BPF. - tracefs mode. BPF is unsupported or kernel is lacking the necessary trace event (osnoise:timerlat_sample). Stop on theshold is handled by timerlat tracer stopping tracing in all instances. - BPF/tracefs mixed mode - BPF is used for sample collection for top or histogram, tracefs is used for trace output and/or auto-analysis. Stop on threshold is handled both through BPF program, which stops sample collection for top/histogram and wakes up rtla, and by timerlat tracer, which stops tracing for trace output/auto-analysis instances. Add enum timerlat_tracing_mode, with three values: - TRACING_MODE_BPF - TRACING_MODE_TRACEFS - TRACING_MODE_MIXED Those represent the modes described above. A field of this type is added to struct timerlat_params, named "mode", replacing the no_bpf variable. params->mode is set in timerlat_{top,hist}_parse_args to TRACING_MODE_BPF or TRACING_MODE_MIXED based on whether trace output and/or auto-analysis is requested. timerlat_{top,hist}_main then checks if BPF is not unavailable or disabled, in that case, it sets params->mode to TRACING_MODE_TRACEFS. A condition is added to timerlat_apply_config that skips setting timerlat tracer thresholds if params->mode is TRACING_MODE_BPF (those are unnecessary, since they only turn off tracing, which is already turned off in that case, since BPF is used to collect samples). Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25ipv6: add `force_forwarding` sysctl to enable per-interface forwardingGabriel Goller2-0/+106
It is currently impossible to enable ipv6 forwarding on a per-interface basis like in ipv4. To enable forwarding on an ipv6 interface we need to enable it on all interfaces and disable it on the other interfaces using a netfilter rule. This is especially cumbersome if you have lots of interfaces and only want to enable forwarding on a few. According to the sysctl docs [0] the `net.ipv6.conf.all.forwarding` enables forwarding for all interfaces, while the interface-specific `net.ipv6.conf.<interface>.forwarding` configures the interface Host/Router configuration. Introduce a new sysctl flag `force_forwarding`, which can be set on every interface. The ip6_forwarding function will then check if the global forwarding flag OR the force_forwarding flag is active and forward the packet. To preserve backwards-compatibility reset the flag (on all interfaces) to 0 if the net.ipv6.conf.all.forwarding flag is set to 0. Add a short selftest that checks if a packet gets forwarded with and without `force_forwarding`. [0]: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Gabriel Goller <g.goller@proxmox.com> Link: https://patch.msgid.link/20250722081847.132632-1-g.goller@proxmox.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25selftests: net: Skip test if IPv6 is not configuredBreno Leitao1-0/+5
Extend the `check_for_dependencies()` function in `lib_netcons.sh` to check whether IPv6 is enabled by verifying the existence of `/proc/net/if_inet6`. Having IPv6 is a now a dependency of netconsole tests. If the file does not exist, the script will skip the test with an appropriate message suggesting to verify if `CONFIG_IPV6` is enabled. This prevents the test to misbehave if IPv6 is not configured. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250723-netcons_test_ipv6-v1-1-41c9092f93f9@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25selftests: rtnetlink: add macsec and vlan nesting testStanislav Fomichev1-0/+36
Add reproducer for [0] with a dummy device. 0: https://lore.kernel.org/netdev/2aff4342b0f5b1539c02ffd8df4c7e58dd9746e7.camel@nvidia.com/ Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250723224715.1341121-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-25perf sort: Use perf_env to set arch sort keys and headerIan Rogers15-131/+107
Previously arch_support_sort_key and arch_perf_header_entry used a weak symbol to compile as appropriate for x86 and powerpc. A limitation to this is that the handling of a data file could vary in cross-platform development. Change to using the perf_env of the current session to determine the architecture kind and set the sort key and header entries as appropriate. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-23-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf test: Move PERF_SAMPLE_WEIGHT_STRUCT parsing to common testIan Rogers5-129/+14
test__x86_sample_parsing is identical to test__sample_parsing except it explicitly tested PERF_SAMPLE_WEIGHT_STRUCT. Now the parsing code is common move the PERF_SAMPLE_WEIGHT_STRUCT to the common sample parsing test and remove the x86 version. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-22-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf sample: Remove arch notion of sample parsingIan Rogers14-80/+36
By definition arch sample parsing and synthesis will inhibit certain kinds of cross-platform record then analysis (report, script, etc.). Remove arch_perf_parse_sample_weight and arch_perf_synthesize_sample_weight replacing with a common implementation. Combine perf_sample p_stage_cyc and retire_lat as weight3 to capture the differing uses regardless of compiled for architecture. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf env: Remove global perf_envIan Rogers6-10/+4
The global perf_env was used for the host, but if a perf_env wasn't easy to come by it was used in a lot of places where potentially recorded and host data could be confused. Remove the global variable as now the majority of accesses retrieve the perf_env for the host from the session. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf trace: Avoid global perf_env with evsel__envIan Rogers1-9/+3
There is no session in perf trace unless in replay mode, so in host mode no session can be associated with the evlist. If the evsel__env call fails resort to the host_env that's part of the trace. Remove errno_to_name as it becomes a called once 1-line function once the argument is turned into a perf_env, just call perf_env__arch_strerrno directly. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-19-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf auxtrace: Pass perf_env from session through to mmap readIan Rogers3-10/+17
auxtrace_mmap__read and auxtrace_mmap__read_snapshot end up calling `evsel__env(NULL)` which returns the global perf_env variable for the host. Their only call is in perf record. Rather than use the global variable pass through the perf_env for `perf record`. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-18-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf machine: Explicitly pass in host perf_envIan Rogers12-35/+81
When creating a machine for the host explicitly pass in a scoped perf_env. This removes a use of the global perf_env. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-17-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf bench synthesize: Avoid use of global perf_envIan Rogers1-8/+19
The benchmark doesn't use a data file and so the header perf_env isn't used. Stack allocate a host perf_env for use to avoid the use of the global perf_env. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-16-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf top: Make perf_env locally scopedIan Rogers1-13/+28
The use of the global host perf_env variable is potentially inconsistent within the code. Switch perf top to using a locally scoped variable that is generally accessed through the session. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-15-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf session: Add host_env argument to perf_session__newIan Rogers3-5/+8
When creating a perf_session the host perf_env may or may not want to be used. For example, `perf top` uses a host perf_env while `perf inject` does not. Add a host_env argument to perf_session__new so that sessions requiring a host perf_env can pass it in. Currently if none is specified the global perf_env variable is used, but this will change in later patches. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-14-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf test: Avoid use perf_envIan Rogers3-23/+33
The perf_env global variable holds the host perf_env data but its use is hit and miss. Switch to using local perf_env variables and ensure scoped perf_env__init and perf_env__exit. This loses command line setting of the perf_env, but this doesn't matter for tests. So the perf_env is fully initialized, clear it with memset in perf_env__init. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-13-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf header: Clean up use of perf_envIan Rogers1-76/+98
Always use the perf_env from the feat_fd's perf_header. Cache the value on entry to a function in `env` and use `env->` consistently in the code. Ensure the header is initialized for use in perf_session__do_write_header. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-12-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf evlist: Change env variable to sessionIan Rogers17-22/+35
The session holds a perf_env pointer env. In UI code container_of is used to turn the env to a session, but this assumes the session header's env is in use. Rather than a dubious container_of, hold the session in the evlist and derive the env from the session with evsel__env, perf_session__env, etc. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-11-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf session: Add accessor for session->header.envIan Rogers25-107/+120
The perf_env from the header in the session is frequently accessed, add an accessor function rather than access directly. Cache the value to avoid repeated calls. No behavioral change. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-10-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf record: Make --buildid-mmap the defaultIan Rogers4-22/+34
Support for build IDs in mmap2 perf events has been present since Linux v5.12: https://lore.kernel.org/lkml/20210219194619.1780437-1-acme@kernel.org/ Build ID mmap events don't avoid the need to inject build IDs for DSO touched by samples as the build ID cache is populated by perf record. They can avoid some cases of symbol mis-resolution caused by the file system changing from when a sample occurred and when the DSO is sought. Unlike the --buildid-mmap option, this chnage doesn't disable the build ID cache but it does disable the processing of samples looking for DSOs to inject build IDs for. To disable the build ID cache the -B (--no-buildid) option should be used. Making this option the default was raised on the list in: https://lore.kernel.org/linux-perf-users/CAP-5=fXP7jN_QrGUcd55_QH5J-Y-FCaJ6=NaHVtyx0oyNh8_-Q@mail.gmail.com/ Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-9-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf jitdump: Directly mark the jitdump DSOIan Rogers1-4/+17
The DSO being generated was being accessed through a thread's maps, this is unnecessary as the dso can just be directly found. This avoids problems with passing a NULL evsel which may be inspected to determine properties of a callchain when using the buildid DSO marking code. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-8-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf dso: Move build_id to dso_idIan Rogers14-155/+197
The dso_id previously contained the major, minor, inode and inode generation information from a mmap2 event - the inode generation would be zero when reading from /proc/pid/maps. The build_id was in the dso. With build ID mmap2 events these fields wouldn't be initialized which would largely mean the special empty case where any dso would match for equality. This isn't desirable as if a dso is replaced we want the comparison to yield a difference. To support detecting the difference between DSOs based on build_id, move the build_id out of the DSO and into the dso_id. The dso_id is also stored in the DSO so nothing is lost. Capture in the dso_id what parts have been initialized and rename dso_id__inject to dso_id__improve_id so that it is clear the dso_id is being improved upon with additional information. With the build_id in the dso_id, use memcmp to compare for equality. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-7-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf build-id: Ensure struct build_id is empty before useIan Rogers11-17/+20
If a build ID is read then not all code paths may ensure it is empty before use. Initialize the build_id to be zero-ed unless there is clear initialization such as a call to build_id__init. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-6-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf build-id: Mark DSO in sample callchainsIan Rogers1-1/+16
Previously only the sample IP's map DSO would be marked hit for the purposes of populating the build ID cache. Walk the call chain to mark all IPs and DSOs. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-5-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25perf build-id: Change sprintf functions to snprintfIan Rogers16-50/+42
Pass in a size argument rather than implying all build id strings must be SBUILD_ID_SIZE. Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20250724163302.596743-4-irogers@google.com [ fixed some build errors ] Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-07-25selftests: netfilter: ipvs.sh: Explicity disable rp_filter on interface tunl0Yi Chen1-2/+2
Although setup_ns() set net.ipv4.conf.default.rp_filter=0, loading certain module such as ipip will automatically create a tunl0 interface in all netns including new created ones. In the script, this is before than default.rp_filter=0 applied, as a result tunl0.rp_filter remains set to 1 which causes the test report FAIL when ipip module is preloaded. Before fix: Testing DR mode... Testing NAT mode... Testing Tunnel mode... ipvs.sh: FAIL After fix: Testing DR mode... Testing NAT mode... Testing Tunnel mode... ipvs.sh: PASS Fixes: 7c8b89ec506e ("selftests: netfilter: remove rp_filter configuration") Signed-off-by: Yi Chen <yiche@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25selftests: netfilter: Ignore tainted kernels in interface stress testPhil Sutter1-1/+4
Complain about kernel taint value only if it wasn't set at start already. Fixes: 73db1b5dab6f ("selftests: netfilter: Torture nftables netdev hooks") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25selftests: netfilter: Enable CONFIG_INET_SCTP_DIAGSebastian Andrzej Siewior1-1/+1
The config snippet specifies CONFIG_SCTP_DIAG. This was never an option. Replace CONFIG_SCTP_DIAG with the intended CONFIG_INET_SCTP_DIAG. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25selftests: net: Enable legacy netfilter legacy options.Florian Westphal6-0/+24
Some specified options rely on NETFILTER_XTABLES_LEGACY to be enabled. IP_NF_TARGET_TTL for instance depends on IP_NF_MANGLE which in turn depends on IP_NF_IPTABLES_LEGACY -> NETFILTER_XTABLES_LEGACY. Enable relevant iptables config options explicitly, this is needed to avoid breakage when symbols related to iptables-legacy will depend on NETFILTER_LEGACY resp. IP_TABLES_LEGACY. This also means that the classic tables (Kernel modules) will not be enabled by default, so enable them too. Signed-off-by: Florian Westphal <fw@strlen.de> [bigeasy: Split out the config bits from the main patch] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25Merge tag 'mm-hotfixes-stable-2025-07-24-18-03' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "11 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 7 are for MM" * tag 'mm-hotfixes-stable-2025-07-24-18-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: sprintf.h requires stdarg.h resource: fix false warning in __request_region() mm/damon/core: commit damos_quota_goal->nid kasan: use vmalloc_dump_obj() for vmalloc error reports mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show() mm: update MAINTAINERS entry for HMM nilfs2: reject invalid file types when reading inodes selftests/mm: fix split_huge_page_test for folio_split() tests mailmap: add entry for Senozhatsky mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list
2025-07-25tools/testing/selftests: explicitly test split multi VMA mremap moveLorenzo Stoakes1-1/+126
Check that moving a range of VMAs where we are offset into the first and last VMAs works correctly. This results in the VMAs being split at these points at which we are offset into VMAs. We explicitly test both the ordinary MREMAP_FIXED multi VMA move case and the MREMAP_DONTUNMAP multi VMA move case. Link: https://lkml.kernel.org/r/b04920bb6c09dc86c207c251eab8ec670fbbcaef.1753119043.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25tools/testing/selftests: test MREMAP_DONTUNMAP on multiple VMA moveLorenzo Stoakes1-10/+19
We support MREMAP_MAYMOVE | MREMAP_FIXED | MREMAP_DONTUNMAP for moving multiple VMAs via mremap(), so assert that the tests pass with both MREMAP_DONTUNMAP set and not set. Additionally, add success = false settings when mremap() fails. This is something that cannot realistically happen, so in no way impacted test outcome, but it is incorrect to indicate a test pass when something has failed. Link: https://lkml.kernel.org/r/d7359941981e4e44c774753b3e364d1c54928e6a.1753119043.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25tools/testing/selftests: add mremap() shrink test for multiple VMAsLorenzo Stoakes1-1/+82
Patch series "tools/testing: expand mremap testing". Expand our mremap() testing to further assert that behaviour is as expected. There is a poorly documented mremap() feature whereby it is possible to mremap() multiple VMAs (even with gaps) when shrinking, as long as the resultant shrunk range spans only a single VMA. So we start by asserting this behaviour functions correctly both with an in-place shrink and a shrink/move. Next, we further test the newly introduced ability to mremap() multiple VMAs when performing a MAP_FIXED move (that is without the size being changed), firstly by asserting that MREMAP_DONTUNMAP has no bearing on this behaviour. Finally, we explicitly test that such moves, when splitting source VMAs, function correctly. This patch (of 3): There is an apparently little-known feature of mremap() whereby, in stark contrast to other modes (other than the recently introduced capacity to move multiple VMAs), the input source range span multiple VMAs with gaps between. This is, when shrinking a VMA, whether moving it or not, and the shrink would reduce the range to a single VMA - this is permitted, as the shrink is actioned by an unmap. This patch adds tests to assert that this behaves as expected. Link: https://lkml.kernel.org/r/cover.1753119043.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/f08122893a26092a2bec6e69443e87f468ffdbed.1753119043.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25selftests/mm: guard-regions: Use SKIP() instead of ksft_exit_skip()wang lian1-1/+1
To ensure only the current test is skipped on permission failure, instead of terminating the entire test binary. Link: https://lkml.kernel.org/r/20250717131857.59909-3-lianux.mm@gmail.com Signed-off-by: wang lian <lianux.mm@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kairui Song <ryncsn@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));"wang lian7-39/+31
Patch series "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup", v2. This series introduces a common FORCE_READ() macro to replace the cryptic asm volatile("" : "+r" (variable)); construct used in several mm selftests. This improves code readability and maintainability by removing duplicated, hard-to-understand code. This patch (of 2): Several mm selftests use the `asm volatile("" : "+r" (variable));` construct to force a read of a variable, preventing the compiler from optimizing away the memory access. This idiom is cryptic and duplicated across multiple test files. Following a suggestion from David[1], this patch refactors this common pattern into a FORCE_READ() macro Link: https://lkml.kernel.org/r/20250717131857.59909-1-lianux.mm@gmail.com Link: https://lkml.kernel.org/r/20250717131857.59909-2-lianux.mm@gmail.com Link: https://lore.kernel.org/lkml/4a3e0759-caa1-4cfa-bc3f-402593f1eee3@redhat.com/ [1] Signed-off-by: wang lian <lianux.mm@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Kairui Song <ryncsn@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Brown <broonie@kernel.org> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25selftests/proc: add verbose mode for /proc/pid/maps tearing testsSuren Baghdasaryan1-12/+141
Add verbose mode to the /proc/pid/maps tearing tests to print debugging information. VERBOSE environment variable is used to enable it. Usage example: VERBOSE=1 ./proc-maps-race Link: https://lkml.kernel.org/r/20250719182854.3166724-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: T.J. Mercier <tjmercier@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25selftests/proc: extend /proc/pid/maps tearing test to include vma remappingSuren Baghdasaryan1-0/+86
Test that /proc/pid/maps does not report unexpected holes in the address space when we concurrently remap a part of a vma into the middle of another vma. This remapping results in the destination vma being split into three parts and the part in the middle being patched back from, all done concurrently from under the reader. We should always see either original vma or the split one with no holes. Link: https://lkml.kernel.org/r/20250719182854.3166724-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: T.J. Mercier <tjmercier@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25selftests/proc: extend /proc/pid/maps tearing test to include vma resizingSuren Baghdasaryan1-0/+79
Test that /proc/pid/maps does not report unexpected holes in the address space when a vma at the edge of the page is being concurrently remapped. This remapping results in the vma shrinking and expanding from under the reader. We should always see either shrunk or expanded (original) version of the vma. Link: https://lkml.kernel.org/r/20250719182854.3166724-3-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: T.J. Mercier <tjmercier@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25selftests/proc: add /proc/pid/maps tearing from vma split testSuren Baghdasaryan3-0/+449
Patch series "use per-vma locks for /proc/pid/maps reads", v8. Reading /proc/pid/maps requires read-locking mmap_lock which prevents any other task from concurrently modifying the address space. This guarantees coherent reporting of virtual address ranges, however it can block important updates from happening. Oftentimes /proc/pid/maps readers are low priority monitoring tasks and them blocking high priority tasks results in priority inversion. Locking the entire address space is required to present fully coherent picture of the address space, however even current implementation does not strictly guarantee that by outputting vmas in page-size chunks and dropping mmap_lock in between each chunk. Address space modifications are possible while mmap_lock is dropped and userspace reading the content is expected to deal with possible concurrent address space modifications. Considering these relaxed rules, holding mmap_lock is not strictly needed as long as we can guarantee that a concurrently modified vma is reported either in its original form or after it was modified. This patchset switches from holding mmap_lock while reading /proc/pid/maps to taking per-vma locks as we walk the vma tree. This reduces the contention with tasks modifying the address space because they would have to contend for the same vma as opposed to the entire address space. Previous version of this patchset [1] tried to perform /proc/pid/maps reading under RCU, however its implementation is quite complex and the results are worse than the new version because it still relied on mmap_lock speculation which retries if any part of the address space gets modified. New implementaion is both simpler and results in less contention. Note that similar approach would not work for /proc/pid/smaps reading as it also walks the page table and that's not RCU-safe. Paul McKenney's designed a test [2] to measure mmap/munmap latencies while concurrently reading /proc/pid/maps. The test has a pair of processes scanning /proc/PID/maps, and another process unmapping and remapping 4K pages from a 128MB range of anonymous memory. At the end of each 10 second run, the latency of each mmap() or munmap() operation is measured, and for each run the maximum and mean latency is printed. The map/unmap process is started first, its PID is passed to the scanners, and then the map/unmap process waits until both scanners are running before starting its timed test. The scanners keep scanning until the specified /proc/PID/maps file disappears. The latest results from Paul: Stock mm-unstable, all of the runs had maximum latencies in excess of 0.5 milliseconds, and with 80% of the runs' latencies exceeding a full millisecond, and ranging up beyond 4 full milliseconds. In contrast, 99% of the runs with this patch series applied had maximum latencies of less than 0.5 milliseconds, with the single outlier at only 0.608 milliseconds. From a median-performance (as opposed to maximum-latency) viewpoint, this patch series also looks good, with stock mm weighing in at 11 microseconds and patch series at 6 microseconds, better than a 2x improvement. Before the change: ./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2 0.011 0.008 0.521 0.011 0.008 0.552 0.011 0.008 0.590 0.011 0.008 0.660 ... 0.011 0.015 2.987 0.011 0.015 3.038 0.011 0.016 3.431 0.011 0.016 4.707 After the change: ./run-proc-vs-map.sh --nsamples 100 --rawdata -- --busyduration 2 0.006 0.005 0.026 0.006 0.005 0.029 0.006 0.005 0.034 0.006 0.005 0.035 ... 0.006 0.006 0.421 0.006 0.006 0.423 0.006 0.006 0.439 0.006 0.006 0.608 The patchset also adds a number of tests to check for /proc/pid/maps data coherency. They are designed to detect any unexpected data tearing while performing some common address space modifications (vma split, resize and remap). Even before these changes, reading /proc/pid/maps might have inconsistent data because the file is read page-by-page with mmap_lock being dropped between the pages. An example of user-visible inconsistency can be that the same vma is printed twice: once before it was modified and then after the modifications. For example if vma was extended, it might be found and reported twice. What is not expected is to see a gap where there should have been a vma both before and after modification. This patchset increases the chances of such tearing, therefore it's even more important now to test for unexpected inconsistencies. In [3] Lorenzo identified the following possible vma merging/splitting scenarios: Merges with changes to existing vmas: 1 Merge both - mapping a vma over another one and between two vmas which can be merged after this replacement; 2. Merge left full - mapping a vma at the end of an existing one and completely over its right neighbor; 3. Merge left partial - mapping a vma at the end of an existing one and partially over its right neighbor; 4. Merge right full - mapping a vma before the start of an existing one and completely over its left neighbor; 5. Merge right partial - mapping a vma before the start of an existing one and partially over its left neighbor; Merges without changes to existing vmas: 6. Merge both - mapping a vma into a gap between two vmas which can be merged after the insertion; 7. Merge left - mapping a vma at the end of an existing one; 8. Merge right - mapping a vma before the start end of an existing one; Splits 9. Split with new vma at the lower address; 10. Split with new vma at the higher address; If such merges or splits happen concurrently with the /proc/maps reading we might report a vma twice, once before the modification and once after it is modified: Case 1 might report overwritten and previous vma along with the final merged vma; Case 2 might report previous and the final merged vma; Case 3 might cause us to retry once we detect the temporary gap caused by shrinking of the right neighbor; Case 4 might report overritten and the final merged vma; Case 5 might cause us to retry once we detect the temporary gap caused by shrinking of the left neighbor; Case 6 might report previous vma and the gap along with the final marged vma; Case 7 might report previous and the final merged vma; Case 8 might report the original gap and the final merged vma covering the gap; Case 9 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma start; Case 10 might cause us to retry once we detect the temporary gap caused by shrinking of the original vma at the vma end; In all these cases the retry mechanism prevents us from reporting possible temporary gaps. [1] https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/ [2] https://github.com/paulmckrcu/proc-mmap_sem-test [3] https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.local/ The /proc/pid/maps file is generated page by page, with the mmap_lock released between pages. This can lead to inconsistent reads if the underlying vmas are concurrently modified. For instance, if a vma split or merge occurs at a page boundary while /proc/pid/maps is being read, the same vma might be seen twice: once before and once after the change. This duplication is considered acceptable for userspace handling. However, observing a "hole" where a vma should be (e.g., due to a vma being replaced and the space temporarily being empty) is unacceptable. Implement a test that: 1. Forks a child process which continuously modifies its address space, specifically targeting a vma at the boundary between two pages. 2. The parent process repeatedly reads the child's /proc/pid/maps. 3. The parent process checks the last vma of the first page and the first vma of the second page for consistency, looking for the effects of vma splits or merges. The test duration is configurable via DURATION environment variable expressed in seconds. The default test duration is 5 seconds. Example Command: DURATION=10 ./proc-maps-race Link: https://lore.kernel.org/all/20250418174959.1431962-1-surenb@google.com/ [1] Link: https://github.com/paulmckrcu/proc-mmap_sem-test [2] Link: https://lore.kernel.org/all/e1863f40-39ab-4e5b-984a-c48765ffde1c@lucifer.local/ [3] Link: https://lkml.kernel.org/r/20250719182854.3166724-1-surenb@google.com Link: https://lkml.kernel.org/r/20250719182854.3166724-2-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeongjun Park <aha310510@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: T.J. Mercier <tjmercier@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Ye Bin <yebin10@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-25tools/testing/selftests: extend mremap_test to test multi-VMA mremapLorenzo Stoakes1-1/+145
Now that we have added the ability to move multiple VMAs at once, assert that this functions correctly, both overwriting VMAs and moving backwards and forwards with merge and VMA invalidation. Additionally assert that page tables are correctly propagated by setting random data and reading it back. Link: https://lkml.kernel.org/r/139074a24a011ca4ed52498a7fa2080024b43917.1752770784.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>