summaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)AuthorFilesLines
2022-03-24Merge tag 'net-next-5.18' of ↵Linus Torvalds2-55/+47
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...
2022-03-22Merge tag 'perf-core-2022-03-21' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf event updates from Ingo Molnar: - Fix address filtering for Intel/PT,ARM/CoreSight - Enable Intel/PEBS format 5 - Allow more fixed-function counters for x86 - Intel/PT: Enable not recording Taken-Not-Taken packets - Add a few branch-types * tag 'perf-core-2022-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Fix the build on !CONFIG_PHYS_ADDR_T_64BIT perf: Add irq and exception return branch types perf/x86/intel/uncore: Make uncore_discovery clean for 64 bit addresses perf/x86/intel/pt: Add a capability and config bit for disabling TNTs perf/x86/intel/pt: Add a capability and config bit for event tracing perf/x86/intel: Increase max number of the fixed counters KVM: x86: use the KVM side max supported fixed counter perf/x86/intel: Enable PEBS format 5 perf/core: Allow kernel address filter when not filtering the kernel perf/x86/intel/pt: Fix address filter config for 32-bit kernel perf/core: Fix address filter parser for multiple filters x86: Share definition of __is_canonical_address() perf/x86/intel/pt: Relax address filter validation
2022-03-21Merge tag 'arm64-upstream' of ↵Linus Torvalds1-19/+33
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - Support for including MTE tags in ELF coredumps - Instruction encoder updates, including fixes to 64-bit immediate generation and support for the LSE atomic instructions - Improvements to kselftests for MTE and fpsimd - Symbol aliasing and linker script cleanups - Reduce instruction cache maintenance performed for user mappings created using contiguous PTEs - Support for the new "asymmetric" MTE mode, where stores are checked asynchronously but loads are checked synchronously - Support for the latest pointer authentication algorithm ("QARMA3") - Support for the DDR PMU present in the Marvell CN10K platform - Support for the CPU PMU present in the Apple M1 platform - Use the RNDR instruction for arch_get_random_{int,long}() - Update our copy of the Arm optimised string routines for str{n}cmp() - Fix signal frame generation for CPUs which have foolishly elected to avoid building in support for the fpsimd instructions - Workaround for Marvell GICv3 erratum #38545 - Clarification to our Documentation (booting reqs. and MTE prctl()) - Miscellanous cleanups and minor fixes * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits) docs: sysfs-devices-system-cpu: document "asymm" value for mte_tcf_preferred arm64/mte: Remove asymmetric mode from the prctl() interface arm64: Add cavium_erratum_23154_cpus missing sentinel perf/marvell: Fix !CONFIG_OF build for CN10K DDR PMU driver arm64: mm: Drop 'const' from conditional arm64_dma_phys_limit definition Documentation: vmcoreinfo: Fix htmldocs warning kasan: fix a missing header include of static_keys.h drivers/perf: Add Apple icestorm/firestorm CPU PMU driver drivers/perf: arm_pmu: Handle 47 bit counters arm64: perf: Consistently make all event numbers as 16-bits arm64: perf: Expose some Armv9 common events under sysfs perf/marvell: cn10k DDR perf event core ownership perf/marvell: cn10k DDR perfmon event overflow handling perf/marvell: CN10k DDR performance monitor support dt-bindings: perf: marvell: cn10k ddr performance monitor arm64: clean up tools Makefile perf/arm-cmn: Update watchpoint format perf/arm-cmn: Hide XP PUB events for CMN-600 arm64: drop unused includes of <linux/personality.h> arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones ...
2022-03-19perf evlist: Avoid iteration for empty evlist.Ian Rogers1-11/+17
As seen with 'perf stat --null ..' and reported in: https://lore.kernel.org/lkml/YjCLcpcX2peeQVCH@kernel.org/ v2. Avoids setting evsel in the empty list case as suggested by Jiri Olsa. Committer testing: Before: $ perf stat --null sleep 1 Segmentation fault (core dumped) $ After: $ perf stat --null sleep 1 Performance counter stats for 'sleep 1': 1.010340646 seconds time elapsed 0.001420000 seconds user 0.000000000 seconds sys $ Fixes: 472832d2c000b961 ("perf evlist: Refactor evlist__for_each_cpu()") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220317231643.550902-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-19perf symbols: Fix symbol size calculation conditionMichael Petlan1-1/+1
Before this patch, the symbol end address fixup to be called, needed two conditions being met: if (prev->end == prev->start && prev->end != curr->start) Where "prev->end == prev->start" means that prev is zero-long (and thus needs a fixup) and "prev->end != curr->start" means that fixup hasn't been applied yet However, this logic is incorrect in the following situation: *curr = {rb_node = {__rb_parent_color = 278218928, rb_right = 0x0, rb_left = 0x0}, start = 0xc000000000062354, end = 0xc000000000062354, namelen = 40, type = 2 '\002', binding = 0 '\000', idle = 0 '\000', ignore = 0 '\000', inlined = 0 '\000', arch_sym = 0 '\000', annotate2 = false, name = 0x1159739e "kprobe_optinsn_page\t[__builtin__kprobes]"} *prev = {rb_node = {__rb_parent_color = 278219041, rb_right = 0x109548b0, rb_left = 0x109547c0}, start = 0xc000000000062354, end = 0xc000000000062354, namelen = 12, type = 2 '\002', binding = 1 '\001', idle = 0 '\000', ignore = 0 '\000', inlined = 0 '\000', arch_sym = 0 '\000', annotate2 = false, name = 0x1095486e "optinsn_slot"} In this case, prev->start == prev->end == curr->start == curr->end, thus the condition above thinks that "we need a fixup due to zero length of prev symbol, but it has been probably done, since the prev->end == curr->start", which is wrong. After the patch, the execution path proceeds to arch__symbols__fixup_end function which fixes up the size of prev symbol by adding page_size to its end offset. Fixes: 3b01a413c196c910 ("perf symbols: Improve kallsyms symbol end addr calculation") Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Link: http://lore.kernel.org/lkml/20220317135536.805-1-mpetlan@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+5
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-12perf parse: Fix event parser error for hybrid systemsZhengjun Xing1-2/+4
This bug happened on hybrid systems when both cpu_core and cpu_atom have the same event name such as "UOPS_RETIRED.MS" while their event terms are different, then during perf stat, the event for cpu_atom will parse fail and then no output for cpu_atom. UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/ UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/ It is because event terms in the "head" of parse_events_multi_pmu_add will be changed to event terms for cpu_core after parsing UOPS_RETIRED.MS for cpu_core, then when parsing the same event for cpu_atom, it still uses the event terms for cpu_core, but event terms for cpu_atom are different with cpu_core, the event parses for cpu_atom will fail. This patch fixes it, the event terms should be parsed from the original event. This patch can work for the hybrid systems that have the same event in more than 2 PMUs. It also can work in non-hybrid systems. Before: # perf stat -v -e UOPS_RETIRED.MS -a sleep 1 Using CPUID GenuineIntel-6-97-1 UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/ Control descriptor is not initialized UOPS_RETIRED.MS: 2737845 16068518485 16068518485 Performance counter stats for 'system wide': 2,737,845 cpu_core/UOPS_RETIRED.MS/ 1.002553850 seconds time elapsed After: # perf stat -v -e UOPS_RETIRED.MS -a sleep 1 Using CPUID GenuineIntel-6-97-1 UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/ UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/ Control descriptor is not initialized UOPS_RETIRED.MS: 1977555 16076950711 16076950711 UOPS_RETIRED.MS: 568684 8038694234 8038694234 Performance counter stats for 'system wide': 1,977,555 cpu_core/UOPS_RETIRED.MS/ 568,684 cpu_atom/UOPS_RETIRED.MS/ 1.004758259 seconds time elapsed Fixes: fb0811535e92c6c1 ("perf parse-events: Allow config on kernel PMU events") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220307151627.30049-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-12perf parse-events: Fix NULL check against wrong variableWeiguo Li1-1/+1
We did a null check after "tmp->symbol = strdup(...)", but we checked "list->symbol" other than "tmp->symbol". Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Weiguo Li <liwg06@foxmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/tencent_DF39269807EC9425E24787E6DB632441A405@qq.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-6/+5
net/batman-adv/hard-interface.c commit 690bb6fb64f5 ("batman-adv: Request iflink once in batadv-on-batadv check") commit 6ee3c393eeb7 ("batman-adv: Demote batadv-on-batadv skip error message") https://lore.kernel.org/all/20220302163049.101957-1-sw@simonwunderlich.de/ net/smc/af_smc.c commit 4d08b7b57ece ("net/smc: Fix cleanup when register ULP fails") commit 462791bbfa35 ("net/smc: add sysctl interface for SMC") https://lore.kernel.org/all/20220302112209.355def40@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-01perf: Add irq and exception return branch typesAnshuman Khandual1-1/+3
This expands generic branch type classification by adding two more entries there in i.e irq and exception return. Also updates the x86 implementation to process X86_BR_IRET and X86_BR_IRQ records as appropriate. This changes branch types reported to user space on x86 platform but it should not be a problem. The possible scenarios and impacts are enumerated here. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1645681014-3346-1-git-send-email-anshuman.khandual@arm.com
2022-02-22perf data: Fix double free in perf_session__delete()Alexey Bayduraev1-4/+3
When perf_data__create_dir() fails, it calls close_dir(), but perf_session__delete() also calls close_dir() and since dir.version and dir.nr were initialized by perf_data__create_dir(), a double free occurs. This patch moves the initialization of dir.version and dir.nr after successful initialization of dir.files, that prevents double freeing. This behavior is already implemented in perf_data__open_dir(). Fixes: 145520631130bd64 ("perf data: Add perf_data__(create_dir|close_dir) functions") Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220218152341.5197-2-alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-22linkage: remove SYM_FUNC_{START,END}_ALIAS()Mark Rutland1-21/+0
Now that all aliases are defined using SYM_FUNC_ALIAS(), remove the old SYM_FUNC_{START,END}_ALIAS() macros. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Mark Brown <broonie@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220216162229.1076788-5-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-22linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()Mark Rutland1-0/+35
Currently aliasing an asm function requires adding START and END annotations for each name, as per Documentation/asm-annotations.rst: SYM_FUNC_START_ALIAS(__memset) SYM_FUNC_START(memset) ... asm insns ... SYM_FUNC_END(memset) SYM_FUNC_END_ALIAS(__memset) This is more painful than necessary to maintain, especially where a function has many aliases, some of which we may wish to define conditionally. For example, arm64's memcpy/memmove implementation (which uses some arch-specific SYM_*() helpers) has: SYM_FUNC_START_ALIAS(__memmove) SYM_FUNC_START_ALIAS_WEAK_PI(memmove) SYM_FUNC_START_ALIAS(__memcpy) SYM_FUNC_START_WEAK_PI(memcpy) ... asm insns ... SYM_FUNC_END_PI(memcpy) EXPORT_SYMBOL(memcpy) SYM_FUNC_END_ALIAS(__memcpy) EXPORT_SYMBOL(__memcpy) SYM_FUNC_END_ALIAS_PI(memmove) EXPORT_SYMBOL(memmove) SYM_FUNC_END_ALIAS(__memmove) EXPORT_SYMBOL(__memmove) SYM_FUNC_START(name) It would be much nicer if we could define the aliases *after* the standard function definition. This would avoid the need to specify each symbol name twice, and would make it easier to spot the canonical function definition. This patch adds new macros to allow us to do so, which allows the above example to be rewritten more succinctly as: SYM_FUNC_START(__pi_memcpy) ... asm insns ... SYM_FUNC_END(__pi_memcpy) SYM_FUNC_ALIAS(__memcpy, __pi_memcpy) EXPORT_SYMBOL(__memcpy) SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy) EXPORT_SYMBOL(memcpy) SYM_FUNC_ALIAS(__pi_memmove, __pi_memcpy) SYM_FUNC_ALIAS(__memmove, __pi_memmove) EXPORT_SYMBOL(__memmove) SYM_FUNC_ALIAS_WEAK(memmove, __memmove) EXPORT_SYMBOL(memmove) The reduction in duplication will also make it possible to replace some uses of WEAK with more accurate Kconfig guards, e.g. #ifndef CONFIG_KASAN SYM_FUNC_ALIAS(memmove, __memmove) EXPORT_SYMBOL(memmove) #endif ... which should make it easier to ensure that symbols are neither used nor overidden unexpectedly. The existing SYM_FUNC_START_ALIAS() and SYM_FUNC_START_LOCAL_ALIAS() are marked as deprecated, and will be removed once existing users are moved over to the new scheme. The tools/perf/ copy of linkage.h is updated to match. A subsequent patch will depend upon this when updating the x86 asm annotations. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Mark Brown <broonie@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220216162229.1076788-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-02-18perf evlist: Fix failed to use cpu list for uncore eventsZhengjun Xing1-2/+2
The 'perf record' and 'perf stat' commands have supported the option '-C/--cpus' to count or collect only on the list of CPUs provided. Commit 1d3351e631fc34d7 ("perf tools: Enable on a list of CPUs for hybrid") add it to be supported for hybrid. For hybrid support, it checks the cpu list are available on hybrid PMU. But when we test only uncore events(or events not in cpu_core and cpu_atom), there is a bug: Before: # perf stat -C0 -e uncore_clock/clockticks/ sleep 1 failed to use cpu list 0 In this case, for uncore event, its pmu_name is not cpu_core or cpu_atom, so in evlist__fix_hybrid_cpus, perf_pmu__find_hybrid_pmu should return NULL,both events_nr and unmatched_count should be 0 ,then the cpu list check function evlist__fix_hybrid_cpus return -1 and the error "failed to use cpu list 0" will happen. Bypass "events_nr=0" case then the issue is fixed. After: # perf stat -C0 -e uncore_clock/clockticks/ sleep 1 Performance counter stats for 'CPU(s) 0': 195,476,873 uncore_clock/clockticks/ 1.004518677 seconds time elapsed When testing with at least one core event and uncore events, it has no issue. # perf stat -C0 -e cpu_core/cpu-cycles/,uncore_clock/clockticks/ sleep 1 Performance counter stats for 'CPU(s) 0': 5,993,774 cpu_core/cpu-cycles/ 301,025,912 uncore_clock/clockticks/ 1.003964934 seconds time elapsed Fixes: 1d3351e631fc34d7 ("perf tools: Enable on a list of CPUs for hybrid") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: alexander.shishkin@intel.com Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220218093127.1844241-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-10/+9
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17perf bpf: Defer freeing string after possible strlen() on itArnaldo Carvalho de Melo1-1/+2
This was detected by the gcc in Fedora Rawhide's gcc: 50 11.01 fedora:rawhide : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC) inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9: util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free] 1225 | *key_scan_pos += strlen(map_opt); | ^~~~~~~~~~~~~~~ util/bpf-loader.c:1223:9: note: call to 'free' here 1223 | free(map_name); | ^~~~~~~~~~~~~~ cc1: all warnings being treated as errors So do the calculations on the pointer before freeing it. Fixes: 04f9bf2bac72480c ("perf bpf-loader: Add missing '*' for key_scan_pos") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-16perf cs-etm: Fix corrupt inject files when only last branch option is enabledJames Clark1-0/+2
'perf inject' with Coresight data generates files that cannot be opened when only the last branch option is specified: perf inject -i perf.data --itrace=l -o inject.data perf script -i inject.data 0x33faa8 [0x8]: failed to process type: 9 [Bad address] This is because cs_etm__synth_instruction_sample() is called even when the sample type for instructions hasn't been setup. Last branch records are attached to instruction samples so it doesn't make sense to generate them when --itrace=i isn't specified anyway. This change disables all calls of cs_etm__synth_instruction_sample() unless --itrace=i is specified, resulting in a file with no samples if only --itrace=l is provided, rather than a bad file. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220210200620.1227232-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-16perf cs-etm: No-op refactor of synth opt usageJames Clark1-9/+5
sample_branches and sample_instructions are already saved in the synth_opts struct. Other usages like synth_opts.last_branch don't save a value, so make this more consistent by always going through synth_opts and not saving duplicate values. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220210200620.1227232-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-14/+39
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-10Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-4/+6
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-02-09 We've added 126 non-merge commits during the last 16 day(s) which contain a total of 201 files changed, 4049 insertions(+), 2215 deletions(-). The main changes are: 1) Add custom BPF allocator for JITs that pack multiple programs into a huge page to reduce iTLB pressure, from Song Liu. 2) Add __user tagging support in vmlinux BTF and utilize it from BPF verifier when generating loads, from Yonghong Song. 3) Add per-socket fast path check guarding from cgroup/BPF overhead when used by only some sockets, from Pavel Begunkov. 4) Continued libbpf deprecation work of APIs/features and removal of their usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko and various others. 5) Improve BPF instruction set documentation by adding byte swap instructions and cleaning up load/store section, from Christoph Hellwig. 6) Switch BPF preload infra to light skeleton and remove libbpf dependency from it, from Alexei Starovoitov. 7) Fix architecture-agnostic macros in libbpf for accessing syscall arguments from BPF progs for non-x86 architectures, from Ilya Leoshkevich. 8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be of 16-bit field with anonymous zero padding, from Jakub Sitnicki. 9) Add new bpf_copy_from_user_task() helper to read memory from a different task than current. Add ability to create sleepable BPF iterator progs, from Kenny Yu. 10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski. 11) Generate temporary netns names for BPF selftests to avoid naming collisions, from Hangbin Liu. 12) Implement bpf_core_types_are_compat() with limited recursion for in-kernel usage, from Matteo Croce. 13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5 to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 14) Misc minor fixes to libbpf and selftests from various folks. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits) selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide libbpf: Fix compilation warning due to mismatched printf format selftests/bpf: Test BPF_KPROBE_SYSCALL macro libbpf: Add BPF_KPROBE_SYSCALL macro libbpf: Fix accessing the first syscall argument on s390 libbpf: Fix accessing the first syscall argument on arm64 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390 libbpf: Fix accessing syscall arguments on riscv libbpf: Fix riscv register names libbpf: Fix accessing syscall arguments on powerpc selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro libbpf: Add PT_REGS_SYSCALL_REGS macro selftests/bpf: Fix an endianness issue in bpf_syscall_macro test bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE bpf: Fix leftover header->pages in sparc and powerpc code. libbpf: Fix signedness bug in btf_dump_array_data() selftests/bpf: Do not export subtest as standalone test bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures ... ==================== Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-06perf stat: Fix display of grouped aliased eventsIan Rogers1-9/+10
An event may have a number of uncore aliases that when added to the evlist are consecutive. If there are multiple uncore events in a group then parse_events__set_leader_for_uncore_aliase will reorder the evlist so that events on the same PMU are adjacent. The collect_all_aliases function assumes that aliases are in blocks so that only the first counter is printed and all others are marked merged. The reordering for groups breaks the assumption and so all counts are printed. This change removes the assumption from collect_all_aliases that the events are in blocks and instead processes the entire evlist. Before: ``` $ perf stat -e '{UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE,UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE},duration_time' -a -A -- sleep 1 Performance counter stats for 'system wide': CPU0 256,866 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 494,413 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 967 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,738 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 285,161 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 429,920 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 955 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,443 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 310,753 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 416,657 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,231 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,573 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 416,067 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 405,966 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,481 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,447 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 312,911 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 408,154 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,086 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,380 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 333,994 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 370,349 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,287 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,335 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 188,107 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 302,423 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 701 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,070 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 307,221 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 383,642 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,036 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,158 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 318,479 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 821,545 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,028 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 2,550 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 227,618 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 372,272 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 903 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,456 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 376,783 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 419,827 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,406 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,453 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 286,583 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 429,956 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 999 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,436 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 313,867 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 370,159 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,114 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,291 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 342,083 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 409,111 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,399 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,684 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 365,828 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 376,037 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,378 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,411 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 382,456 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 621,743 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,232 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,955 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 342,316 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 385,067 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,176 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,268 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 373,588 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 386,163 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,394 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,464 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 381,206 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 546,891 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,266 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,712 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 221,176 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 392,069 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 831 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,456 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 355,401 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 705,595 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,235 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 2,216 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 371,436 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 428,103 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,306 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,442 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 384,352 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 504,200 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,468 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,860 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 228,856 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 287,976 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 832 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,060 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 215,121 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 334,162 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 681 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,026 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 296,179 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 436,083 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,084 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,525 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 262,296 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 416,573 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 986 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,533 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 285,852 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 359,842 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,073 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,326 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 303,379 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 367,222 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,008 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,156 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 273,487 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 425,449 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 932 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,367 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 297,596 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 414,793 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,140 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,601 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 342,365 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 360,422 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,291 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,342 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 327,196 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 580,858 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,122 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 2,014 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 296,564 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 452,817 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,087 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,694 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 375,002 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 389,393 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,478 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 1,540 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 365,213 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 594,685 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 1,401 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 2,222 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 1,000,749,060 ns duration_time 1.000749060 seconds time elapsed ``` After: ``` Performance counter stats for 'system wide': CPU0 20,547,434 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU36 45,202,862 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE CPU0 82,001 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU36 159,688 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE CPU0 1,000,464,828 ns duration_time 1.000464828 seconds time elapsed ``` Fixes: 3cdc5c2cb924acb4 ("perf parse-events: Handle uncore event aliases in small groups properly") Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Asaf Yaffe <asaf.yaffe@intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220205010941.1065469-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-06perf tools: Apply correct label to user/kernel symbols in branch modeGerman Gomez3-2/+5
In branch mode, the branch symbols were being displayed with incorrect cpumode labels. So fix this. For example, before: # perf record -b -a -- sleep 1 # perf report -b Overhead Command Source Shared Object Source Symbol Target Symbol 0.08% swapper [kernel.kallsyms] [k] rcu_idle_enter [k] cpuidle_enter_state ==> 0.08% cmd0 [kernel.kallsyms] [.] psi_group_change [.] psi_group_change 0.08% cmd1 [kernel.kallsyms] [k] psi_group_change [k] psi_group_change After: # perf report -b Overhead Command Source Shared Object Source Symbol Target Symbol 0.08% swapper [kernel.kallsyms] [k] rcu_idle_enter [k] cpuidle_enter_state 0.08% cmd0 [kernel.kallsyms] [k] psi_group_change [k] pei_group_change 0.08% cmd1 [kernel.kallsyms] [k] psi_group_change [k] psi_group_change Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220126105927.3411216-1-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-06perf bpf: Fix a typo in bpf_counter_cgroup.cMasanari Iida1-1/+1
This patch fixes a spelling typo in error message. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20211225005558.503935-1-standby24x7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-06perf synthetic-events: Return error if procfs isn't mounted for PID namespacesLeo Yan1-0/+19
For perf recording, it retrieves process info by iterating nodes in proc fs. If we run perf in a non-root PID namespace with command: # unshare --fork --pid perf record -e cycles -a -- test_program ... in this case, unshare command creates a child PID namespace and launches perf tool in it, but the issue is the proc fs is not mounted for the non-root PID namespace, this leads to the perf tool gathering process info from its parent PID namespace. We can use below command to observe the process nodes under proc fs: # unshare --pid --fork ls /proc 1 137 1968 2128 3 342 48 62 78 crypto kcore net uptime 10 138 2 2142 30 35 49 63 8 devices keys pagetypeinfo version 11 139 20 2143 304 36 50 64 82 device-tree key-users partitions vmallocinfo 12 14 2011 22 305 37 51 65 83 diskstats kmsg self vmstat 128 140 2038 23 307 39 52 656 84 driver kpagecgroup slabinfo zoneinfo 129 15 2074 24 309 4 53 67 9 execdomains kpagecount softirqs 13 16 2094 241 31 40 54 68 asound fb kpageflags stat 130 164 2096 242 310 41 55 69 buddyinfo filesystems loadavg swaps 131 17 2098 25 317 42 56 70 bus fs locks sys 132 175 21 26 32 43 57 71 cgroups interrupts meminfo sysrq-trigger 133 179 2102 263 329 44 58 75 cmdline iomem misc sysvipc 134 1875 2103 27 330 45 59 76 config.gz ioports modules thread-self 135 19 2117 29 333 46 6 77 consoles irq mounts timer_list 136 1941 2121 298 34 47 60 773 cpuinfo kallsyms mtd tty So it shows many existed tasks, since unshared command has not mounted the proc fs for the new created PID namespace, it still accesses the proc fs of the root PID namespace. This leads to two prominent issues: - Firstly, PID values are mismatched between thread info and samples. The gathered thread info are coming from the proc fs of the root PID namespace, but samples record its PID from the child PID namespace. - The second issue is profiled program 'test_program' returns its forked PID number from the child PID namespace, perf tool wrongly uses this PID number to retrieve the process info via the proc fs of the root PID namespace. To avoid issues, we need to mount proc fs for the child PID namespace with the option '--mount-proc' when use unshare command: # unshare --fork --pid --mount-proc perf record -e cycles -a -- test_program Conversely, when the proc fs of the root PID namespace is used by child namespace, perf tool can detect the multiple PID levels and nsinfo__is_in_root_namespace() returns false, this patch reports error for this case: # unshare --fork --pid perf record -e cycles -a -- test_program Couldn't synthesize bpf events. Perf runs in non-root PID namespace but it tries to gather process info from its parent PID namespace. Please mount the proc file system properly, e.g. add the option '--mount-proc' for unshare command. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20211224124014.2492751-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-06perf session: Check for NULL pointer before dereferenceAmeer Hamza1-1/+2
Move NULL pointer check before dereferencing the variable. Addresses-Coverity: 1497622 ("Derereference before null check") Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ameer Hamza <amhamza.mgc@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Link: https://lore.kernel.org/r/20220125121141.18347-1-amhamza.mgc@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-06perf annotate: Set error stream of objdump process for TUINamhyung Kim1-0/+1
The stderr should be set to a pipe when using TUI. Otherwise it'd print to stdout and break TUI windows with an error message. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220202070828.143303-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-06perf tools: Add missing branch_sample_type to perf_event_attr__fprintf()Anshuman Khandual1-1/+1
This updates branch sample type with missing PERF_SAMPLE_BRANCH_TYPE_SAVE. Suggested-by: James Clark <james.clark@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/1643799443-15109-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski24-81/+172
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-26perf: use generic bpf_program__set_type() to set BPF prog typeAndrii Nakryiko1-2/+2
bpf_program__set_<type>() APIs are deprecated, use generic bpf_program__set_type() APIs instead. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20220124194254.2051434-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25perf: Stop using bpf_object__open_buffer() APIChristy Lee1-2/+4
bpf_object__open_buffer() API is deprecated, use the unified opts bpf_object__open_mem() API in perf instead. This requires at least libbpf 0.0.6. Signed-off-by: Christy Lee <christylee@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220125005923.418339-3-christylee@fb.com
2022-01-25Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2-51/+41
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-01-24 We've added 80 non-merge commits during the last 14 day(s) which contain a total of 128 files changed, 4990 insertions(+), 895 deletions(-). The main changes are: 1) Add XDP multi-buffer support and implement it for the mvneta driver, from Lorenzo Bianconi, Eelco Chaudron and Toke Høiland-Jørgensen. 2) Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra, from Kumar Kartikeya Dwivedi. 3) Extend BPF cgroup programs to export custom ret value to userspace via two helpers bpf_get_retval() and bpf_set_retval(), from YiFei Zhu. 4) Add support for AF_UNIX iterator batching, from Kuniyuki Iwashima. 5) Complete missing UAPI BPF helper description and change bpf_doc.py script to enforce consistent & complete helper documentation, from Usama Arif. 6) Deprecate libbpf's legacy BPF map definitions and streamline XDP APIs to follow tc-based APIs, from Andrii Nakryiko. 7) Support BPF_PROG_QUERY for BPF programs attached to sockmap, from Di Zhu. 8) Deprecate libbpf's bpf_map__def() API and replace users with proper getters and setters, from Christy Lee. 9) Extend libbpf's btf__add_btf() with an additional hashmap for strings to reduce overhead, from Kui-Feng Lee. 10) Fix bpftool and libbpf error handling related to libbpf's hashmap__new() utility function, from Mauricio Vásquez. 11) Add support to BTF program names in bpftool's program dump, from Raman Shukhau. 12) Fix resolve_btfids build to pick up host flags, from Connor O'Brien. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (80 commits) selftests, bpf: Do not yet switch to new libbpf XDP APIs selftests, xsk: Fix rx_full stats test bpf: Fix flexible_array.cocci warnings xdp: disable XDP_REDIRECT for xdp frags bpf: selftests: add CPUMAP/DEVMAP selftests for xdp frags bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest net: xdp: introduce bpf_xdp_pointer utility routine bpf: generalise tail call map compatibility check libbpf: Add SEC name for xdp frags programs bpf: selftests: update xdp_adjust_tail selftest to include xdp frags bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature bpf: introduce frags support to bpf_prog_test_run_xdp() bpf: move user_size out of bpf_test_init bpf: add frags support to xdp copy helpers bpf: add frags support to the bpf_xdp_adjust_tail() API bpf: introduce bpf_xdp_get_buff_len helper net: mvneta: enable jumbo frames if the loaded XDP program support frags bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program net: mvneta: add frags support to XDP_TX xdp: add frags support to xdp_return_{buff/frame} ... ==================== Link: https://lore.kernel.org/r/20220124221235.18993-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-22perf test: Add parse-events test for aliases with hyphensJohn Garry1-9/+33
Add a test which allows us to test parsing an event alias with hyphens. Since these events typically do not exist on most host systems, add the alias to the fake pmu. Function perf_pmu__test_parse_init() has terms added to match known test aliases. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi115@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1642432215-234089-4-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf parse-events: Support event alias in form foo-bar-bazJohn Garry4-4/+41
Event aliasing for events whose name in the form foo-bar-baz is not supported, while foo-bar, foo_bar_baz, and other combinations are, i.e. two hyphens are not supported. The HiSilicon D06 platform has events in such form: $ ./perf list sdir-home-migrate List of pre-defined events (to be used in -e): uncore hha: sdir-home-migrate [Unit: hisi_sccl,hha] $ sudo ./perf stat -e sdir-home-migrate event syntax error: 'sdir-home-migrate' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event>event selector. use 'perf list' to list available events To support, add an extra PMU event symbol type for "baz", and add a new rule in the bison file. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi115@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1642432215-234089-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf evsel: Override attr->sample_period for non-libpfm4 eventsGerman Gomez1-8/+17
A previous patch preventing "attr->sample_period" values from being overridden in pfm events changed a related behaviour in arm-spe. Before said patch: perf record -c 10000 -e arm_spe_0// -- sleep 1 Would yield an SPE event with period=10000. After the patch, the period in "-c 10000" was being ignored because the arm-spe code initializes sample_period to a non-zero value. This patch restores the previous behaviour for non-libpfm4 events. Fixes: ae5dcc8abe31 (“perf record: Prevent override of attr->sample_period for libpfm4 events”) Reported-by: Chase Conklin <chase.conklin@arm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220118144054.2541-1-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf cpumap: Remove duplicate include in cpumap.hLv Ruyi1-1/+0
Remove all but the first include of stdbool.h from cpumap.h. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117083730.863200-1-lv.ruyi@zte.com.cn Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf cpumap: Migrate to libperf cpumap apiIan Rogers15-44/+48
Switch from directly accessing the perf_cpu_map to using the appropriate libperf API when possible. Using the API simplifies the job of refactoring use of perf_cpu_map. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: André Almeida <andrealmeid@collabora.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: http://lore.kernel.org/lkml/20220122045811.3402706-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf python: Fix cpu_map__item() buildingIan Rogers1-3/+3
Value should be built as an integer. Switch some uses of perf_cpu_map to use the library API. Fixes: 6d18804b963b78dc ("perf cpumap: Give CPUs their own type") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: André Almeida <andrealmeid@collabora.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: http://lore.kernel.org/lkml/20220122045811.3402706-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-19perf machine: Use path__join() to compose a path instead of snprintf(dir, ↵Arnaldo Carvalho de Melo1-1/+2
'/', filename) Its more intention revealing, and if we're interested in the odd cases where this may end up truncating we can do debug checks at one centralized place. Motivation, of all the container builds, fedora rawhide started complaining of: util/machine.c: In function ‘machine__create_modules’: util/machine.c:1419:50: error: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=] 1419 | snprintf(path, sizeof(path), "%s/%s", dir_name, dent->d_name); | ^~ In file included from /usr/include/stdio.h:894, from util/branch.h:9, from util/callchain.h:8, from util/machine.c:7: In function ‘snprintf’, inlined from ‘maps__set_modules_path_dir’ at util/machine.c:1419:3, inlined from ‘machine__set_modules_path’ at util/machine.c:1473:9, inlined from ‘machine__create_modules’ at util/machine.c:1519:7: /usr/include/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 2 and 4352 bytes into a destination of size 4096 There are other places where we should use path__join(), but lets get rid of this one first. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: Link: https://lore.kernel.org/r/YebZKjwgfdOz0lAs@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18perf evlist: No need to setup affinities when disabling events for pid targetsArnaldo Carvalho de Melo1-5/+9
When the target is a pid, not started by 'perf stat' we need to disable the events, and in that case there is no need to setup affinities as we use a dummy CPU map, with just one entry set to -1. So stop doing it to avoid this needless call to sched_getaffinity(): # strace -ke sched_getaffinity perf stat -e cycles -p 241957 sleep 1 <SNIP> sched_getaffinity(0, 512, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]) = 8 > /usr/lib64/libc-2.33.so(sched_getaffinity@@GLIBC_2.3.4+0x1a) [0xe6eea] > /var/home/acme/bin/perf(affinity__setup+0x6a) [0x532a2a] > /var/home/acme/bin/perf(__evlist__disable.constprop.0+0x27) [0x4b9827] > /var/home/acme/bin/perf(cmd_stat+0x29b5) [0x431725] > /var/home/acme/bin/perf(run_builtin+0x6a) [0x4a2cfa] > /var/home/acme/bin/perf(main+0x612) [0x40f8c2] > /usr/lib64/libc-2.33.so(__libc_start_main+0xd4) [0x27b74] > /var/home/acme/bin/perf(_start+0x2d) [0x40fadd] <SNIP> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117160931.1191712-5-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18perf evlist: No need to setup affinities when enabling events for pid targetsArnaldo Carvalho de Melo1-5/+9
When the target is a pid, not started by 'perf stat' we need to enable the events, and in that case there is no need to setup affinities as we use a dummy CPU map, with just one entry set to -1. So stop doing it to avoid this needless call to sched_getaffinity(): # strace -ke sched_getaffinity perf stat -e cycles -p 241957 sleep 1 <SNIP> sched_getaffinity(0, 512, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]) = 8 > /usr/lib64/libc-2.33.so(sched_getaffinity@@GLIBC_2.3.4+0x1a) [0xe6eea] > /var/home/acme/bin/perf(affinity__setup+0x6a) [0x5329ca] > /var/home/acme/bin/perf(__evlist__enable.constprop.0+0x23) [0x4b9693] > /var/home/acme/bin/perf(enable_counters+0x14d) [0x42de5d] > /var/home/acme/bin/perf(cmd_stat+0x2358) [0x4310c8] > /var/home/acme/bin/perf(run_builtin+0x6a) [0x4a2cfa] > /var/home/acme/bin/perf(main+0x612) [0x40f8c2] > /usr/lib64/libc-2.33.so(__libc_start_main+0xd4) [0x27b74] > /var/home/acme/bin/perf(_start+0x2d) [0x40fadd] <SNIP> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117160931.1191712-4-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18perf affinity: Allow passing a NULL arg to affinity__cleanup()Arnaldo Carvalho de Melo1-1/+7
Just like with free(), NULL is checked to avoid having all callers do it. Its convenient for when not using affinity setup/cleanup for dummy CPU maps, i.e. CPU maps for pid targets. Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117160931.1191712-2-acme@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18perf probe: Fix ppc64 'perf probe add events failed' caseZechuan Chen1-0/+3
Because of commit bf794bf52a80c627 ("powerpc/kprobes: Fix kallsyms lookup across powerpc ABIv1 and ABIv2"), in ppc64 ABIv1, our perf command eliminates the need to use the prefix "." at the symbol name. But when the command "perf probe -a schedule" is executed on ppc64 ABIv1, it obtains two symbol address information through /proc/kallsyms, for example: cat /proc/kallsyms | grep -w schedule c000000000657020 T .schedule c000000000d4fdb8 D schedule The symbol "D schedule" is not a function symbol, and perf will print: "p:probe/schedule _text+13958584"Failed to write event: Invalid argument Therefore, when searching symbols from map and adding probe point for them, a symbol type check is added. If the type of symbol is not a function, skip it. Fixes: bf794bf52a80c627 ("powerpc/kprobes: Fix kallsyms lookup across powerpc ABIv1 and ABIv2") Signed-off-by: Zechuan Chen <chenzechuan1@huawei.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jianlin Lv <Jianlin.Lv@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20211228111338.218602-1-chenzechuan1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-18Merge tag 'perf-tools-for-v5.17-2022-01-16' of ↵Linus Torvalds64-830/+2317
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tool updates from Arnaldo Carvalho de Melo: "New features: - Add 'trace' subcommand for 'perf ftrace', setting the stage for more 'perf ftrace' subcommands. Not using a subcommand yields the previous behaviour of 'perf ftrace'. - Add 'latency' subcommand to 'perf ftrace', that can use the function graph tracer or a BPF optimized one, via the -b/--use-bpf option. E.g.: $ sudo perf ftrace latency -a -T mutex_lock sleep 1 # DURATION | COUNT | GRAPH | 0 - 1 us | 4596 | ######################## | 1 - 2 us | 1680 | ######### | 2 - 4 us | 1106 | ##### | 4 - 8 us | 546 | ## | 8 - 16 us | 562 | ### | 16 - 32 us | 1 | | 32 - 64 us | 0 | | 64 - 128 us | 0 | | 128 - 256 us | 0 | | 256 - 512 us | 0 | | 512 - 1024 us | 0 | | 1 - 2 ms | 0 | | 2 - 4 ms | 0 | | 4 - 8 ms | 0 | | 8 - 16 ms | 0 | | 16 - 32 ms | 0 | | 32 - 64 ms | 0 | | 64 - 128 ms | 0 | | 128 - 256 ms | 0 | | 256 - 512 ms | 0 | | 512 - 1024 ms | 0 | | 1 - ... s | 0 | | The original implementation of this command was in the bcc tool. - Support --cputype option for hybrid events in 'perf stat'. Improvements: - Call chain improvements for ARM64. - No need to do any affinity setup when profiling pids. - Reduce multiplexing with duration_time in 'perf stat' metrics. - Improve error message for uncore events, stating that some event groups are can only be used in system wide (-a) mode. - perf stat metric group leader fixes/improvements, including arch specific changes to better support Intel topdown events. - Probe non-deprecated sysfs path first, i.e. try the path /sys/devices/system/cpu/cpuN/topology/thread_siblings first, then the old /sys/devices/system/cpu/cpuN/topology/core_cpus. - Disable debuginfod by default in 'perf record', to avoid stalls on distros such as Fedora 35. - Use unbuffered output in 'perf bench' when pipe/tee'ing to a file. - Enable ignore_missing_thread in 'perf trace' Fixes: - Avoid TUI crash when navigating in the annotation of recursive functions. - Fix hex dump character output in 'perf script'. - Fix JSON indentation to 4 spaces standard in the ARM vendor event files. - Fix use after free in metric__new(). - Fix IS_ERR_OR_NULL() usage in the perf BPF loader. - Fix up cross-arch register support, i.e. when printing register names take into account the architecture where the perf.data file was collected. - Fix SMT fallback with large core counts. - Don't lower case MetricExpr when parsing JSON files so as not to lose info such as the ":G" event modifier in metrics. perf test: - Add basic stress test for sigtrap handling to 'perf test'. - Fix 'perf test' failures on s/390 - Enable system wide for metricgroups test in 'perf test´. - Use 3 digits for test numbering now we can have more tests. Arch specific: - Add events for Arm Neoverse N2 in the ARM JSON vendor event files - Support PERF_MEM_LVLNUM encodings in powerpc, that came from a single patch series, where I incorrectly merged the kernel bits, that were then reverted after coordination with Michael Ellerman and Stephen Rothwell. - Add ARM SPE total latency as PERF_SAMPLE_WEIGHT. - Update AMD documentation, with info on raw event encoding. - Add support for global and local variants of the "p_stage_cyc" sort key, applicable to perf.data files collected on powerpc. - Remove duplicate and incorrect aux size checks in the ARM CoreSight ETM code. Refactorings: - Add a perf_cpu abstraction to disambiguate CPUs and CPU map indexes, fixing problems along the way. - Document CPU map methods. UAPI sync: - Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy' - Sync UAPI files with the kernel sources: drm, msr-index, cpufeatures. Build system - Enable warnings through HOSTCFLAGS. - Drop requirement for libstdc++.so for libopencsd check libperf: - Make libperf adopt perf_counts_values__scale() from tools/perf/util/. - Add a stat multiplexing test to libperf" * tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (115 commits) perf record: Disable debuginfod by default perf evlist: No need to do any affinity setup when profiling pids perf cpumap: Add is_dummy() method perf metric: Fix metric_leader perf cputopo: Fix CPU topology reading on s/390 perf metricgroup: Fix use after free in metric__new() libperf tests: Update a use of the new cpumap API perf arm: Fix off-by-one directory path tools arch x86: Sync the msr-index.h copy with the kernel sources tools headers cpufeatures: Sync with the kernel sources tools headers UAPI: Update tools's copy of drm.h header tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy' perf pmu-events: Don't lower case MetricExpr perf expr: Add debug logging for literals perf tools: Probe non-deprecated sysfs path 1st perf tools: Fix SMT fallback with large core counts perf cpumap: Give CPUs their own type perf stat: Correct first_shadow_cpu to return index perf script: Fix flipped index and cpu perf c2c: Use more intention revealing iterator ...
2022-01-16Merge tag 'trace-v5.17' of ↵Linus Torvalds6-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New: - The Real Time Linux Analysis (RTLA) tool is added to the tools directory. - Can safely filter on user space pointers with: field.ustring ~ "match-string" - eprobes can now be filtered like any other event. - trace_marker(_raw) now uses stream_open() to allow multiple threads to safely write to it. Note, this could possibly break existing user space, but we will not know until we hear about it, and then can revert the change if need be. - New field in events to display when bottom halfs are disabled. - Sorting of the ftrace functions are now done at compile time instead of at bootup. Infrastructure changes to support future efforts: - Added __rel_loc type for trace events. Similar to __data_loc but the offset to the dynamic data is based off of the location of the descriptor and not the beginning of the event. Needed for user defined events. - Some simplification of event trigger code. - Make synthetic events process its callback better to not hinder other event callbacks that are registered. Needed for user defined events. And other small fixes and cleanups" * tag 'trace-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits) tracing: Add ustring operation to filtering string pointers rtla: Add rtla timerlat hist documentation rtla: Add rtla timerlat top documentation rtla: Add rtla timerlat documentation rtla: Add rtla osnoise hist documentation rtla: Add rtla osnoise top documentation rtla: Add rtla osnoise man page rtla: Add Documentation rtla/timerlat: Add timerlat hist mode rtla: Add timerlat tool and timelart top mode rtla/osnoise: Add the hist mode rtla/osnoise: Add osnoise top mode rtla: Add osnoise tool rtla: Helper functions for rtla rtla: Real-Time Linux Analysis tool tracing/osnoise: Properly unhook events if start_per_cpu_kthreads() fails tracing: Remove duplicate warnings when calling trace_create_file() tracing/kprobes: 'nmissed' not showed correctly for kretprobe tracing: Add test for user space strings when filtering on string pointers tracing: Have syscall trace events use trace_event_buffer_lock_reserve() ...
2022-01-15perf record: Disable debuginfod by defaultJiri Olsa2-0/+21
Fedora 35 sets DEBUGINFOD_URLS by default, which might lead to unexpected stalls in perf record exit path, when we try to cache profiled binaries. # DEBUGINFOD_PROGRESS=1 ./perf record -a ^C[ perf record: Woken up 1 times to write data ] Downloading from https://debuginfod.fedoraproject.org/ 447069 Downloading from https://debuginfod.fedoraproject.org/ 1502175 Downloading \^Z Disabling DEBUGINFOD_URLS by default in perf record and adding debuginfod option and .perfconfig variable support to enable id. Default without debuginfo processing: # perf record -a Using system debuginfod setup: # perf record -a --debuginfod Using custom debuginfd url: # perf record -a --debuginfod='https://evenbetterdebuginfodserver.krava' Adding single perf_debuginfod_setup function and using it also in perf buildid-cache command. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20211209200425.303561-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-15perf evlist: No need to do any affinity setup when profiling pidsArnaldo Carvalho de Melo1-1/+1
The cpumap is dummy, so no need to go on figuring out affinity.o This way we reduce the setup time for simple scenarios like: $ perf stat sleep 1 Acked-by: Andi Kleen <andi@firstfloor.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-15perf cpumap: Add is_dummy() methodArnaldo Carvalho de Melo1-0/+10
Needed to check if a cpu_map is dummy, i.e. not a cpu map at all, for pid monitoring scenarios. This probably needs to move to libperf, but since perf itself is the first and so far only user, leave it at tools/perf/util/. Acked-by: Andi Kleen <andi@firstfloor.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-15perf metric: Fix metric_leaderIan Rogers1-1/+1
Multiple events may have a metric_leader to aggregate into. This happens for uncore events where, for example, uncore_imc is expanded into uncore_imc_0, uncore_imc_1, etc. Such events all have the same metric_id and should aggregate into the first event. The change introducing metric_ids had a bug where the metric_id was compared to itself, creating an always true condition. Correct this by comparing the event in the metric_evlist and the metric_leader. Fixes: ec5c5b3d2c21b3f3 ("perf metric: Encode and use metric-id as qualifier") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20220115062852.1959424-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-14perf cputopo: Fix CPU topology reading on s/390Thomas Richter1-1/+2
Commit fdf1e29b6118c18f ("perf expr: Add metric literals for topology.") fails on s390: # ./perf test -Fv 7 ... # FAILED tests/expr.c:173 #num_dies >= #num_packages ---- end ---- Simple expression parser: FAILED! # Investigating this issue leads to these functions: build_cpu_topology() +--> has_die_topology(void) { struct utsname uts; if (uname(&uts) < 0) return false; if (strncmp(uts.machine, "x86_64", 6)) return false; .... } which always returns false on s390. The caller build_cpu_topology() checks has_die_topology() return value. On false the the struct cpu_topology::die_cpu_list is not contructed and has zero entries. This leads to the failing comparison: #num_dies >= #num_packages. s390 of course has a positive number of packages. Fix this by adding s390 architecture to support CPU die list. Output after: # ./perf test -Fv 7 7: Simple expression parser : --- start --- division by zero syntax error ---- end ---- Simple expression parser: Ok # Fixes: fdf1e29b6118c18f ("perf expr: Add metric literals for topology.") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20211124090343.9436-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-14perf metricgroup: Fix use after free in metric__new()José Expósito1-1/+1
We shouldn't free() something that will be used in the next line, fix it. Fixes: b85a4d61d3022608 ("perf metric: Allow modifiers on metrics") Addresses-Coverity-ID: 1494000 Signed-off-by: José Expósito <jose.exposito89@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20211208171113.22089-1-jose.exposito89@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>