summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-09-05bareudp: Fix device stats updates.Guillaume Nault1-11/+11
Bareudp devices update their stats concurrently. Therefore they need proper atomic increments. Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/04b7b9d0b480158eb3ab4366ec80aa2ab7e41fcb.1725031794.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-04Merge tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefsLinus Torvalds4-3/+10
Pull bcachefs fixes from Kent Overstreet: - Fix a typo in the rebalance accounting changes - BCH_SB_MEMBER_INVALID: small on disk format feature which will be needed for full erasure coding support; this is only the minimum so that 6.11 can handle future versions without barfing. * tag 'bcachefs-2024-09-04' of git://evilpiepirate.org/bcachefs: bcachefs: BCH_SB_MEMBER_INVALID bcachefs: fix rebalance accounting
2024-09-04Merge branch 'bpf-fix-incorrect-name-check-pass-logic-in-btf_name_valid_section'Alexei Starovoitov2-1/+37
Jeongjun Park says: ==================== bpf: fix incorrect name check pass logic in btf_name_valid_section This patch was written to fix an issue where btf_name_valid_section() would not properly check names with certain conditions and would throw an OOB vuln. And selftest was added to verify this patch. ==================== Link: https://lore.kernel.org/r/20240831054525.364353-1-aha310510@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-04selftests/bpf: Add a selftest to check for incorrect namesJeongjun Park1-0/+34
Add selftest for cases where btf_name_valid_section() does not properly check for certain types of names. Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Jeongjun Park <aha310510@gmail.com> Link: https://lore.kernel.org/r/20240831054742.364585-1-aha310510@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
2024-09-04perf check: Fix inconsistencies in feature namesAditya Gupta2-4/+4
Fix two inconsistencies in feature names as discussed in [1]: 1. Rename "dwarf-unwind-support" to "dwarf-unwind" 2. 'get_cpuid' feature and 'HAVE_AUXTRACE_SUPPORT' names don't look related, change the feature name to 'auxtrace' to match the macro name, as 'get_cpuid' string is not used anywhere to check the feature presence [1]: https://lore.kernel.org/linux-perf-users/ZoRw5we4HLSTZND6@x1/ Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-7-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04perf tests probe_vfs_getname.sh: Update to use 'perf check feature'Athira Rajeev3-4/+10
In probe_vfs_getname.sh, current we use "perf record --dry-run" to check for libtraceevent and skip the test if perf is not build with libtraceevent. Change the check to use "perf check feature" option Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-6-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04perf tools test_task_analyzer.sh: Update to use 'perf check feature'Aditya Gupta1-2/+2
Currently we use output of 'perf version --build-options', to check whether perf was built with libtraceevent support. Instead, use 'perf check feature libtraceevent' to check for libtraceevent support. Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-5-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04perf version: Update --build-options to use 'supported_features' arrayAditya Gupta1-36/+8
Now that the feature list has been duplicated in a global 'supported_features' array, use that array instead of manually checking status of built-in features. This helps in being consistent with commands such as 'perf check feature', so commands can use the same array, and any new feature can be added at one place, in the 'supported_features' array Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904190132.415212-4-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04Merge tag 'perf-tools-fixes-for-v6.11-2024-09-04' of ↵Linus Torvalds4-5/+11
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "A number of small fixes for the late cycle: - Two more build fixes on 32-bit archs - Fixed a segfault during perf test - Fixed spinlock/rwlock accounting bug in perf lock contention" * tag 'perf-tools-fixes-for-v6.11-2024-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf daemon: Fix the build on more 32-bit architectures perf python: include "util/sample.h" perf lock contention: Fix spinlock and rwlock accounting perf test pmu: Set uninitialized PMU alias to null
2024-09-04Merge tag 'hwmon-for-v6.11-rc7' of ↵Linus Torvalds2-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - hp-wmi-sensors: Check if WMI event data exists before accessing it - ltc2991: fix register bits defines * tag 'hwmon-for-v6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (hp-wmi-sensors) Check if WMI event data exists hwmon: ltc2991: fix register bits defines
2024-09-04bpf: add check for invalid name in btf_name_valid_section()Jeongjun Park1-1/+3
If the length of the name string is 1 and the value of name[0] is NULL byte, an OOB vulnerability occurs in btf_name_valid_section() and the return value is true, so the invalid name passes the check. To solve this, you need to check if the first position is NULL byte and if the first character is printable. Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Fixes: bd70a8fb7ca4 ("bpf: Allow all printable characters in BTF DATASEC names") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Link: https://lore.kernel.org/r/20240831054702.364455-1-aha310510@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
2024-09-04Merge tag 'for-6.11-rc6-tag' of ↵Linus Torvalds6-23/+42
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - followup fix for direct io and fsync under some conditions, reported by QEMU users - fix a potential leak when disabling quotas while some extent tracking work can still happen - in zoned mode handle unexpected change of zone write pointer in RAID1-like block groups, turn the zones to read-only * tag 'for-6.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix race between direct IO write and fsync when using same fd btrfs: zoned: handle broken write pointer on zones btrfs: qgroup: don't use extent changeset when not needed
2024-09-04Merge tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds3-2/+8
Pull smb server fixes from Steve French: - Fix crash in session setup - Fix locking bug - Improve access bounds checking * tag 'v6.11-rc6-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Unlock on in ksmbd_tcp_set_interfaces() ksmbd: unset the binding mark of a reused connection smb: Annotate struct xattr_smb_acl with __counted_by()
2024-09-04Merge tag 'vfs-6.11-rc7.fixes' of ↵Linus Torvalds2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "Two netfs fixes for this merge window: - Ensure that fscache_cookie_lru_time is deleted when the fscache module is removed to prevent UAF - Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range() Before it used truncate_inode_pages_partial() which causes copy_file_range() to fail on cifs" * tag 'vfs-6.11-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fscache: delete fscache_cookie_lru_timer when fscache exits to avoid UAF mm: Fix filemap_invalidate_inode() to use invalidate_inode_pages2_range()
2024-09-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linuxLinus Torvalds2-4/+10
Pull ARM fix from Russell King: - Fix a build issue with older binutils with LD dead code elimination disabled * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux: ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION
2024-09-04Merge tag 'parisc-for-6.11-rc7' of ↵Linus Torvalds1-5/+11
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc architecture fix from Helge Deller: - Fix boot issue where boot memory is marked read-only too early * tag 'parisc-for-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Delay write-protection until mark_rodata_ro() call
2024-09-04Merge tag 'mm-hotfixes-stable-2024-09-03-20-19' of ↵Linus Torvalds17-124/+197
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes, 15 of which are cc:stable. Mostly MM, no identifiable theme. And a few nilfs2 fixups" * tag 'mm-hotfixes-stable-2024-09-03-20-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: alloc_tag: fix allocation tag reporting when CONFIG_MODULES=n mm: vmalloc: optimize vmap_lazy_nr arithmetic when purging each vmap_area mailmap: update entry for Jan Kuliga codetag: debug: mark codetags for poisoned page as empty mm/memcontrol: respect zswap.writeback setting from parent cg too scripts: fix gfp-translate after ___GFP_*_BITS conversion to an enum Revert "mm: skip CMA pages when they are not available" maple_tree: remove rcu_read_lock() from mt_validate() kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook nilfs2: fix state management in error path of log writing function nilfs2: fix missing cleanup on rollforward recovery error nilfs2: protect references to superblock parameters exposed in sysfs userfaultfd: don't BUG_ON() if khugepaged yanks our page table userfaultfd: fix checks for huge PMDs mm: vmalloc: ensure vmap_block is initialised before adding to queue selftests: mm: fix build errors on armhf
2024-09-04ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATIONYuntao Liu2-4/+10
There is a build issue with LD segmentation fault, while CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not enabled, as bellow. scripts/link-vmlinux.sh: line 49: 3796 Segmentation fault (core dumped) ${ld} ${ldflags} -o ${output} ${wl}--whole-archive ${objs} ${wl}--no-whole-archive ${wl}--start-group ${libs} ${wl}--end-group ${kallsymso} ${btf_vmlinux_bin_o} ${ldlibs} The error occurs in older versions of the GNU ld with version earlier than 2.36. It makes most sense to have a minimum LD version as a dependency for HAVE_LD_DEAD_CODE_DATA_ELIMINATION and eliminate the impact of ".reloc .text, R_ARM_NONE, ." when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is not enabled. Fixes: ed0f94102251 ("ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION") Reported-by: Harith George <mail2hgg@gmail.com> Tested-by: Harith George <mail2hgg@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com> Link: https://lore.kernel.org/all/14e9aefb-88d1-4eee-8288-ef15d4a9b059@gmail.com/ Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2024-09-04perf jevents: Add cpuid to model lookup commandIan Rogers1-0/+73
When restricting jevents generated json lookup code with JEVENTS_MODEL a list of models must be provided. Some builds don't know model names but know cpuids. Add a command that can convert a cpuid to a model using mapfile.csv files. This can be used with JEVENTS_MODEL like: $ make JEVENTS_MODEL=`./pmu-events/models.py x86 'GenuineIntel-6-8D-1,AuthenticAMD-26-1' pmu-events/arch/` Committer testing: $ tools/perf/pmu-events/models.py x86 'GenuineIntel-6-8D-1,AuthenticAMD-26-1' tools/perf/pmu-events/arch/ tigerlake,amdzen5 $ perf stat -v sleep 1 |& head -1 Using CPUID GenuineIntel-6-B7-1 $ tools/perf/pmu-events/models.py x86 'GenuineIntel-6-B7-1' tools/perf/pmu-events/arch/ alderlake $ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240904044351.712080-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04perf check: Introduce 'check' subcommandAditya Gupta5-0/+281
Currently the presence of a feature is checked with a combination of perf version --build-options and greps, such as: perf version --build-options | grep " on .* HAVE_FEATURE" Instead of this, introduce a subcommand "perf check feature", with which scripts can test for presence of a feature, such as: perf check feature HAVE_FEATURE 'perf check feature' command is expected to have exit status of 0 if feature is built-in, and 1 if it's not built-in or if feature is not known. Multiple features can also be passed as a comma-separated list, in which case the exit status will be 1 only if all of the passed features are built-in. For example, with below command, it will have exit status of 0 only if both libtraceevent and bpf are enabled, else 1 in all other cases perf check feature libtraceevent,bpf The arguments are case-insensitive. An array 'supported_features' has also been introduced that can be used by other commands like 'perf version --build-options', so that new features can be added in one place, with the array Committer testing: $ perf check feature libtraceevent,bpf libtraceevent: [ on ] # HAVE_LIBTRACEEVENT bpf: [ on ] # HAVE_LIBBPF_SUPPORT $ perf check feature libtraceevent libtraceevent: [ on ] # HAVE_LIBTRACEEVENT $ perf check feature bpf bpf: [ on ] # HAVE_LIBBPF_SUPPORT $ perf check -q feature bpf && echo "BPF support is present" BPF support is present $ perf check -q feature Bogus && echo "Bogus support is present" $ Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904061836.55873-3-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04libsubcmd: Don't free the usage stringAditya Gupta7-5/+20
Currently, commands which depend on 'parse_options_subcommand()' don't show the usage string, and instead show '(null)' $ ./perf sched Usage: (null) -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -i, --input <file> input file name -v, --verbose be more verbose (show symbol address, etc) 'parse_options_subcommand()' is generally expected to initialise the usage string, with information in the passed 'subcommands[]' array This behaviour was changed in: 230a7a71f92212e7 ("libsubcmd: Fix parse-options memory leak") Where the generated usage string is deallocated, and usage[0] string is reassigned as NULL. As discussed in [1], free the allocated usage string in the main function itself, and don't reset usage string to NULL in parse_options_subcommand With this change, the behaviour is restored. $ ./perf sched Usage: perf sched [<options>] {record|latency|map|replay|script|timehist} -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -i, --input <file> input file name -v, --verbose be more verbose (show symbol address, etc) [1]: https://lore.kernel.org/linux-perf-users/htq5vhx6piet4nuq2mmhk7fs2bhfykv52dbppwxmo3s7du2odf@styd27tioc6e/ Fixes: 230a7a71f92212e7 ("libsubcmd: Fix parse-options memory leak") Suggested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240904061836.55873-2-adityag@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04perf parse-events: Vary default_breakpoint_len on i386 and arm64Ian Rogers1-0/+16
On arm64 the breakpoint length should be 4-bytes but 8-bytes is tolerated as perf passes that as sizeof(long). Just pass the correct value. On i386 the sizeof(long) check in the kernel needs to match the kernel's long size. Check using an environment (uname checks) whether 4 or 8 bytes needs to be passed. Cache the value in a static. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240904050606.752788-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04perf parse-events: Add default_breakpoint_len helperIan Rogers6-5/+17
The default breakpoint length is "sizeof(long)" however this is incorrect on platforms like Aarch64 where sizeof(long) is 8 but the breakpoint length is 4. Add a helper function that can be used to determine the correct breakpoint length, in this change it just returns the existing default sizeof(long) value. Use the helper in the bp_account test so that, when modifying the event from a watchpoint to a breakpoint, the breakpoint length is appropriate for the architecture and not just sizeof(long). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240904050606.752788-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-04net: mana: Fix error handling in mana_create_txq/rxq's NAPI cleanupSouradeep Chakrabarti2-9/+15
Currently napi_disable() gets called during rxq and txq cleanup, even before napi is enabled and hrtimer is initialized. It causes kernel panic. ? page_fault_oops+0x136/0x2b0 ? page_counter_cancel+0x2e/0x80 ? do_user_addr_fault+0x2f2/0x640 ? refill_obj_stock+0xc4/0x110 ? exc_page_fault+0x71/0x160 ? asm_exc_page_fault+0x27/0x30 ? __mmdrop+0x10/0x180 ? __mmdrop+0xec/0x180 ? hrtimer_active+0xd/0x50 hrtimer_try_to_cancel+0x2c/0xf0 hrtimer_cancel+0x15/0x30 napi_disable+0x65/0x90 mana_destroy_rxq+0x4c/0x2f0 mana_create_rxq.isra.0+0x56c/0x6d0 ? mana_uncfg_vport+0x50/0x50 mana_alloc_queues+0x21b/0x320 ? skb_dequeue+0x5f/0x80 Cc: stable@vger.kernel.org Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ") Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-04bcachefs: BCH_SB_MEMBER_INVALIDKent Overstreet3-2/+9
Create a sentinal value for "invalid device". This is needed for removing devices that have stripes on them (force removing, without evacuating); we need a sentinal value for the stripe pointers to the device being removed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-04MAINTAINERS: update Andreas Hindborg's email addressAndreas Hindborg2-2/+3
Move away from corporate infrastructure for upstream work. Also update mailmap. Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org> Link: https://lore.kernel.org/r/20240903200956.68231-1-a.hindborg@kernel.org [ Reworded title slightly. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-09-03perf parse-events: Pass cpu_list as a perf_cpu_map in __add_event()Ian Rogers1-5/+6
Previously the cpu_list is a string and typically no cpu_list is passed to __add_event(). Wanting to make events have their cpus distinct from the PMU means that in more occassions we want to pass a cpu_list. If we're reading this from sysfs it is easier to read a perf_cpu_map than allocate and pass around strings that will later be parsed. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240718003025.1486232-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf pmu: Merge boolean sysfs event option parsingIan Rogers1-24/+23
Merge perf_pmu__parse_per_pkg() and perf_pmu__parse_snapshot() that do the same parsing except for the file suffix used. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Gautham Shenoy <gautham.shenoy@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/r/20240718003025.1486232-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03Merge tag 'fuse-fixes-6.11-rc7' of ↵Linus Torvalds5-9/+26
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: - Fix EIO if splice and page stealing are enabled on the fuse device - Disable problematic combination of passthrough and writeback-cache - Other bug fixes found by code review * tag 'fuse-fixes-6.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: disable the combination of passthrough and writeback cache fuse: update stats for pages in dropped aux writeback list fuse: clear PG_uptodate when using a stolen page fuse: fix memory leak in fuse_create_open fuse: check aborted connection before adding requests to pending list for resending fuse: use unsigned type for getxattr/listxattr size truncation
2024-09-03bpf, net: Fix a potential race in do_sock_getsockopt()Tze-nan Wu2-11/+2
There's a potential race when `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` is false during the execution of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`, but becomes true when `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is called. This inconsistency can lead to `BPF_CGROUP_RUN_PROG_GETSOCKOPT` receiving an "-EFAULT" from `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)`. Scenario shown as below: `process A` `process B` ----------- ------------ BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN enable CGROUP_GETSOCKOPT BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT) To resolve this, remove the `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` macro and directly uses `copy_from_sockptr` to ensure that `max_optlen` is always set before `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is invoked. Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") Co-developed-by: Yanghui Li <yanghui.li@mediatek.com> Signed-off-by: Yanghui Li <yanghui.li@mediatek.com> Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com> Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com> Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://patch.msgid.link/20240830082518.23243-1-Tze-nan.Wu@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03net: dqs: Do not use extern for unused dql_groupBreno Leitao1-1/+1
When CONFIG_DQL is not enabled, dql_group should be treated as a dead declaration. However, its current extern declaration assumes the linker will ignore it, which is generally true across most compiler and architecture combinations. But in certain cases, the linker still attempts to resolve the extern struct, even when the associated code is dead, resulting in a linking error. For instance the following error in loongarch64: >> loongarch64-linux-ld: net-sysfs.c:(.text+0x589c): undefined reference to `dql_group' Modify the declaration of the dead object to be an empty declaration instead of an extern. This change will prevent the linker from attempting to resolve an undefined reference. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202409012047.eCaOdfQJ-lkp@intel.com/ Fixes: 74293ea1c4db ("net: sysfs: Do not create sysfs for non BQL device") Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Link: https://patch.msgid.link/20240902101734.3260455-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03perf sched timehist: Add --prio optionYang Jihong2-1/+79
The --prio option is used to only show events for the given task priority(ies). The default is to show events for all priority tasks, which is consistent with the previous behavior. Testcase: # perf sched record nice -n 9 perf bench sched messaging -l 10000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 3.435 [sec] [ perf record: Woken up 270 times to write data ] [ perf record: Captured and wrote 618.688 MB perf.data (5729036 samples) ] # perf sched timehist -h Usage: perf sched timehist [<options>] -C, --cpu <cpu> list of cpus to profile -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -g, --call-graph Display call chains if present (default on) -I, --idle-hist Show idle events only -i, --input <file> input file name -k, --vmlinux <file> vmlinux pathname -M, --migrations Show migration events -n, --next Show next task -p, --pid <pid[,pid...]> analyze events only for given process id(s) -s, --summary Show only syscall summary with statistics -S, --with-summary Show all syscalls and summary with statistics -t, --tid <tid[,tid...]> analyze events only for given thread id(s) -V, --cpu-visual Add CPU visual -v, --verbose be more verbose (show symbol address, etc) -w, --wakeups Show wakeup events --kallsyms <file> kallsyms pathname --max-stack <n> Maximum number of functions to display backtrace. --prio <prio> analyze events only for given task priority(ies) --show-prio Show task priority --state Show task state when sched-out --symfs <directory> Look for files with symbols relative to this directory --time <str> Time span for analysis (start,stop) # perf sched timehist --prio 140 Samples of sched_switch event do not have callchains. Invalid prio string # perf sched timehist --show-prio --prio 129 Samples of sched_switch event do not have callchains. time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 2090450.765421 [0002] sched-messaging[1229618] 129 0.000 0.000 0.029 2090450.765445 [0007] sched-messaging[1229616] 129 0.000 0.062 0.043 2090450.765448 [0014] sched-messaging[1229619] 129 0.000 0.000 0.032 2090450.765478 [0013] sched-messaging[1229617] 129 0.000 0.065 0.048 2090450.765503 [0014] sched-messaging[1229622] 129 0.000 0.000 0.017 2090450.765550 [0002] sched-messaging[1229624] 129 0.000 0.000 0.021 2090450.765562 [0007] sched-messaging[1229621] 129 0.000 0.071 0.028 2090450.765570 [0005] sched-messaging[1229620] 129 0.000 0.064 0.066 2090450.765583 [0001] sched-messaging[1229625] 129 0.000 0.001 0.031 2090450.765595 [0013] sched-messaging[1229623] 129 0.000 0.060 0.028 2090450.765637 [0014] sched-messaging[1229628] 129 0.000 0.000 0.019 2090450.765665 [0007] sched-messaging[1229627] 129 0.000 0.038 0.030 <SNIP> # perf sched timehist --show-prio --prio 0,120-129 Samples of sched_switch event do not have callchains. time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 2090450.763231 [0000] perf[1229608] 120 0.000 0.000 0.000 2090450.763235 [0000] migration/0[15] 0 0.000 0.001 0.003 2090450.763263 [0001] perf[1229608] 120 0.000 0.000 0.000 2090450.763268 [0001] migration/1[21] 0 0.000 0.001 0.004 2090450.763302 [0002] perf[1229608] 120 0.000 0.000 0.000 2090450.763309 [0002] migration/2[27] 0 0.000 0.001 0.007 2090450.763338 [0003] perf[1229608] 120 0.000 0.000 0.000 2090450.763343 [0003] migration/3[33] 0 0.000 0.001 0.004 2090450.763459 [0004] perf[1229608] 120 0.000 0.000 0.000 2090450.763469 [0004] migration/4[39] 0 0.000 0.002 0.010 2090450.763496 [0005] perf[1229608] 120 0.000 0.000 0.000 2090450.763501 [0005] migration/5[45] 0 0.000 0.001 0.004 2090450.763613 [0006] perf[1229608] 120 0.000 0.000 0.000 2090450.763622 [0006] migration/6[51] 0 0.000 0.001 0.008 2090450.763652 [0007] perf[1229608] 120 0.000 0.000 0.000 2090450.763660 [0007] migration/7[57] 0 0.000 0.001 0.008 <SNIP> 2090450.765665 [0001] <idle> 120 0.031 0.031 0.081 2090450.765665 [0007] sched-messaging[1229627] 129 0.000 0.038 0.030 2090450.765667 [0000] s1-perf[8235/7168] 120 0.008 0.000 0.004 2090450.765684 [0013] <idle> 120 0.028 0.028 0.088 2090450.765685 [0001] sched-messaging[1229630] 129 0.000 0.001 0.020 2090450.765688 [0000] <idle> 120 0.004 0.004 0.020 2090450.765689 [0002] <idle> 120 0.021 0.021 0.138 2090450.765691 [0005] sched-messaging[1229626] 129 0.000 0.085 0.029 Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819033016.2427235-3-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf sched timehist: Add --show-prio optionYang Jihong2-7/+87
The --show-prio option is used to display the priority of task. It is disabled by default, which is consistent with original behavior. The display format is xxx (priority does not change during task running) or xxx->yyy (priority changes during task running) Testcase: # perf sched record nice -n 9 true [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 0.497 MB perf.data ] # perf sched timehist -h Usage: perf sched timehist [<options>] -C, --cpu <cpu> list of cpus to profile -D, --dump-raw-trace dump raw trace in ASCII -f, --force don't complain, do it -g, --call-graph Display call chains if present (default on) -I, --idle-hist Show idle events only -i, --input <file> input file name -k, --vmlinux <file> vmlinux pathname -M, --migrations Show migration events -n, --next Show next task -p, --pid <pid[,pid...]> analyze events only for given process id(s) -s, --summary Show only syscall summary with statistics -S, --with-summary Show all syscalls and summary with statistics -t, --tid <tid[,tid...]> analyze events only for given thread id(s) -V, --cpu-visual Add CPU visual -v, --verbose be more verbose (show symbol address, etc) -w, --wakeups Show wakeup events --kallsyms <file> kallsyms pathname --max-stack <n> Maximum number of functions to display backtrace. --show-prio Show task priority --state Show task state when sched-out --symfs <directory> Look for files with symbols relative to this directory --time <str> Time span for analysis (start,stop) # perf sched timehist Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 23952.006537 [0000] perf[534] 0.000 0.000 0.000 23952.006593 [0000] migration/0[19] 0.000 0.014 0.056 23952.006899 [0001] perf[534] 0.000 0.000 0.000 23952.006947 [0001] migration/1[22] 0.000 0.015 0.047 23952.007138 [0002] perf[534] 0.000 0.000 0.000 <SNIP> # perf sched timehist --show-prio Samples of sched_switch event do not have callchains. time cpu task name prio wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ -------- --------- --------- --------- 23952.006537 [0000] perf[534] 120 0.000 0.000 0.000 23952.006593 [0000] migration/0[19] 0 0.000 0.014 0.056 23952.006899 [0001] perf[534] 120 0.000 0.000 0.000 <SNIP> 23952.034843 [0003] nice[535] 120->129 0.189 0.024 23.314 <SNIP> 23952.053838 [0005] rcu_preempt[16] 120 3.993 0.000 0.023 23952.053990 [0005] <idle> 120 0.023 0.023 0.152 23952.054137 [0006] <idle> 120 1.427 1.427 17.855 23952.054278 [0007] <idle> 120 0.506 0.506 1.650 Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240819033016.2427235-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03sch/netem: fix use after free in netem_dequeueStephen Hemminger1-5/+4
If netem_dequeue() enqueues packet to inner qdisc and that qdisc returns __NET_XMIT_STOLEN. The packet is dropped but qdisc_tree_reduce_backlog() is not called to update the parent's q.qlen, leading to the similar use-after-free as Commit e04991a48dbaf382 ("netem: fix return value if duplicate enqueue fails") Commands to trigger KASAN UaF: ip link add type dummy ip link set lo up ip link set dummy0 up tc qdisc add dev lo parent root handle 1: drr tc filter add dev lo parent 1: basic classid 1:1 tc class add dev lo classid 1:1 drr tc qdisc add dev lo parent 1:1 handle 2: netem tc qdisc add dev lo parent 2: handle 3: drr tc filter add dev lo parent 3: basic classid 3:1 action mirred egress redirect dev dummy0 tc class add dev lo classid 3:1 drr ping -c1 -W0.01 localhost # Trigger bug tc class del dev lo classid 1:1 tc class add dev lo classid 1:1 drr ping -c1 -W0.01 localhost # UaF Fixes: 50612537e9ab ("netem: fix classful handling") Reported-by: Budimir Markovic <markovicbudimir@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Link: https://patch.msgid.link/20240901182438.4992-1-stephen@networkplumber.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03perf sched timehist: Remove redundant BUG_ON in timehist_sched_change_event()Yang Jihong1-2/+0
The BUG_ON(thread__tid(thread) != 0) in timehist_sched_change_event() is redundant, remove it. No functional change. Fixes: 07235f84ece6b66f ("perf sched timehist: Add -I/--idle-hist option") Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812132606.3126490-2-yangjihong@bytedance.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf sched timehist: Skip print non-idle task samples when only show idle eventsYang Jihong1-3/+3
when only show idle events, runtime stats of non-idle tasks is not updated, and the value is 0, there is no need to print non-idle samples. Before: # perf sched timehist -I Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 2090450.763235 [0000] migration/0[15] 0.000 0.000 0.000 2090450.763268 [0001] migration/1[21] 0.000 0.000 0.000 2090450.763309 [0002] migration/2[27] 0.000 0.000 0.000 2090450.763343 [0003] migration/3[33] 0.000 0.000 0.000 2090450.763469 [0004] migration/4[39] 0.000 0.000 0.000 2090450.763501 [0005] migration/5[45] 0.000 0.000 0.000 2090450.763622 [0006] migration/6[51] 0.000 0.000 0.000 2090450.763660 [0007] migration/7[57] 0.000 0.000 0.000 2090450.763741 [0009] migration/9[69] 0.000 0.000 0.000 2090450.763862 [0010] migration/10[75] 0.000 0.000 0.000 2090450.763894 [0011] migration/11[81] 0.000 0.000 0.000 2090450.764021 [0012] migration/12[87] 0.000 0.000 0.000 2090450.764056 [0013] migration/13[93] 0.000 0.000 0.000 2090450.764135 [0014] migration/14[99] 0.000 0.000 0.000 2090450.764163 [0015] migration/15[105] 0.000 0.000 0.000 2090450.764292 [0016] migration/16[111] 0.000 0.000 0.000 2090450.764371 [0017] migration/17[117] 0.000 0.000 0.000 2090450.764422 [0018] migration/18[123] 0.000 0.000 0.000 2090450.764490 [0000] <idle> 0.000 0.000 1.255 2090450.764505 [0000] s1-perf[8235/7168] 0.000 0.000 0.000 2090450.764571 [0016] <idle> 0.000 0.000 0.278 2090450.764588 [0010] <idle> 0.000 0.000 0.725 2090450.764590 [0016] s1-agent[7179/7162] 0.000 0.000 0.000 2090450.764635 [0000] <idle> 0.015 0.015 0.129 2090450.764637 [0017] <idle> 0.000 0.000 0.266 2090450.764639 [0000] s1-perf[8235/7168] 0.000 0.000 0.000 2090450.764668 [0017] s1-agent[7180/7162] 0.000 0.000 0.000 2090450.764669 [0000] <idle> 0.003 0.003 0.029 2090450.764672 [0000] s1-perf[8235/7168] 0.000 0.000 0.000 2090450.764683 [0000] <idle> 0.003 0.003 0.010 After: # perf sched timehist -I Samples of sched_switch event do not have callchains. time cpu task name wait time sch delay run time [tid/pid] (msec) (msec) (msec) --------------- ------ ------------------------------ --------- --------- --------- 2090450.764490 [0000] <idle> 0.000 0.000 1.255 2090450.764571 [0016] <idle> 0.000 0.000 0.278 2090450.764588 [0010] <idle> 0.000 0.000 0.725 2090450.764635 [0000] <idle> 0.015 0.015 0.129 2090450.764637 [0017] <idle> 0.000 0.000 0.266 2090450.764669 [0000] <idle> 0.003 0.003 0.029 2090450.764683 [0000] <idle> 0.003 0.003 0.010 2090450.764688 [0016] <idle> 0.019 0.019 0.097 2090450.764694 [0000] <idle> 0.001 0.001 0.009 2090450.764706 [0000] <idle> 0.001 0.001 0.010 2090450.764725 [0002] <idle> 0.000 0.000 1.415 2090450.764728 [0000] <idle> 0.002 0.002 0.019 2090450.764823 [0000] <idle> 0.003 0.003 0.091 2090450.764838 [0019] <idle> 0.000 0.000 0.154 2090450.764865 [0002] <idle> 0.109 0.109 0.029 2090450.764866 [0000] <idle> 0.012 0.012 0.030 2090450.764880 [0002] <idle> 0.013 0.013 0.001 2090450.764880 [0000] <idle> 0.002 0.002 0.011 2090450.764896 [0000] <idle> 0.001 0.001 0.013 2090450.764903 [0019] <idle> 0.063 0.063 0.002 2090450.764908 [0019] <idle> 0.003 0.003 0.001 Fixes: 07235f84ece6b66f ("perf sched timehist: Add -I/--idle-hist option") Signed-off-by: Yang Jihong <yangjihong@bytedance.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240812132606.3126490-1-yangjihong@bytedance.com Reviewed-and-tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03usbnet: modern method to get random MACOliver Neukum1-8/+3
The driver generates a random MAC once on load and uses it over and over, including on two devices needing a random MAC at the same time. Jakub suggested revamping the driver to the modern API for setting a random MAC rather than fixing the old stuff. The bug is as old as the driver. Signed-off-by: Oliver Neukum <oneukum@suse.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://patch.msgid.link/20240829175201.670718-1-oneukum@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03MAINTAINERS: wifi: cw1200: add net-cw1200.hSimon Horman1-0/+1
This is part of an effort [1] to assign a section in MAINTAINERS to header files that relate to Networking. In this case the files with "net" in their name. [1] https://lore.kernel.org/netdev/20240821-net-mnt-v2-0-59a5af38e69d@kernel.org/ It seems that net-cw1200.h is part of the CW1200 WLAN driver and this it is appropriate to add it to the section for that driver. Signed-off-by: Simon Horman <horms@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://patch.msgid.link/20240902-wifi-mnt-v2-1-f5ad1f36e993@kernel.org
2024-09-03btrfs: fix race between direct IO write and fsync when using same fdFilipe Manana4-16/+16
If we have 2 threads that are using the same file descriptor and one of them is doing direct IO writes while the other is doing fsync, we have a race where we can end up either: 1) Attempt a fsync without holding the inode's lock, triggering an assertion failures when assertions are enabled; 2) Do an invalid memory access from the fsync task because the file private points to memory allocated on stack by the direct IO task and it may be used by the fsync task after the stack was destroyed. The race happens like this: 1) A user space program opens a file descriptor with O_DIRECT; 2) The program spawns 2 threads using libpthread for example; 3) One of the threads uses the file descriptor to do direct IO writes, while the other calls fsync using the same file descriptor. 4) Call task A the thread doing direct IO writes and task B the thread doing fsyncs; 5) Task A does a direct IO write, and at btrfs_direct_write() sets the file's private to an on stack allocated private with the member 'fsync_skip_inode_lock' set to true; 6) Task B enters btrfs_sync_file() and sees that there's a private structure associated to the file which has 'fsync_skip_inode_lock' set to true, so it skips locking the inode's VFS lock; 7) Task A completes the direct IO write, and resets the file's private to NULL since it had no prior private and our private was stack allocated. Then it unlocks the inode's VFS lock; 8) Task B enters btrfs_get_ordered_extents_for_logging(), then the assertion that checks the inode's VFS lock is held fails, since task B never locked it and task A has already unlocked it. The stack trace produced is the following: assertion failed: inode_is_locked(&inode->vfs_inode), in fs/btrfs/ordered-data.c:983 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ordered-data.c:983! Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 9 PID: 5072 Comm: worker Tainted: G U OE 6.10.5-1-default #1 openSUSE Tumbleweed 69f48d427608e1c09e60ea24c6c55e2ca1b049e8 Hardware name: Acer Predator PH315-52/Covini_CFS, BIOS V1.12 07/28/2020 RIP: 0010:btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs] Code: 50 d6 86 c0 e8 (...) RSP: 0018:ffff9e4a03dcfc78 EFLAGS: 00010246 RAX: 0000000000000054 RBX: ffff9078a9868e98 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff907dce4a7800 RDI: ffff907dce4a7800 RBP: ffff907805518800 R08: 0000000000000000 R09: ffff9e4a03dcfb38 R10: ffff9e4a03dcfb30 R11: 0000000000000003 R12: ffff907684ae7800 R13: 0000000000000001 R14: ffff90774646b600 R15: 0000000000000000 FS: 00007f04b96006c0(0000) GS:ffff907dce480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f32acbfc000 CR3: 00000001fd4fa005 CR4: 00000000003726f0 Call Trace: <TASK> ? __die_body.cold+0x14/0x24 ? die+0x2e/0x50 ? do_trap+0xca/0x110 ? do_error_trap+0x6a/0x90 ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] ? exc_invalid_op+0x50/0x70 ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] ? asm_exc_invalid_op+0x1a/0x20 ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] ? btrfs_get_ordered_extents_for_logging.cold+0x1f/0x42 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] btrfs_sync_file+0x21a/0x4d0 [btrfs bb26272d49b4cdc847cf3f7faadd459b62caee9a] ? __seccomp_filter+0x31d/0x4f0 __x64_sys_fdatasync+0x4f/0x90 do_syscall_64+0x82/0x160 ? do_futex+0xcb/0x190 ? __x64_sys_futex+0x10e/0x1d0 ? switch_fpu_return+0x4f/0xd0 ? syscall_exit_to_user_mode+0x72/0x220 ? do_syscall_64+0x8e/0x160 ? syscall_exit_to_user_mode+0x72/0x220 ? do_syscall_64+0x8e/0x160 ? syscall_exit_to_user_mode+0x72/0x220 ? do_syscall_64+0x8e/0x160 ? syscall_exit_to_user_mode+0x72/0x220 ? do_syscall_64+0x8e/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e Another problem here is if task B grabs the private pointer and then uses it after task A has finished, since the private was allocated in the stack of task A, it results in some invalid memory access with a hard to predict result. This issue, triggering the assertion, was observed with QEMU workloads by two users in the Link tags below. Fix this by not relying on a file's private to pass information to fsync that it should skip locking the inode and instead pass this information through a special value stored in current->journal_info. This is safe because in the relevant section of the direct IO write path we are not holding a transaction handle, so current->journal_info is NULL. The following C program triggers the issue: $ cat repro.c /* Get the O_DIRECT definition. */ #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <stdint.h> #include <fcntl.h> #include <errno.h> #include <string.h> #include <pthread.h> static int fd; static ssize_t do_write(int fd, const void *buf, size_t count, off_t offset) { while (count > 0) { ssize_t ret; ret = pwrite(fd, buf, count, offset); if (ret < 0) { if (errno == EINTR) continue; return ret; } count -= ret; buf += ret; } return 0; } static void *fsync_loop(void *arg) { while (1) { int ret; ret = fsync(fd); if (ret != 0) { perror("Fsync failed"); exit(6); } } } int main(int argc, char *argv[]) { long pagesize; void *write_buf; pthread_t fsyncer; int ret; if (argc != 2) { fprintf(stderr, "Use: %s <file path>\n", argv[0]); return 1; } fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0666); if (fd == -1) { perror("Failed to open/create file"); return 1; } pagesize = sysconf(_SC_PAGE_SIZE); if (pagesize == -1) { perror("Failed to get page size"); return 2; } ret = posix_memalign(&write_buf, pagesize, pagesize); if (ret) { perror("Failed to allocate buffer"); return 3; } ret = pthread_create(&fsyncer, NULL, fsync_loop, NULL); if (ret != 0) { fprintf(stderr, "Failed to create writer thread: %d\n", ret); return 4; } while (1) { ret = do_write(fd, write_buf, pagesize, 0); if (ret != 0) { perror("Write failed"); exit(5); } } return 0; } $ mkfs.btrfs -f /dev/sdi $ mount /dev/sdi /mnt/sdi $ timeout 10 ./repro /mnt/sdi/foo Usually the race is triggered within less than 1 second. A test case for fstests will follow soon. Reported-by: Paulo Dias <paulo.miguel.dias@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219187 Reported-by: Andreas Jahn <jahn-andi@web.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219199 Reported-by: syzbot+4704b3cc972bd76024f1@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/00000000000044ff540620d7dee2@google.com/ Fixes: 939b656bc8ab ("btrfs: fix corruption after buffer fault in during direct IO append write") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-09-03Merge tag 'ath-current-20240903' of ↵Kalle Valo9-148/+49
git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath ath.git patches for v6.11-rc7 We have three patch which address two issues in the ath11k driver which should be addressed for 6.11-rc7: One patch fixes a NULL pointer dereference while parsing transmit power envelope (TPE) information, and the other two patches revert the hibernation support since it is interfering with suspend on some platforms. Note the cause of the suspend wakeups is still being investigated, and it is hoped this can be addressed and hibernation support can be restored in the near future.
2024-09-03ice: do not bring the VSI up, if it was down before the XDP setupLarysa Zaremba1-2/+5
After XDP configuration is completed, we bring the interface up unconditionally, regardless of its state before the call to .ndo_bpf(). Preserve the information whether the interface had to be brought down and later bring it up only in such case. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: remove ICE_CFG_BUSY locking from AF_XDP codeLarysa Zaremba1-9/+0
Locking used in ice_qp_ena() and ice_qp_dis() does pretty much nothing, because ICE_CFG_BUSY is a state flag that is supposed to be set in a PF state, not VSI one. Therefore it does not protect the queue pair from e.g. reset. Remove ICE_CFG_BUSY locking from ice_qp_dis() and ice_qp_ena(). Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: check ICE_VSI_DOWN under rtnl_lock when preparing for resetLarysa Zaremba1-6/+6
Consider the following scenario: .ndo_bpf() | ice_prepare_for_reset() | ________________________|_______________________________________| rtnl_lock() | | ice_down() | | | test_bit(ICE_VSI_DOWN) - true | | ice_dis_vsi() returns | ice_up() | | | proceeds to rebuild a running VSI | .ndo_bpf() is not the only rtnl-locked callback that toggles the interface to apply new configuration. Another example is .set_channels(). To avoid the race condition above, act only after reading ICE_VSI_DOWN under rtnl_lock. Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: check for XDP rings instead of bpf program when unconfiguringLarysa Zaremba3-7/+7
If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on VSI without applying new ring configuration. When unconfiguring the VSI, we can encounter the state in which there is an XDP program but no XDP rings to destroy or there will be XDP rings that need to be destroyed, but no XDP program to indicate their presence. When unconfiguring, rely on the presence of XDP rings rather then XDP program, as they better represent the current state that has to be destroyed. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: protect XDP configuration with a mutexLarysa Zaremba4-19/+39
The main threat to data consistency in ice_xdp() is a possible asynchronous PF reset. It can be triggered by a user or by TX timeout handler. XDP setup and PF reset code access the same resources in the following sections: * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked * ice_vsi_rebuild() for the PF VSI - not protected * ice_vsi_open() - already rtnl-locked With an unfortunate timing, such accesses can result in a crash such as the one below: [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18 [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14 [ +0.394718] ice 0000:b1:00.0: PTP reset successful [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ +0.000045] #PF: supervisor read access in kernel mode [ +0.000023] #PF: error_code(0x0000) - not-present page [ +0.000023] PGD 0 P4D 0 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000036] Workqueue: ice ice_service_task [ice] [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice] [...] [ +0.000013] Call Trace: [ +0.000016] <TASK> [ +0.000014] ? __die+0x1f/0x70 [ +0.000029] ? page_fault_oops+0x171/0x4f0 [ +0.000029] ? schedule+0x3b/0xd0 [ +0.000027] ? exc_page_fault+0x7b/0x180 [ +0.000022] ? asm_exc_page_fault+0x22/0x30 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice] [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice] [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice] [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice] [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [ +0.000145] ice_rebuild+0x18c/0x840 [ice] [ +0.000145] ? delay_tsc+0x4a/0xc0 [ +0.000022] ? delay_tsc+0x92/0xc0 [ +0.000020] ice_do_reset+0x140/0x180 [ice] [ +0.000886] ice_service_task+0x404/0x1030 [ice] [ +0.000824] process_one_work+0x171/0x340 [ +0.000685] worker_thread+0x277/0x3a0 [ +0.000675] ? preempt_count_add+0x6a/0xa0 [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50 [ +0.000679] ? __pfx_worker_thread+0x10/0x10 [ +0.000653] kthread+0xf0/0x120 [ +0.000635] ? __pfx_kthread+0x10/0x10 [ +0.000616] ret_from_fork+0x2d/0x50 [ +0.000612] ? __pfx_kthread+0x10/0x10 [ +0.000604] ret_from_fork_asm+0x1b/0x30 [ +0.000604] </TASK> The previous way of handling this through returning -EBUSY is not viable, particularly when destroying AF_XDP socket, because the kernel proceeds with removal anyway. There is plenty of code between those calls and there is no need to create a large critical section that covers all of them, same as there is no need to protect ice_vsi_rebuild() with rtnl_lock(). Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp(). Leaving unprotected sections in between would result in two states that have to be considered: 1. when the VSI is closed, but not yet rebuild 2. when VSI is already rebuild, but not yet open The latter case is actually already handled through !netif_running() case, we just need to adjust flag checking a little. The former one is not as trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of hardware interaction happens, this can make adding/deleting rings exit with an error. Luckily, VSI rebuild is pending and can apply new configuration for us in a managed fashion. Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to indicate that ice_xdp() can just hot-swap the program. Also, as ice_vsi_rebuild() flow is touched in this patch, make it more consistent by deconfiguring VSI when coalesce allocation fails. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: move netif_queue_set_napi to rtnl-protected sectionsLarysa Zaremba4-118/+49
Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is not rtnl-locked when called from the reset. This creates the need to take the rtnl_lock just for a single function and complicates the synchronization with .ndo_bpf. At the same time, there no actual need to fill napi-to-queue information at this exact point. Fill napi-to-queue information when opening the VSI and clear it when the VSI is being closed. Those routines are already rtnl-locked. Also, rewrite napi-to-queue assignment in a way that prevents inclusion of XDP queues, as this leads to out-of-bounds writes, such as one below. [ +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0 [ +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047 [ +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2 [ +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000003] Call Trace: [ +0.000003] <TASK> [ +0.000002] dump_stack_lvl+0x60/0x80 [ +0.000007] print_report+0xce/0x630 [ +0.000007] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000007] ? __virt_addr_valid+0x1c9/0x2c0 [ +0.000005] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000003] kasan_report+0xe9/0x120 [ +0.000004] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000004] netif_queue_set_napi+0x1c2/0x1e0 [ +0.000005] ice_vsi_close+0x161/0x670 [ice] [ +0.000114] ice_dis_vsi+0x22f/0x270 [ice] [ +0.000095] ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice] [ +0.000086] ice_prepare_for_reset+0x299/0x750 [ice] [ +0.000087] pci_dev_save_and_disable+0x82/0xd0 [ +0.000006] pci_reset_function+0x12d/0x230 [ +0.000004] reset_store+0xa0/0x100 [ +0.000006] ? __pfx_reset_store+0x10/0x10 [ +0.000002] ? __pfx_mutex_lock+0x10/0x10 [ +0.000004] ? __check_object_size+0x4c1/0x640 [ +0.000007] kernfs_fop_write_iter+0x30b/0x4a0 [ +0.000006] vfs_write+0x5d6/0xdf0 [ +0.000005] ? fd_install+0x180/0x350 [ +0.000005] ? __pfx_vfs_write+0x10/0xA10 [ +0.000004] ? do_fcntl+0x52c/0xcd0 [ +0.000004] ? kasan_save_track+0x13/0x60 [ +0.000003] ? kasan_save_free_info+0x37/0x60 [ +0.000006] ksys_write+0xfa/0x1d0 [ +0.000003] ? __pfx_ksys_write+0x10/0x10 [ +0.000002] ? __x64_sys_fcntl+0x121/0x180 [ +0.000004] ? _raw_spin_lock+0x87/0xe0 [ +0.000005] do_syscall_64+0x80/0x170 [ +0.000007] ? _raw_spin_lock+0x87/0xe0 [ +0.000004] ? __pfx__raw_spin_lock+0x10/0x10 [ +0.000003] ? file_close_fd_locked+0x167/0x230 [ +0.000005] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000005] ? do_syscall_64+0x8c/0x170 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? fput+0x1a/0x2c0 [ +0.000004] ? filp_close+0x19/0x30 [ +0.000004] ? do_dup2+0x25a/0x4c0 [ +0.000004] ? __x64_sys_dup2+0x6e/0x2e0 [ +0.000002] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? __count_memcg_events+0x113/0x380 [ +0.000005] ? handle_mm_fault+0x136/0x820 [ +0.000005] ? do_user_addr_fault+0x444/0xa80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000002] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7f2033593154 Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios") Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03perf script: Minimize "not reaching sample" for '-F +brstackinsn'Andi Kleen4-6/+9
In some situations 'perf script -F +brstackinsn' sees a lot of "not reaching sample" messages. This happens when the last LBR block before the sample contains a branch that is not in the LBR, and the instruction dumping stops. $ perf record -b emacs -Q --batch '()' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.396 MB perf.data (443 samples) ] $ perf script -F +brstackinsn ... 00007f0ab2d171a4 insn: 41 0f 94 c0 00007f0ab2d171a8 insn: 83 fa 01 00007f0ab2d171ab insn: 74 d3 # PRED 6 cycles [313] 1.00 IPC 00007f0ab2d17180 insn: 45 84 c0 00007f0ab2d17183 insn: 74 28 ... not reaching sample ... $ perf script -F +brstackinsn | grep -c reach 136 $ This is a problem for further analysis that wants to see the full code upto the sample. There are two common cases where the message is bogus: - The LBR only logs taken branches, but the branch might be a conditional branch that is not taken (that is the most common case actually) - The LBR sampling uses a filter ignoring some branches, but the perf script check checks for all branches. This patch fixes these two conditions, by only checking for conditional branches, as well as checking the perf_event_attr's branch filter attributes. For the test case above it fixes all the messages: $ ./perf script -F +brstackinsn | grep -c reach 0 Note that there are still conditions when the message is hit -- sometimes there can be a unconditional branch that misses the LBR update before the sample -- but they are much more rare now. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20240229161828.386397-1-ak@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf record offcpu: Constify control data for BPFNamhyung Kim2-12/+13
The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf record --off-cpu ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.807 MB perf.data (5645 samples) ] root@x1:~# perf evlist cpu_atom/cycles/P cpu_core/cycles/P offcpu-time dummy:u root@x1:~# perf evlist -v cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, precise_ip: 3, sample_id_all: 1 offcpu-time: type: 1 (software), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, sample_id_all: 1 dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|IDENTIFIER, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 root@x1:~# perf trace -e bpf --max-events 5 perf record --off-cpu 0.000 ( 0.015 ms): :2949124/2949124 bpf(cmd: 36, uattr: 0x7ffefc6dbe30, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.031 ( 0.115 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbb60, size: 148) = 14 0.159 ( 0.037 ms): :2949124/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbc20, size: 148) = 14 23.868 ( 0.144 ms): perf/2949124 bpf(cmd: PROG_LOAD, uattr: 0x7ffefc6dbad0, size: 148) = 14 24.027 ( 0.014 ms): perf/2949124 bpf(uattr: 0x7ffefc6dbc80, size: 80) = 14 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf lock contention: Constify control data for BPFNamhyung Kim2-34/+38
The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf lock contention --use-bpf contended total wait max wait avg wait type caller 5 31.57 us 14.93 us 6.31 us mutex btrfs_delayed_update_inode+0x43 1 16.91 us 16.91 us 16.91 us rwsem:R btrfs_tree_read_lock_nested+0x1b 1 15.13 us 15.13 us 15.13 us spinlock btrfs_getattr+0xd1 1 6.65 us 6.65 us 6.65 us rwsem:R btrfs_tree_read_lock_nested+0x1b 1 4.34 us 4.34 us 4.34 us spinlock process_one_work+0x1a9 root@x1:~# root@x1:~# perf trace -e bpf --max-events 10 perf lock contention --use-bpf 0.000 ( 0.013 ms): :2948281/2948281 bpf(cmd: 36, uattr: 0x7ffd5f12d730, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.024 ( 0.120 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d460, size: 148) = 16 0.158 ( 0.034 ms): :2948281/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d520, size: 148) = 16 26.653 ( 0.154 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d3d0, size: 148) = 16 26.825 ( 0.014 ms): perf/2948281 bpf(uattr: 0x7ffd5f12d580, size: 80) = 16 87.924 ( 0.038 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d400, size: 40) = 16 87.988 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d470, size: 40) = 16 88.019 ( 0.006 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d250, size: 40) = 16 88.029 ( 0.172 ms): perf/2948281 bpf(cmd: PROG_LOAD, uattr: 0x7ffd5f12d320, size: 148) = 17 88.217 ( 0.005 ms): perf/2948281 bpf(cmd: BTF_LOAD, uattr: 0x7ffd5f12d4d0, size: 40) = 16 root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240902200515.2103769-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-09-03perf kwork: Constify control data for BPFNamhyung Kim4-10/+13
The control knobs set before loading BPF programs should be declared as 'const volatile' so that it can be optimized by the BPF core. Committer testing: root@x1:~# perf kwork report --use-bpf Starting trace, Hit <Ctrl+C> to stop and report ^C Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | -------------------------------------------------------------------------------------------------------------------------------- (w)intel_atomic_commit_work [ | 0009 | 18.680 ms | 2 | 18.553 ms | 362410.681580 s | 362410.700133 s | (w)pm_runtime_work | 0007 | 13.300 ms | 1 | 13.300 ms | 362410.254996 s | 362410.268295 s | (w)intel_atomic_commit_work [ | 0009 | 9.846 ms | 2 | 9.717 ms | 362410.172352 s | 362410.182069 s | (w)acpi_ec_event_processor | 0002 | 8.106 ms | 1 | 8.106 ms | 362410.463187 s | 362410.471293 s | (s)SCHED:7 | 0000 | 1.351 ms | 106 | 0.063 ms | 362410.658017 s | 362410.658080 s | i915:157 | 0008 | 0.994 ms | 13 | 0.361 ms | 362411.222125 s | 362411.222486 s | (s)SCHED:7 | 0001 | 0.703 ms | 98 | 0.047 ms | 362410.245004 s | 362410.245051 s | (s)SCHED:7 | 0005 | 0.674 ms | 42 | 0.074 ms | 362411.483039 s | 362411.483113 s | (s)NET_RX:3 | 0001 | 0.556 ms | 10 | 0.079 ms | 362411.066388 s | 362411.066467 s | <SNIP> root@x1:~# perf trace -e bpf --max-events 5 perf kwork report --use-bpf 0.000 ( 0.016 ms): perf/2948007 bpf(cmd: 36, uattr: 0x7ffededa6660, size: 8) = -1 EOPNOTSUPP (Operation not supported) 0.026 ( 0.106 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6390, size: 148) = 12 0.152 ( 0.032 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6450, size: 148) = 12 26.247 ( 0.138 ms): perf/2948007 bpf(cmd: PROG_LOAD, uattr: 0x7ffededa6300, size: 148) = 12 26.396 ( 0.012 ms): perf/2948007 bpf(uattr: 0x7ffededa64b0, size: 80) = 12 Starting trace, Hit <Ctrl+C> to stop and report root@x1:~# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/r/20240902200515.2103769-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>