summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-05-27mm: page-isolation: skip isolated pageblock in start_isolate_page_range()Zi Yan1-8/+18
start_isolate_page_range() first isolates the first and the last pageblocks in the range and ensure pages across range boundaries are split during isolation. But it missed the case when the range is <= a pageblock and the first and the last pageblocks are the same one, so the second isolate_single_pageblock() will always fail. To fix it, skip the pageblock isolation in second isolate_single_pageblock(). Link: https://lkml.kernel.org/r/20220526231531.2404977-1-zi.yan@sent.com Fixes: 88ee134320b8 ("mm: fix a potential infinite loop in start_isolate_page_range()") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/linux-mm/ac65adc0-a7e4-cdfe-a0d8-757195b86293@samsung.com/ Reported-by: Michael Walle <michael@walle.cc> Tested-by: Michael Walle <michael@walle.cc> Link: https://lore.kernel.org/linux-mm/8ca048ca8b547e0dd1c95387ee05c23d@walle.cc/ Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Doug Berger <opendmb@gmail.com> Cc: Eric Ren <renzhengeek@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Qian Cai <quic_qiancai@quicinc.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo1-0/+19
To pick up the changes in: db1af12929c99d15 ("x86/msr-index: Define INTEGRITY_CAPABILITIES MSR") 089be16d5992dd0b ("x86/msr: Add PerfCntrGlobal* registers") f52ba93190457aa2 ("tools/power turbostat: Add Power Limit4 support") Addressing these tools/perf build warnings: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' That makes the beautification scripts to pick some new entries: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2022-05-26 12:50:01.228612839 -0300 +++ after 2022-05-26 12:50:07.699776166 -0300 @@ -116,6 +116,7 @@ [0x0000026f] = "MTRRfix4K_F8000", [0x00000277] = "IA32_CR_PAT", [0x00000280] = "IA32_MC0_CTL2", + [0x000002d9] = "INTEGRITY_CAPS", [0x000002ff] = "MTRRdefType", [0x00000309] = "CORE_PERF_FIXED_CTR0", [0x0000030a] = "CORE_PERF_FIXED_CTR1", @@ -176,6 +177,7 @@ [0x00000586] = "IA32_RTIT_ADDR3_A", [0x00000587] = "IA32_RTIT_ADDR3_B", [0x00000600] = "IA32_DS_AREA", + [0x00000601] = "VR_CURRENT_CONFIG", [0x00000606] = "RAPL_POWER_UNIT", [0x0000060a] = "PKGC3_IRTL", [0x0000060b] = "PKGC6_IRTL", @@ -260,6 +262,10 @@ [0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE", [0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX", [0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO", + [0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG", + [0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS", + [0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL", + [0xc0000302 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS_CLR", }; #define x86_AMD_V_KVM_MSRs_offset 0xc0010000 @@ -318,4 +324,5 @@ [0xc00102b4 - x86_AMD_V_KVM_MSRs_offset] = "AMD_CPPC_STATUS", [0xc00102f0 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN_CTL", [0xc00102f1 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN", + [0xc0010300 - x86_AMD_V_KVM_MSRs_offset] = "AMD_SAMP_BR_FROM", }; $ Now one can trace systemwide asking to see backtraces to where those MSRs are being read/written, see this example with a previous update: # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" ^C# If we use -v (verbose mode) we can see what it does behind the scenes: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" Using CPUID AuthenticAMD-25-21-0 0x6a0 0x6a8 New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) 0x6a0 0x6a8 New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) mmap size 528384B ^C# Example with a frequent msr: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2 Using CPUID AuthenticAMD-25-21-0 0x48 New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) 0x48 New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) mmap size 528384B Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols 0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) futex_wait_queue_me ([kernel.kallsyms]) futex_wait ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so) 0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule_idle ([kernel.kallsyms]) do_idle ([kernel.kallsyms]) cpu_startup_entry ([kernel.kallsyms]) secondary_startup_64_no_verify ([kernel.kallsyms]) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/lkml/Yo+i%252Fj5+UtE9dcix@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-27perf scripts python: Support Arm CoreSight trace data disassemblyLeo Yan1-0/+272
This commit adds python script to parse CoreSight tracing event and print out source line and disassembly, it generates readable program execution flow for easier humans inspecting. The script receives CoreSight tracing packet with below format: +------------+------------+------------+ packet(n): | addr | ip | cpu | +------------+------------+------------+ packet(n+1): | addr | ip | cpu | +------------+------------+------------+ packet::addr presents the start address of the coming branch sample, and packet::ip is the last address of the branch smple. Therefore, a code section between branches starts from packet(n)::addr and it stops at packet(n+1)::ip. As results we combines the two continuous packets to generate the address range for instructions: [ sample(n)::addr .. sample(n+1)::ip ] The script supports both objdump or llvm-objdump for disassembly with specifying option '-d'. If doesn't specify option '-d', the script simply outputs source lines and symbols. Below shows usages with llvm-objdump or objdump to output disassembly. # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d llvm-objdump-11 -k ./vmlinux ARM CoreSight Trace Data Assembler Dump ffff800008eb3198 <etm4_enable_hw>: ffff800008eb3310: c0 38 00 35 cbnz w0, 0xffff800008eb3a28 <etm4_enable_hw+0x890> ffff800008eb3314: 9f 3f 03 d5 dsb sy ffff800008eb3318: df 3f 03 d5 isb ffff800008eb331c: f5 5b 42 a9 ldp x21, x22, [sp, #32] ffff800008eb3320: fb 73 45 a9 ldp x27, x28, [sp, #80] ffff800008eb3324: e0 82 40 39 ldrb w0, [x23, #32] ffff800008eb3328: 60 00 00 34 cbz w0, 0xffff800008eb3334 <etm4_enable_hw+0x19c> ffff800008eb332c: e0 03 19 aa mov x0, x25 ffff800008eb3330: 8c fe ff 97 bl 0xffff800008eb2d60 <etm4_cs_lock.isra.0.part.0> main 6728/6728 [0004] 0.000000000 etm4_enable_hw+0x198 [kernel.kallsyms] ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>: ffff800008eb2d60: 1f 20 03 d5 nop ffff800008eb2d64: 1f 20 03 d5 nop ffff800008eb2d68: 3f 23 03 d5 hint #25 ffff800008eb2d6c: 00 00 40 f9 ldr x0, [x0] ffff800008eb2d70: 9f 3f 03 d5 dsb sy ffff800008eb2d74: 00 c0 3e 91 add x0, x0, #4016 ffff800008eb2d78: 1f 00 00 b9 str wzr, [x0] ffff800008eb2d7c: bf 23 03 d5 hint #29 ffff800008eb2d80: c0 03 5f d6 ret main 6728/6728 [0004] 0.000000000 etm4_cs_lock.isra.0.part.0+0x20 # perf script -s scripts/python/arm-cs-trace-disasm.py -- -d objdump -k ./vmlinux ARM CoreSight Trace Data Assembler Dump ffff800008eb3310 <etm4_enable_hw+0x178>: ffff800008eb3310: 350038c0 cbnz w0, ffff800008eb3a28 <etm4_enable_hw+0x890> ffff800008eb3314: d5033f9f dsb sy ffff800008eb3318: d5033fdf isb ffff800008eb331c: a9425bf5 ldp x21, x22, [sp, #32] ffff800008eb3320: a94573fb ldp x27, x28, [sp, #80] ffff800008eb3324: 394082e0 ldrb w0, [x23, #32] ffff800008eb3328: 34000060 cbz w0, ffff800008eb3334 <etm4_enable_hw+0x19c> ffff800008eb332c: aa1903e0 mov x0, x25 ffff800008eb3330: 97fffe8c bl ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0> main 6728/6728 [0004] 0.000000000 etm4_enable_hw+0x198 [kernel.kallsyms] ffff800008eb2d60 <etm4_cs_lock.isra.0.part.0>: ffff800008eb2d60: d503201f nop ffff800008eb2d64: d503201f nop ffff800008eb2d68: d503233f paciasp ffff800008eb2d6c: f9400000 ldr x0, [x0] ffff800008eb2d70: d5033f9f dsb sy ffff800008eb2d74: 913ec000 add x0, x0, #0xfb0 ffff800008eb2d78: b900001f str wzr, [x0] ffff800008eb2d7c: d50323bf autiasp ffff800008eb2d80: d65f03c0 ret main 6728/6728 [0004] 0.000000000 etm4_cs_lock.isra.0.part.0+0x20 Signed-off-by: Leo Yan <leo.yan@linaro.org> Co-authored-by: Al Grant <al.grant@arm.com> Co-authored-by: Mathieu Poirier <mathieu.poirier@linaro.org> Co-authored-by: Tor Jeremiassen <tor@ti.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eelco Chaudron <echaudro@redhat.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Tanmay Jagdale <tanmay@marvell.com> Cc: coresight@lists.linaro.org Cc: zengshun . wu <zengshun.wu@outlook.com> Link: https://lore.kernel.org/r/20220521130446.4163597-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-27perf scripting python: Expose dso and map informationLeo Yan1-4/+17
This change adds dso build_id and corresponding map's start and end address. The info of dso build_id can be used to find dso file path, and we can validate if a branch address falls into the range of map's start and end addresses. In addition, the map's start address can be used as an offset for disassembly. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eelco Chaudron <echaudro@redhat.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Tanmay Jagdale <tanmay@marvell.com> Cc: coresight@lists.linaro.org Cc: zengshun . wu <zengshun.wu@outlook.com> Link: https://lore.kernel.org/r/20220521130446.4163597-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-27perf jevents: Fix event syntax error caused by ExtSelZhengjun Xing1-1/+1
In the origin code, when "ExtSel" is 1, the eventcode will change to "eventcode |= 1 << 21”. For event “UNC_Q_RxL_CREDITS_CONSUMED_VN0.DRS", its "ExtSel" is "1", its eventcode will change from 0x1E to 0x20001E, but in fact the eventcode should <=0x1FF, so this will cause the parse fail: # perf stat -e "UNC_Q_RxL_CREDITS_CONSUMED_VN0.DRS" -a sleep 0.1 event syntax error: '.._RxL_CREDITS_CONSUMED_VN0.DRS' \___ value too big for format, maximum is 511 On the perf kernel side, the kernel assumes the valid bits are continuous. It will adjust the 0x100 (bit 8 for perf tool) to bit 21 in HW. DEFINE_UNCORE_FORMAT_ATTR(event_ext, event, "config:0-7,21"); So the perf tool follows the kernel side and just set bit8 other than bit21. Fixes: fedb2b518239cbc0 ("perf jevents: Add support for parsing uncore json files") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220525140410.1706851-1-zhengjun.xing@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-27perf tools arm64: Add support for VG registerJames Clark2-0/+40
Add the name of the VG register so it can be used in --user-regs The event will fail to open if the register is requested but not available so only add it to the mask if the kernel supports sve and also if it supports that specific register. Committer notes: Add conditional definition of HWCAP_SVE, as suggested by Leo Yan, to build on older systems where this is not available in the system headers. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220525154114.718321-6-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-27mm/page_table_check: fix accessing unmapped ptepMiaohe Lin1-1/+1
ptep is unmapped too early, so ptep could theoretically be accessed while it's unmapped. This might become a problem if/when CONFIG_HIGHPTE becomes available on riscv. Fix it by deferring pte_unmap() until page table checking is done. [akpm@linux-foundation.org: account for ptep alteration, per Matthew] Link: https://lkml.kernel.org/r/20220526113350.30806-1-linmiaohe@huawei.com Fixes: 80110bbfbba6 ("mm/page_table_check: check entries at pmd levels") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]Naveen N. Rao4-42/+56
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [1], binutils (v2.36+) started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Address this by dropping the weak attribute from these functions. Instead, follow the existing pattern of having architectures #define the name of the function they want to override in their headers. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h] Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27mm/page_alloc: always attempt to allocate at least one page during bulk ↵Mel Gorman1-2/+2
allocation Peter Pavlisko reported the following problem on kernel bugzilla 216007. When I try to extract an uncompressed tar archive (2.6 milion files, 760.3 GiB in size) on newly created (empty) XFS file system, after first low tens of gigabytes extracted the process hangs in iowait indefinitely. One CPU core is 100% occupied with iowait, the other CPU core is idle (on 2-core Intel Celeron G1610T). It was bisected to c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers") but XFS is only the messenger. The problem is that nothing is waking kswapd to reclaim some pages at a time the PCP lists cannot be refilled until some reclaim happens. The bulk allocator checks that there are some pages in the array and the original intent was that a bulk allocator did not necessarily need all the requested pages and it was best to return as quickly as possible. This was fine for the first user of the API but both NFS and XFS require the requested number of pages be available before making progress. Both could be adjusted to call the page allocator directly if a bulk allocation fails but it puts a burden on users of the API. Adjust the semantics to attempt at least one allocation via __alloc_pages() before returning so kswapd is woken if necessary. It was reported via bugzilla that the patch addressed the problem and that the tar extraction completed successfully. This may also address bug 215975 but has yet to be confirmed. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216007 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215975 Link: https://lkml.kernel.org/r/20220526091210.GC3441@techsingularity.net Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> [5.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27hugetlb: fix huge_pmd_unshare address updateMike Kravetz1-1/+8
The routine huge_pmd_unshare() is passed a pointer to an address associated with an area which may be unshared. If unshare is successful this address is updated to 'optimize' callers iterating over huge page addresses. For the optimization to work correctly, address should be updated to the last huge page in the unmapped/unshared area. However, in the common case where the passed address is PUD_SIZE aligned, the address is incorrectly updated to the address of the preceding huge page. That wastes CPU cycles as the unmapped/unshared range is scanned twice. Link: https://lkml.kernel.org/r/20220524205003.126184-1-mike.kravetz@oracle.com Fixes: 39dde65c9940 ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-27selftests/bpf: fix stacktrace_build_id with missing kprobe/urandom_readSong Liu1-1/+1
Kernel function urandom_read is replaced with urandom_read_iter. Therefore, kprobe on urandom_read is not working any more: [root@eth50-1 bpf]# ./test_progs -n 161 test_stacktrace_build_id:PASS:skel_open_and_load 0 nsec libbpf: kprobe perf_event_open() failed: No such file or directory libbpf: prog 'oncpu': failed to create kprobe 'urandom_read+0x0' \ perf event: No such file or directory libbpf: prog 'oncpu': failed to auto-attach: -2 test_stacktrace_build_id:FAIL:attach_tp err -2 161 stacktrace_build_id:FAIL Fix this by replacing urandom_read with urandom_read_iter in the test. Fixes: 1b388e7765f2 ("random: convert to using fops->read_iter()") Reported-by: Mykola Lysenko <mykolal@fb.com> Signed-off-by: Song Liu <song@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20220526191608.2364049-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-05-27powerpc/64: Include cache.h directly in paca.hMichael Ellerman1-0/+1
paca.h uses ____cacheline_aligned without directly including cache.h, where it's defined. For Book3S builds that's OK because paca.h includes lppaca.h, and it does include cache.h. But Book3E builds have been getting cache.h indirectly via printk.h, which is dicey, and in fact that include was recently removed, leading to build errors such as: ld: fs/isofs/dir.o:(.bss+0x0): multiple definition of `____cacheline_aligned'; fs/isofs/namei.o:(.bss+0x0): first defined here So include cache.h directly to fix the build error. Fixes: 534aa1dc975a ("printk: stop including cache.h from printk.h") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2022-05-27net: usb: qmi_wwan: add Telit 0x1250 compositionCarlo Lobrano1-0/+1
Add support for Telit LN910Cx 0x1250 composition 0x1250: rmnet, tty, tty, tty, tty Signed-off-by: Carlo Lobrano <c.lobrano@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-27net: lan743x: PCI11010 / PCI11414 fixRaju Lakkaraju1-10/+22
Fix the MDIO interface declarations to reflect what is currently supported by the PCI11010 / PCI11414 devices (C22 for RGMII and C22_C45 for SGMII) Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-27Revert "printk: wake up all waiters"John Ogness1-1/+1
This reverts commit 938ba4084abcf6fdd21d9078513c52f8fb9b00d0. The wait queue @log_wait never has exclusive waiters, so there is no need to use wake_up_interruptible_all(). Using wake_up_interruptible() was the correct function to wake all waiters. Since there are no exclusive waiters, erroneously changing wake_up_interruptible() to wake_up_interruptible_all() did not result in any behavior change. However, using wake_up_interruptible_all() on a wait queue without exclusive waiters is fundamentally wrong. Go back to using wake_up_interruptible() to wake all waiters. Signed-off-by: John Ogness <john.ogness@linutronix.de> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20220526203056.81123-1-john.ogness@linutronix.de
2022-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller4-25/+23
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following contain more Netfilter fixes for net: 1) syzbot warning in nfnetlink bind, from Florian. 2) Refetch conntrack after __nf_conntrack_confirm(), from Florian Westphal. 3) Move struct nf_ct_timeout back at the bottom of the ctnl_time, to where it before recent update, also from Florian. 4) Add NL_SET_BAD_ATTR() to nf_tables netlink for proper set element commands error reporting. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-27netfilter: nf_tables: set element extended ACK reporting supportPablo Neira Ayuso1-3/+9
Report the element that causes problems via netlink extended ACK for set element commands. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-27netfilter: cttimeout: fix slab-out-of-bounds read in cttimeout_net_exitFlorian Westphal1-2/+3
syzbot reports: BUG: KASAN: slab-out-of-bounds in __list_del_entry_valid+0xcc/0xf0 lib/list_debug.c:42 [..] list_del include/linux/list.h:148 [inline] cttimeout_net_exit+0x211/0x540 net/netfilter/nfnetlink_cttimeout.c:617 No reproducer so far. Looking at recent changes in this area its clear that the free_head must not be at the end of the structure because nf_ct_timeout structure has variable size. Reported-by: <syzbot+92968395eedbdbd3617d@syzkaller.appspotmail.com> Fixes: 78222bacfca9 ("netfilter: cttimeout: decouple unlink and free on netns destruction") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-27netfilter: conntrack: re-fetch conntrack after insertionFlorian Westphal1-1/+6
In case the conntrack is clashing, insertion can free skb->_nfct and set skb->_nfct to the already-confirmed entry. This wasn't found before because the conntrack entry and the extension space used to free'd after an rcu grace period, plus the race needs events enabled to trigger. Reported-by: <syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com> Fixes: 71d8c47fc653 ("netfilter: conntrack: introduce clash resolution on insertion race") Fixes: 2ad9d7747c10 ("netfilter: conntrack: free extension area immediately") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-27netfilter: nfnetlink: fix warn in nfnetlink_unbindFlorian Westphal1-19/+5
syzbot reports following warn: WARNING: CPU: 0 PID: 3600 at net/netfilter/nfnetlink.c:703 nfnetlink_unbind+0x357/0x3b0 net/netfilter/nfnetlink.c:694 The syzbot generated program does this: socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER) = 3 setsockopt(3, SOL_NETLINK, NETLINK_DROP_MEMBERSHIP, [1], 4) = 0 ... which triggers 'WARN_ON_ONCE(nfnlnet->ctnetlink_listeners == 0)' check. Instead of counting, just enable reporting for every bind request and check if we still have listeners on unbind. While at it, also add the needed bounds check on nfnl_group2type[] access. Reported-by: <syzbot+4903218f7fba0a2d6226@syzkaller.appspotmail.com> Reported-by: <syzbot+afd2d80e495f96049571@syzkaller.appspotmail.com> Fixes: 2794cdb0b97b ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-05-27net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_registerMiaoqian Lin1-0/+1
of_get_child_by_name() returns a node pointer with refcount incremented, we should use of_node_put() on it when done. mv88e6xxx_mdio_register() pass the device node to of_mdiobus_register(). We don't need the device node after it. Add missing of_node_put() to avoid refcount leak. Fixes: a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-27net: ethernet: ti: am65-cpsw-nuss: Fix some refcount leaksMiaoqian Lin1-1/+2
of_get_child_by_name() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. am65_cpsw_init_cpts() and am65_cpsw_nuss_probe() don't release the refcount in error case. Add missing of_node_put() to avoid refcount leak. Fixes: b1f66a5bee07 ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support") Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-27net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry()Dan Carpenter1-0/+3
The "fsp->location" variable comes from user via ethtool_get_rxnfc(). Check that it is valid to prevent an out of bounds read. Fixes: 7aab747e5563 ("net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-27Merge tag 'for-5.19/dm-changes' of ↵Linus Torvalds15-287/+409
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Enable DM core bioset's per-cpu bio cache if QUEUE_FLAG_POLL set. This change improves DM's hipri bio polling (REQ_POLLED) performance by 7 - 20% depending on the system. - Update DM core to use jump_labels to further reduce cost of unlikely branches for zoned block devices, dm-stats and swap_bios throttling. - Various DM core changes to reduce bio-based DM overhead and simplify IO accounting. - Fundamental DM core improvements to dm_io reference counting and the elimination of using bio_split()+bio_chain() -- instead DM's bio-based IO accounting is updated to account that a split occurred. - Improve DM core's abnormal bio processing to do less work. - Improve DM core's hipri polling support to use a single list rather than an hlist. - Update DM core to pass NULL bdev to bio_alloc_clone() so that initialization that isn't useful for DM can be elided. - Add cond_resched to DM stats' various loops that loop over all entries. - Fix incorrect error code return from DM integrity's constructor. - Make DM crypt's printing of the key constant-time. - Update bio-based DM multipath to provide high-resolution timer to the Historical Service Time (HST) path selector. * tag 'for-5.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits) dm: pass NULL bdev to bio_alloc_clone dm cache metadata: remove unnecessary variable in __dump_mapping dm mpath: provide high-resolution timer to HST for bio-based dm crypt: make printing of the key constant-time dm integrity: fix error code in dm_integrity_ctr() dm stats: add cond_resched when looping over entries dm: improve abnormal bio processing dm: simplify bio-based IO accounting further dm: put all polled dm_io instances into a single list dm: improve dm_io reference counting dm: don't grab target io reference in dm_zone_map_bio dm: improve bio splitting and associated IO accounting dm: switch to bdev based IO accounting interfaces dm: pass dm_io instance to dm_io_acct directly dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct dm: use bio_sectors in dm_aceept_partial_bio dm: simplify basic targets dm: conditionally enable branching for less used features dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio dm: move hot dm_io members to same cacheline as dm_target_io ...
2022-05-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds88-2002/+1968
Pull rdma updates from Jason Gunthorpe: "Small collection of incremental improvement patches: - Minor code cleanup patches, comment improvements, etc from static tools - Clean the some of the kernel caps, reducing the historical stealth uAPI leftovers - Bug fixes and minor changes for rdmavt, hns, rxe, irdma - Remove unimplemented cruft from rxe - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer - flush_workqueue(system_unbound_wq) removal - Ensure rxe waits for objects to be unused before allowing the core to free them - Several rc quality bug fixes for hfi1" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits) RDMA/rtrs-clt: Fix one kernel-doc comment RDMA/hfi1: Remove all traces of diagpkt support RDMA/hfi1: Consolidate software versions RDMA/hfi1: Remove pointless driver version RDMA/hfi1: Fix potential integer multiplication overflow errors RDMA/hfi1: Prevent panic when SDMA is disabled RDMA/hfi1: Prevent use of lock before it is initialized RDMA/rxe: Fix an error handling path in rxe_get_mcg() IB/core: Fix typo in comment RDMA/core: Fix typo in comment IB/hf1: Fix typo in comment IB/qib: Fix typo in comment IB/iser: Fix typo in comment RDMA/mlx4: Avoid flush_scheduled_work() usage IB/isert: Avoid flush_scheduled_work() usage RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr() RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq() RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx() RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx() RDMA/irdma: Add SW mechanism to generate completions on error ...
2022-05-27Merge tag 'hardening-v5.19-rc1-fix1' of ↵Linus Torvalds6-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening fix from Kees Cook: "This fixes an unlucky build race condition when using the GCC plugins, noticed by a few folks. - Avoid GCC plugins needing utsrelease.h build target (Masahiro Yamada)" * tag 'hardening-v5.19-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: use KERNELVERSION for plugin version
2022-05-27Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds30-426/+985
Pull nfsd updates from Chuck Lever: "We introduce 'courteous server' in this release. Previously NFSD would purge open and lock state for an unresponsive client after one lease period (typically 90 seconds). Now, after one lease period, another client can open and lock those files and the unresponsive client's lease is purged; otherwise if the unresponsive client's open and lock state is uncontended, the server retains that open and lock state for up to 24 hours, allowing the client's workload to resume after a lengthy network partition. A longstanding issue with NFSv4 file creation is also addressed. Previously a file creation can fail internally, returning an error to the client, but leave the newly created file in place as an artifact. The file creation code path has been reorganized so that internal failures and race conditions are less likely to result in an unwanted file creation. A fault injector has been added to help exercise paths that are run during kernel metadata cache invalidation. These caches contain information maintained by user space about exported filesystems. Many of our test workloads do not trigger cache invalidation. There is one patch that is needed to support PREEMPT_RT and a fix for an ancient 'sleep while spin-locked' splat that seems to have become easier to hit since v5.18-rc3" * tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits) NFSD: nfsd_file_put() can sleep NFSD: Add documenting comment for nfsd4_release_lockowner() NFSD: Modernize nfsd4_release_lockowner() NFSD: Fix possible sleep during nfsd4_release_lockowner() nfsd: destroy percpu stats counters after reply cache shutdown nfsd: Fix null-ptr-deref in nfsd_fill_super() nfsd: Unregister the cld notifier when laundry_wq create failed SUNRPC: Use RMW bitops in single-threaded hot paths NFSD: Clean up the show_nf_flags() macro NFSD: Trace filecache opens NFSD: Move documenting comment for nfsd4_process_open2() NFSD: Fix whitespace NFSD: Remove dprintk call sites from tail of nfsd4_open() NFSD: Instantiate a struct file when creating a regular NFSv4 file NFSD: Clean up nfsd_open_verified() NFSD: Remove do_nfsd_create() NFSD: Refactor NFSv4 OPEN(CREATE) NFSD: Refactor NFSv3 CREATE NFSD: Refactor nfsd_create_setattr() NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create() ...
2022-05-27net: sched: fixed barrier to prevent skbuff sticking in qdisc backlogVincent Ray1-28/+8
In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit() does not provide any ordering guarantee as test_bit() is not an atomic operation. This, added to the fact that the spin_trylock() call at the beginning of qdisc_run_begin() does not guarantee acquire semantics if it does not grab the lock, makes it possible for the following statement : if (test_bit(__QDISC_STATE_MISSED, &qdisc->state)) to be executed before an enqueue operation called before qdisc_run_begin(). As a result the following race can happen : CPU 1 CPU 2 qdisc_run_begin() qdisc_run_begin() /* true */ set(MISSED) . /* returns false */ . . /* sees MISSED = 1 */ . /* so qdisc not empty */ . __qdisc_run() . . . pfifo_fast_dequeue() ----> /* may be done here */ . | . clear(MISSED) | . . | . smp_mb __after_atomic(); | . . | . /* recheck the queue */ | . /* nothing => exit */ | enqueue(skb1) | . | qdisc_run_begin() | . | spin_trylock() /* fail */ | . | smp_mb__before_atomic() /* not enough */ | . ---- if (test_bit(MISSED)) return false; /* exit */ In the above scenario, CPU 1 and CPU 2 both try to grab the qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the bypass code path, where it emits its skb then calls __qdisc_run(). CPU1 fails, sets MISSED and goes down the traditionnal enqueue() + dequeue() code path. But when executing qdisc_run_begin() for the second time, after enqueuing its skbuff, it sees the MISSED bit still set (by itself) and consequently chooses to exit early without setting it again nor trying to grab the spinlock again. Meanwhile CPU2 has seen MISSED = 1, cleared it, checked the queue and found it empty, so it returned. At the end of the sequence, we end up with skb1 enqueued in the backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set, and no __netif_schedule() called made. skb1 will now linger in the qdisc until somebody later performs a full __qdisc_run(). Associated to the bypass capacity of the qdisc, and the ability of the TCP layer to avoid resending packets which it knows are still in the qdisc, this can lead to serious traffic "holes" in a TCP connection. We fix this by replacing the smp_mb__before_atomic() / test_bit() / set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin() by a single test_and_set_bit() call, which is more concise and enforces the needed memory barriers. Fixes: 89837eb4b246 ("net: sched: add barrier to ensure correct ordering for lockless qdisc") Signed-off-by: Vincent Ray <vray@kalrayinc.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220526001746.2437669-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-27net: lan966x: check devm_of_phy_get() for -EDEFER_PROBEMichael Walle1-2/+7
At the moment, if devm_of_phy_get() returns an error the serdes simply isn't set. While it is bad to ignore an error in general, there is a particular bug that network isn't working if the serdes driver is compiled as a module. In that case, devm_of_phy_get() returns -EDEFER_PROBE and the error is silently ignored. The serdes is optional, it is not there if the port is using RGMII, in which case devm_of_phy_get() returns -ENODEV. Rearrange the error handling so that -ENODEV will be handled but other error codes will abort the probing. Fixes: d28d6d2e37d1 ("net: lan966x: add port module support") Signed-off-by: Michael Walle <michael@walle.cc> Link: https://lore.kernel.org/r/20220525231239.1307298-1-michael@walle.cc Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski2-9/+12
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix UAF when creating non-stateful expression in set. 2) Set limit cost when cloning expression accordingly, from Phil Sutter. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_limit: Clone packet limits' cost value netfilter: nf_tables: disallow non-stateful expression in sets earlier ==================== Link: https://lore.kernel.org/r/20220526205411.315136-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-27tracing: Fix comments for event_trigger_separate_filter()sunliming1-4/+4
The parameter name in comments of event_trigger_separate_filter() is inconsistent with actual parameter name, fix it. Link: https://lkml.kernel.org/r/20220526072957.165655-1-sunliming@kylinos.cn Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27x86/traceponit: Fix comment about irq vector tracepointssunliming1-3/+0
Commit: 4b9a8dca0e58 ("x86/idt: Remove the tracing IDT completely") removed the 'tracing IDT' from arch/x86/kernel/tracepoint.c, but left related comment. So that the comment become anachronistic. Just remove the comment. Link: https://lkml.kernel.org/r/20220526110831.175743-1-sunliming@kylinos.cn Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27x86,tracing: Remove unused headerssunliming1-3/+0
Commit 4b9a8dca0e58 ("x86/idt: Remove the tracing IDT completely") removed the tracing IDT from the file arch/x86/kernel/tracepoint.c, but left the related headers unused, remove it. Link: https://lkml.kernel.org/r/20220525012827.93464-1-sunliming@kylinos.cn Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27ftrace: Clean up hash direct_functions on register failuresSong Liu1-3/+2
We see the following GPF when register_ftrace_direct fails: [ ] general protection fault, probably for non-canonical address \ 0x200000000000010: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI [...] [ ] RIP: 0010:ftrace_find_rec_direct+0x53/0x70 [ ] Code: 48 c1 e0 03 48 03 42 08 48 8b 10 31 c0 48 85 d2 74 [...] [ ] RSP: 0018:ffffc9000138bc10 EFLAGS: 00010206 [ ] RAX: 0000000000000000 RBX: ffffffff813e0df0 RCX: 000000000000003b [ ] RDX: 0200000000000000 RSI: 000000000000000c RDI: ffffffff813e0df0 [ ] RBP: ffffffffa00a3000 R08: ffffffff81180ce0 R09: 0000000000000001 [ ] R10: ffffc9000138bc18 R11: 0000000000000001 R12: ffffffff813e0df0 [ ] R13: ffffffff813e0df0 R14: ffff888171b56400 R15: 0000000000000000 [ ] FS: 00007fa9420c7780(0000) GS:ffff888ff6a00000(0000) knlGS:000000000 [ ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ ] CR2: 000000000770d000 CR3: 0000000107d50003 CR4: 0000000000370ee0 [ ] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ ] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ ] Call Trace: [ ] <TASK> [ ] register_ftrace_direct+0x54/0x290 [ ] ? render_sigset_t+0xa0/0xa0 [ ] bpf_trampoline_update+0x3f5/0x4a0 [ ] ? 0xffffffffa00a3000 [ ] bpf_trampoline_link_prog+0xa9/0x140 [ ] bpf_tracing_prog_attach+0x1dc/0x450 [ ] bpf_raw_tracepoint_open+0x9a/0x1e0 [ ] ? find_held_lock+0x2d/0x90 [ ] ? lock_release+0x150/0x430 [ ] __sys_bpf+0xbd6/0x2700 [ ] ? lock_is_held_type+0xd8/0x130 [ ] __x64_sys_bpf+0x1c/0x20 [ ] do_syscall_64+0x3a/0x80 [ ] entry_SYSCALL_64_after_hwframe+0x44/0xae [ ] RIP: 0033:0x7fa9421defa9 [ ] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 9 f8 [...] [ ] RSP: 002b:00007ffed743bd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ ] RAX: ffffffffffffffda RBX: 00000000069d2480 RCX: 00007fa9421defa9 [ ] RDX: 0000000000000078 RSI: 00007ffed743bd80 RDI: 0000000000000011 [ ] RBP: 00007ffed743be00 R08: 0000000000bb7270 R09: 0000000000000000 [ ] R10: 00000000069da210 R11: 0000000000000246 R12: 0000000000000001 [ ] R13: 00007ffed743c4b0 R14: 00000000069d2480 R15: 0000000000000001 [ ] </TASK> [ ] Modules linked in: klp_vm(OK) [ ] ---[ end trace 0000000000000000 ]--- One way to trigger this is: 1. load a livepatch that patches kernel function xxx; 2. run bpftrace -e 'kfunc:xxx {}', this will fail (expected for now); 3. repeat #2 => gpf. This is because the entry is added to direct_functions, but not removed. Fix this by remove the entry from direct_functions when register_ftrace_direct fails. Also remove the last trailing space from ftrace.c, so we don't have to worry about it anymore. Link: https://lkml.kernel.org/r/20220524170839.900849-1-song@kernel.org Cc: stable@vger.kernel.org Fixes: 763e34e74bb7 ("ftrace: Add register_ftrace_direct()") Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing: Fix comments of create_filter()sunliming1-1/+1
The name in comments of parameter "filter_string" in function create_filter is annotated as "filter_str", just fix it. Link: https://lkml.kernel.org/r/20220524063937.52873-1-sunliming@kylinos.cn Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing: Disable kcov on trace_preemptirq.cCongyu Liu1-0/+4
Functions in trace_preemptirq.c could be invoked from early interrupt code that bypasses kcov trace function's in_task() check. Disable kcov on this file to reduce random code coverage. Link: https://lkml.kernel.org/r/20220523063033.1778974-1-liu3101@purdue.edu Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Congyu Liu <liu3101@purdue.edu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing: Initialize integer variable to prevent garbage return valueGautam Menghani1-1/+1
Initialize the integer variable to 0 to fix the clang scan warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn] return ret; Link: https://lkml.kernel.org/r/20220522061826.1751-1-gautammenghani201@gmail.com Cc: stable@vger.kernel.org Fixes: 8993665abcce ("tracing/boot: Support multiple handlers for per-event histogram") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27ftrace: Fix typo in commentJulia Lawall1-1/+1
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Link: https://lkml.kernel.org/r/20220521111145.81697-81-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27ftrace: Remove return value of ftrace_arch_modify_*()Li kunyu6-28/+13
All instances of the function ftrace_arch_modify_prepare() and ftrace_arch_modify_post_process() return zero. There's no point in checking their return value. Just have them be void functions. Link: https://lkml.kernel.org/r/20220518023639.4065-1-kunyu@nfschina.com Signed-off-by: Li kunyu <kunyu@nfschina.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing: Cleanup code by removing init "char *name"liqiong1-3/+1
The pointer is assigned to "type->name" anyway. no need to initialize with "preemption". Link: https://lkml.kernel.org/r/20220513075221.26275-1-liqiong@nfschina.com Signed-off-by: liqiong <liqiong@nfschina.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing: Change "char *" string form to "char []"liqiong2-2/+2
The "char []" string form declares a single variable. It is better than "char *" which creates two variables in the final assembly. Link: https://lkml.kernel.org/r/20220512143230.28796-1-liqiong@nfschina.com Signed-off-by: liqiong <liqiong@nfschina.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing/timerlat: Do not wakeup the thread if the trace stops at the IRQDaniel Bristot de Oliveira1-0/+2
There is no need to wakeup the timerlat/ thread if stop tracing is hit at the timerlat's IRQ handler. Return before waking up timerlat's thread. Link: https://lkml.kernel.org/r/b392356c91b56aedd2b289513cc56a84cf87e60d.1652175637.git.bristot@kernel.org Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing/timerlat: Print stacktrace in the IRQ handler if neededDaniel Bristot de Oliveira2-2/+16
If print_stack and stop_tracing_us are set, and stop_tracing_us is hit with latency higher than or equal to print_stack, print the stack at the IRQ handler as it is useful to define the root cause for the IRQ latency. Link: https://lkml.kernel.org/r/fd04530ce98ae9270e41bb124ee5bf67b05ecfed.1652175637.git.bristot@kernel.org Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing/timerlat: Notify IRQ new max latency only if stop tracing is setDaniel Bristot de Oliveira1-4/+5
Currently, the notification of a new max latency is sent from timerlat's IRQ handler anytime a new max latency is found. While this behavior is not wrong, the send IPI overhead itself will increase the thread latency and that is not the desired effect (tracing overhead). Moreover, the thread will notify a new max latency again because the thread latency as it is always higher than the IRQ latency that woke it up. The only case in which it is helpful to notify a new max latency from IRQ is when stop tracing (for the IRQ) is set, as in this case, the thread will not be dispatched. Notify a new max latency from the IRQ handler only if stop tracing is set for the IRQ handler. Link: https://lkml.kernel.org/r/2c2d9a56c0886c8402ba320de32856cbbb10c2bb.1652175637.git.bristot@kernel.org Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Reported-by: Clark Williams <williams@redhat.com> Fixes: a955d7eac177 ("trace: Add timerlat tracer") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27kprobes: Fix build errors with CONFIG_KRETPROBES=nMasami Hiramatsu2-74/+72
Max Filippov reported: When building kernel with CONFIG_KRETPROBES=n kernel/kprobes.c compilation fails with the following messages: kernel/kprobes.c: In function ‘recycle_rp_inst’: kernel/kprobes.c:1273:32: error: implicit declaration of function ‘get_kretprobe’ kernel/kprobes.c: In function ‘kprobe_flush_task’: kernel/kprobes.c:1299:35: error: ‘struct task_struct’ has no member named ‘kretprobe_instances’ This came from the commit d741bf41d7c7 ("kprobes: Remove kretprobe hash") which introduced get_kretprobe() and kretprobe_instances member in task_struct when CONFIG_KRETPROBES=y, but did not make recycle_rp_inst() and kprobe_flush_task() depending on CONFIG_KRETPORBES. Since those functions are only used for kretprobe, move those functions into #ifdef CONFIG_KRETPROBE area. Link: https://lkml.kernel.org/r/165163539094.74407.3838114721073251225.stgit@devnote2 Reported-by: Max Filippov <jcmvbkbc@gmail.com> Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash") Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S . Miller" <davem@davemloft.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing: Fix return value of trace_pid_write()Wonhyuk Yang1-2/+4
Setting set_event_pid with trailing whitespace lead to endless write system calls like below. $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid execve("/usr/bin/echo", ["echo", "123 "], ...) = 0 ... write(1, "123 \n", 5) = 4 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 .... This is because, the result of trace_get_user's are not returned when it read at least one pid. To fix it, update read variable even if parser->idx == 0. The result of applied patch is below. $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid execve("/usr/bin/echo", ["echo", "123 "], ...) = 0 ... write(1, "123 \n", 5) = 5 close(1) = 0 Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Baik Song An <bsahn@etri.re.kr> Cc: Hong Yeon Kim <kimhy@etri.re.kr> Cc: Taeung Song <taeung@reallinux.co.kr> Cc: linuxgeek@linuxgeek.io Cc: stable@vger.kernel.org Fixes: 4909010788640 ("tracing: Add set_event_pid directory for future use") Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing: Fix potential double free in create_var_ref()Keita Suzuki1-0/+3
In create_var_ref(), init_var_ref() is called to initialize the fields of variable ref_field, which is allocated in the previous function call to create_hist_field(). Function init_var_ref() allocates the corresponding fields such as ref_field->system, but frees these fields when the function encounters an error. The caller later calls destroy_hist_field() to conduct error handling, which frees the fields and the variable itself. This results in double free of the fields which are already freed in the previous function. Fix this by storing NULL to the corresponding fields when they are freed in init_var_ref(). Link: https://lkml.kernel.org/r/20220425063739.3859998-1-keitasuzuki.park@sslab.ics.keio.ac.jp Fixes: 067fe038e70f ("tracing: Add variable reference handling to hist triggers") CC: stable@vger.kernel.org Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing: Use strim() to remove whitespace instead of doing it manuallyYuntao Wang1-5/+3
The tracing_set_trace_write() function just removes the trailing whitespace from the user supplied tracer name, but the leading whitespace should also be removed. In addition, if the user supplied tracer name contains only a few whitespace characters, the first one will not be removed using the current method, which results it a single whitespace character left in the buf. To fix all of these issues, we use strim() to correctly remove both the leading and trailing whitespace. Link: https://lkml.kernel.org/r/20220121095623.1826679-1-ytcoode@gmail.com Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27ftrace: Deal with error return code of the ftrace_process_locs() functionYuntao Wang1-4/+13
The ftrace_process_locs() function may return -ENOMEM error code, which should be handled by the callers. Link: https://lkml.kernel.org/r/20220120065949.1813231-1-ytcoode@gmail.com Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-27tracing: Use trace_create_file() to simplify creation of tracefs entriesYuntao Wang5-48/+21
Creating tracefs entries with tracefs_create_file() followed by pr_warn() is tedious and repetitive, we can use trace_create_file() to simplify this process and make the code more readable. Link: https://lkml.kernel.org/r/20220114131052.534382-1-ytcoode@gmail.com Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>