summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)AuthorFilesLines
2021-12-17tracing: Fix a kmemleak false positive in tracing_mapChen Jun1-0/+3
[ Upstream commit f25667e5980a4333729cac3101e5de1bb851f71a ] Doing the command: echo 'hist:key=common_pid.execname,common_timestamp' > /sys/kernel/debug/tracing/events/xxx/trigger Triggers many kmemleak reports: unreferenced object 0xffff0000c7ea4980 (size 128): comm "bash", pid 338, jiffies 4294912626 (age 9339.324s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f3469921>] kmem_cache_alloc_trace+0x4c0/0x6f0 [<0000000054ca40c3>] hist_trigger_elt_data_alloc+0x140/0x178 [<00000000633bd154>] tracing_map_init+0x1f8/0x268 [<000000007e814ab9>] event_hist_trigger_func+0xca0/0x1ad0 [<00000000bf8520ed>] trigger_process_regex+0xd4/0x128 [<00000000f549355a>] event_trigger_write+0x7c/0x120 [<00000000b80f898d>] vfs_write+0xc4/0x380 [<00000000823e1055>] ksys_write+0x74/0xf8 [<000000008a9374aa>] __arm64_sys_write+0x24/0x30 [<0000000087124017>] do_el0_svc+0x88/0x1c0 [<00000000efd0dcd1>] el0_svc+0x1c/0x28 [<00000000dbfba9b3>] el0_sync_handler+0x88/0xc0 [<00000000e7399680>] el0_sync+0x148/0x180 unreferenced object 0xffff0000c7ea4980 (size 128): comm "bash", pid 338, jiffies 4294912626 (age 9339.324s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f3469921>] kmem_cache_alloc_trace+0x4c0/0x6f0 [<0000000054ca40c3>] hist_trigger_elt_data_alloc+0x140/0x178 [<00000000633bd154>] tracing_map_init+0x1f8/0x268 [<000000007e814ab9>] event_hist_trigger_func+0xca0/0x1ad0 [<00000000bf8520ed>] trigger_process_regex+0xd4/0x128 [<00000000f549355a>] event_trigger_write+0x7c/0x120 [<00000000b80f898d>] vfs_write+0xc4/0x380 [<00000000823e1055>] ksys_write+0x74/0xf8 [<000000008a9374aa>] __arm64_sys_write+0x24/0x30 [<0000000087124017>] do_el0_svc+0x88/0x1c0 [<00000000efd0dcd1>] el0_svc+0x1c/0x28 [<00000000dbfba9b3>] el0_sync_handler+0x88/0xc0 [<00000000e7399680>] el0_sync+0x148/0x180 The reason is elts->pages[i] is alloced by get_zeroed_page. and kmemleak will not scan the area alloced by get_zeroed_page. The address stored in elts->pages will be regarded as leaked. That is, the elts->pages[i] will have pointers loaded onto it as well, and without telling kmemleak about it, those pointers will look like memory without a reference. To fix this, call kmemleak_alloc to tell kmemleak to scan elts->pages[i] Link: https://lkml.kernel.org/r/20211124140801.87121-1-chenjun102@huawei.com Signed-off-by: Chen Jun <chenjun102@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-08tracing/histograms: String compares should not care about signed valuesSteven Rostedt (VMware)1-1/+1
commit 450fec13d9170127678f991698ac1a5b05c02e2f upstream. When comparing two strings for the "onmatch" histogram trigger, fields that are strings use string comparisons, which do not care about being signed or not. Do not fail to match two string fields if one is unsigned char array and the other is a signed char array. Link: https://lore.kernel.org/all/20211129123043.5cfd687a@gandalf.local.home/ Cc: stable@vgerk.kernel.org Cc: Tom Zanussi <zanussi@kernel.org> Cc: Yafang Shao <laoar.shao@gmail.com> Fixes: b05e89ae7cf3b ("tracing: Accept different type for synthetic event fields") Reviewed-by: Masami Hiramatsu <mhiramatsu@kernel.org> Reported-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08tracing: Don't use out-of-sync va_list in event printingNikita Yushchenko1-0/+12
[ Upstream commit 2ef75e9bd2c998f1c6f6f23a3744136105ddefd5 ] If trace_seq becomes full, trace_seq_vprintf() no longer consumes arguments from va_list, making va_list out of sync with format processing by trace_check_vprintf(). This causes va_arg() in trace_check_vprintf() to return wrong positional argument, which results into a WARN_ON_ONCE() hit. ftrace_stress_test from LTP triggers this situation. Fix it by explicitly avoiding further use if va_list at the point when it's consistency can no longer be guaranteed. Link: https://lkml.kernel.org/r/20211118145516.13219-1-nikita.yushchenko@virtuozzo.com Signed-off-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-01tracing: Check pid filtering when creating eventsSteven Rostedt (VMware)1-0/+10
commit 6cb206508b621a9a0a2c35b60540e399225c8243 upstream. When pid filtering is activated in an instance, all of the events trace files for that instance has the PID_FILTER flag set. This determines whether or not pid filtering needs to be done on the event, otherwise the event is executed as normal. If pid filtering is enabled when an event is created (via a dynamic event or modules), its flag is not updated to reflect the current state, and the events are not filtered properly. Cc: stable@vger.kernel.org Fixes: 3fdaf80f4a836 ("tracing: Implement event pid filtering") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01tracing: Fix pid filtering when triggers are attachedSteven Rostedt (VMware)1-6/+18
commit a55f224ff5f238013de8762c4287117e47b86e22 upstream. If a event is filtered by pid and a trigger that requires processing of the event to happen is a attached to the event, the discard portion does not take the pid filtering into account, and the event will then be recorded when it should not have been. Cc: stable@vger.kernel.org Fixes: 3fdaf80f4a836 ("tracing: Implement event pid filtering") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01tracing/uprobe: Fix uprobe_perf_open probes iterationJiri Olsa1-0/+1
commit 1880ed71ce863318c1ce93bf324876fb5f92854f upstream. Add missing 'tu' variable initialization in the probes loop, otherwise the head 'tu' is used instead of added probes. Link: https://lkml.kernel.org/r/20211123142801.182530-1-jolsa@kernel.org Cc: stable@vger.kernel.org Fixes: 99c9a923e97a ("tracing/uprobe: Fix double perf_event linking on multiprobe uprobe") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progsDmitrii Banshchikov1-2/+0
commit 5e0bc3082e2e403ac0753e099c2b01446bb35578 upstream. Use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in tracing progs may result in locking issues. bpf_ktime_get_coarse_ns() uses ktime_get_coarse_ns() time accessor that isn't safe for any context: ====================================================== WARNING: possible circular locking dependency detected 5.15.0-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor.4/14877 is trying to acquire lock: ffffffff8cb30008 (tk_core.seq.seqcount){----}-{0:0}, at: ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255 but task is already holding lock: ffffffff90dbf200 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_deactivate+0x61/0x400 lib/debugobjects.c:735 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&obj_hash[i].lock){-.-.}-{2:2}: lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd1/0x120 kernel/locking/spinlock.c:162 __debug_object_init+0xd9/0x1860 lib/debugobjects.c:569 debug_hrtimer_init kernel/time/hrtimer.c:414 [inline] debug_init kernel/time/hrtimer.c:468 [inline] hrtimer_init+0x20/0x40 kernel/time/hrtimer.c:1592 ntp_init_cmos_sync kernel/time/ntp.c:676 [inline] ntp_init+0xa1/0xad kernel/time/ntp.c:1095 timekeeping_init+0x512/0x6bf kernel/time/timekeeping.c:1639 start_kernel+0x267/0x56e init/main.c:1030 secondary_startup_64_no_verify+0xb1/0xbb -> #0 (tk_core.seq.seqcount){----}-{0:0}: check_prev_add kernel/locking/lockdep.c:3051 [inline] check_prevs_add kernel/locking/lockdep.c:3174 [inline] validate_chain+0x1dfb/0x8240 kernel/locking/lockdep.c:3789 __lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5015 lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5625 seqcount_lockdep_reader_access+0xfe/0x230 include/linux/seqlock.h:103 ktime_get_coarse_ts64+0x25/0x110 kernel/time/timekeeping.c:2255 ktime_get_coarse include/linux/timekeeping.h:120 [inline] ktime_get_coarse_ns include/linux/timekeeping.h:126 [inline] ____bpf_ktime_get_coarse_ns kernel/bpf/helpers.c:173 [inline] bpf_ktime_get_coarse_ns+0x7e/0x130 kernel/bpf/helpers.c:171 bpf_prog_a99735ebafdda2f1+0x10/0xb50 bpf_dispatcher_nop_func include/linux/bpf.h:721 [inline] __bpf_prog_run include/linux/filter.h:626 [inline] bpf_prog_run include/linux/filter.h:633 [inline] BPF_PROG_RUN_ARRAY include/linux/bpf.h:1294 [inline] trace_call_bpf+0x2cf/0x5d0 kernel/trace/bpf_trace.c:127 perf_trace_run_bpf_submit+0x7b/0x1d0 kernel/events/core.c:9708 perf_trace_lock+0x37c/0x440 include/trace/events/lock.h:39 trace_lock_release+0x128/0x150 include/trace/events/lock.h:58 lock_release+0x82/0x810 kernel/locking/lockdep.c:5636 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:149 [inline] _raw_spin_unlock_irqrestore+0x75/0x130 kernel/locking/spinlock.c:194 debug_hrtimer_deactivate kernel/time/hrtimer.c:425 [inline] debug_deactivate kernel/time/hrtimer.c:481 [inline] __run_hrtimer kernel/time/hrtimer.c:1653 [inline] __hrtimer_run_queues+0x2f9/0xa60 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1097 asm_sysvec_apic_timer_interrupt+0x12/0x20 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline] _raw_spin_unlock_irqrestore+0xd4/0x130 kernel/locking/spinlock.c:194 try_to_wake_up+0x702/0xd20 kernel/sched/core.c:4118 wake_up_process kernel/sched/core.c:4200 [inline] wake_up_q+0x9a/0xf0 kernel/sched/core.c:953 futex_wake+0x50f/0x5b0 kernel/futex/waitwake.c:184 do_futex+0x367/0x560 kernel/futex/syscalls.c:127 __do_sys_futex kernel/futex/syscalls.c:199 [inline] __se_sys_futex+0x401/0x4b0 kernel/futex/syscalls.c:180 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae There is a possible deadlock with bpf_timer_* set of helpers: hrtimer_start() lock_base(); trace_hrtimer...() perf_event() bpf_run() bpf_timer_start() hrtimer_start() lock_base() <- DEADLOCK Forbid use of bpf_ktime_get_coarse_ns() and bpf_timer_* helpers in BPF_PROG_TYPE_KPROBE, BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_PERF_EVENT and BPF_PROG_TYPE_RAW_TRACEPOINT prog types. Fixes: d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper") Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Reported-by: syzbot+43fd005b5a1b4d10781e@syzkaller.appspotmail.com Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211113142227.566439-2-me@ubique.spb.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-25tracing: Add length protection to histogram string copiesSteven Rostedt (VMware)1-2/+7
[ Upstream commit 938aa33f14657c9ed9deea348b7d6f14b6d69cb7 ] The string copies to the histogram storage has a max size of 256 bytes (defined by MAX_FILTER_STR_VAL). Only the string size of the event field needs to be copied to the event storage, but no more than what is in the event storage. Although nothing should be bigger than 256 bytes, there's no protection against overwriting of the storage if one day there is. Copy no more than the destination size, and enforce it. Also had to turn MAX_FILTER_STR_VAL into an unsigned int, to keep the min() comparison of the string sizes of comparable types. Link: https://lore.kernel.org/all/CAHk-=wjREUihCGrtRBwfX47y_KrLCGjiq3t6QtoNJpmVrAEb1w@mail.gmail.com/ Link: https://lkml.kernel.org/r/20211114132834.183429a4@rorschach.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tom Zanussi <zanussi@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 63f84ae6b82b ("tracing/histogram: Do not copy the fixed-size char array field over the field size") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25tracing/histogram: Do not copy the fixed-size char array field over the ↵Masami Hiramatsu1-4/+5
field size [ Upstream commit 63f84ae6b82bb4dff672f76f30c6fd7b9d3766bc ] Do not copy the fixed-size char array field of the events over the field size. The histogram treats char array as a string and there are 2 types of char array in the event, fixed-size and dynamic string. The dynamic string (__data_loc) field must be null terminated, but the fixed-size char array field may not be null terminated (not a string, but just a data). In that case, histogram can copy the data after the field. This uses the original field size for fixed-size char array field to restrict the histogram not to access over the original field size. Link: https://lkml.kernel.org/r/163673292822.195747.3696966210526410250.stgit@devnote2 Fixes: 02205a6752f2 (tracing: Add support for 'field variables') Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18tracing: Fix missing trace_boot_init_histograms kstrdup NULL checksMathieu Desnoyers1-0/+4
[ Upstream commit 3c20bd3af535d64771b193bb4dd41ed662c464ce ] trace_boot_init_histograms misses NULL pointer checks for kstrdup failure. Link: https://lkml.kernel.org/r/20211015195550.22742-1-mathieu.desnoyers@efficios.com Fixes: 64dc7f6958ef5 ("tracing/boot: Show correct histogram error command") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18ftrace: do CPU checking after preemption disabled王贇1-3/+3
[ Upstream commit d33cc657372366a8959f099c619a208b4c5dc664 ] With CONFIG_DEBUG_PREEMPT we observed reports like: BUG: using smp_processor_id() in preemptible caller is perf_ftrace_function_call+0x6f/0x2e0 CPU: 1 PID: 680 Comm: a.out Not tainted Call Trace: <TASK> dump_stack_lvl+0x8d/0xcf check_preemption_disabled+0x104/0x110 ? optimize_nops.isra.7+0x230/0x230 ? text_poke_bp_batch+0x9f/0x310 perf_ftrace_function_call+0x6f/0x2e0 ... __text_poke+0x5/0x620 text_poke_bp_batch+0x9f/0x310 This telling us the CPU could be changed after task is preempted, and the checking on CPU before preemption will be invalid. Since now ftrace_test_recursion_trylock() will help to disable the preemption, this patch just do the checking after trylock() to address the issue. Link: https://lkml.kernel.org/r/54880691-5fe2-33e7-d12f-1fa6136f5183@linux.alibaba.com CC: Steven Rostedt <rostedt@goodmis.org> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18tracing/cfi: Fix cmp_entries_* functions signature mismatchKalesh Singh1-17/+23
[ Upstream commit 7ce1bb83a14019f8c396d57ec704d19478747716 ] If CONFIG_CFI_CLANG=y, attempting to read an event histogram will cause the kernel to panic due to failed CFI check. 1. echo 'hist:keys=common_pid' >> events/sched/sched_switch/trigger 2. cat events/sched/sched_switch/hist 3. kernel panics on attempting to read hist This happens because the sort() function expects a generic int (*)(const void *, const void *) pointer for the compare function. To prevent this CFI failure, change tracing map cmp_entries_* function signatures to match this. Also, fix the build error reported by the kernel test robot [1]. [1] https://lore.kernel.org/r/202110141140.zzi4dRh4-lkp@intel.com/ Link: https://lkml.kernel.org/r/20211014045217.3265162-1-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18tracing: Disable "other" permission bits in the tracefs filesSteven Rostedt (VMware)15-96/+103
[ Upstream commit 21ccc9cd72116289469e5519b6159c675a2fa58f ] When building the files in the tracefs file system, do not by default set any permissions for OTH (other). This will make it easier for admins who want to define a group for accessing tracefs and not having to first disable all the permission bits for "other" in the file system. As tracing can leak sensitive information, it should never by default allowing all users access. An admin can still set the permission bits for others to have access, which may be useful for creating a honeypot and seeing who takes advantage of it and roots the machine. Link: https://lkml.kernel.org/r/20210818153038.864149276@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18ring-buffer: Protect ring_buffer_reset() from reentrancySteven Rostedt (VMware)1-0/+5
commit 51d157946666382e779f94c39891e8e9a020da78 upstream. The resetting of the entire ring buffer use to simply go through and reset each individual CPU buffer that had its own protection and synchronization. But this was very slow, due to performing a synchronization for each CPU. The code was reshuffled to do one disabling of all CPU buffers, followed by a single RCU synchronization, and then the resetting of each of the CPU buffers. But unfortunately, the mutex that prevented multiple occurrences of resetting the buffer was not moved to the upper function, and there is nothing to protect from it. Take the ring buffer mutex around the global reset. Cc: stable@vger.kernel.org Fixes: b23d7a5f4a07a ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU") Reported-by: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-29tracing: Fix misspelling of "missing"Steven Rostedt (VMware)1-1/+1
My snake instinct was on and I wrote "misssing" instead of "missing". Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-29ftrace: Fix kernel-doc formatting issuesSteven Rostedt (VMware)1-8/+10
Some functions had kernel-doc that used a comma instead of a hash to separate the function name from the one line description. Also, the "ftrace_is_dead()" had an incomplete description. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-28tracing: Do not warn when connecting eprobe to non existing eventSteven Rostedt (VMware)1-2/+2
When the syscall trace points are not configured in, the kselftests for ftrace will try to attach an event probe (eprobe) to one of the system call trace points. This triggered a WARNING, because the failure only expects to see memory issues. But this is not the only failure. The user may attempt to attach to a non existent event, and the kernel must not warn about it. Link: https://lkml.kernel.org/r/20211027120854.0680aa0f@gandalf.local.home Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-19tracing: Have all levels of checks prevent recursionSteven Rostedt (VMware)1-2/+2
While writing an email explaining the "bit = 0" logic for a discussion on making ftrace_test_recursion_trylock() disable preemption, I discovered a path that makes the "not do the logic if bit is zero" unsafe. The recursion logic is done in hot paths like the function tracer. Thus, any code executed causes noticeable overhead. Thus, tricks are done to try to limit the amount of code executed. This included the recursion testing logic. Having recursion testing is important, as there are many paths that can end up in an infinite recursion cycle when tracing every function in the kernel. Thus protection is needed to prevent that from happening. Because it is OK to recurse due to different running context levels (e.g. an interrupt preempts a trace, and then a trace occurs in the interrupt handler), a set of bits are used to know which context one is in (normal, softirq, irq and NMI). If a recursion occurs in the same level, it is prevented*. Then there are infrastructure levels of recursion as well. When more than one callback is attached to the same function to trace, it calls a loop function to iterate over all the callbacks. Both the callbacks and the loop function have recursion protection. The callbacks use the "ftrace_test_recursion_trylock()" which has a "function" set of context bits to test, and the loop function calls the internal trace_test_and_set_recursion() directly, with an "internal" set of bits. If an architecture does not implement all the features supported by ftrace then the callbacks are never called directly, and the loop function is called instead, which will implement the features of ftrace. Since both the loop function and the callbacks do recursion protection, it was seemed unnecessary to do it in both locations. Thus, a trick was made to have the internal set of recursion bits at a more significant bit location than the function bits. Then, if any of the higher bits were set, the logic of the function bits could be skipped, as any new recursion would first have to go through the loop function. This is true for architectures that do not support all the ftrace features, because all functions being traced must first go through the loop function before going to the callbacks. But this is not true for architectures that support all the ftrace features. That's because the loop function could be called due to two callbacks attached to the same function, but then a recursion function inside the callback could be called that does not share any other callback, and it will be called directly. i.e. traced_function_1: [ more than one callback tracing it ] call loop_func loop_func: trace_recursion set internal bit call callback callback: trace_recursion [ skipped because internal bit is set, return 0 ] call traced_function_2 traced_function_2: [ only traced by above callback ] call callback callback: trace_recursion [ skipped because internal bit is set, return 0 ] call traced_function_2 [ wash, rinse, repeat, BOOM! out of shampoo! ] Thus, the "bit == 0 skip" trick is not safe, unless the loop function is call for all functions. Since we want to encourage architectures to implement all ftrace features, having them slow down due to this extra logic may encourage the maintainers to update to the latest ftrace features. And because this logic is only safe for them, remove it completely. [*] There is on layer of recursion that is allowed, and that is to allow for the transition between interrupt context (normal -> softirq -> irq -> NMI), because a trace may occur before the context update is visible to the trace recursion logic. Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/ Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> Cc: =?utf-8?b?546L6LSH?= <yun.wang@linux.alibaba.com> Cc: Guo Ren <guoren@kernel.org> Cc: stable@vger.kernel.org Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-14tracing: Fix event probe removal from dynamic eventsSteven Rostedt (VMware)1-3/+51
When an event probe is to be removed via the API that created it via the dynamic events, an -ENOENT error is returned. This is because the removal of the event probe does not expect to see the event system and name that the event probe is attached to, even though that's part of the API to create it. As the removal of probes is to use the same API as they are created. In fact, the removal is not consistent with the kprobes and uprobes removal. Fix that by allowing various ways to remove the eprobe. The eprobe is created with: e:[GROUP/]NAME SYSTEM/EVENT [OPTIONS] Have it get removed by echoing in the following into dynamic_events: # Remove all eprobes with NAME echo '-:NAME' >> dynamic_events # Remove a specific eprobe echo '-:GROUP/NAME' >> dynamic_events echo '-:GROUP/NAME SYSTEM/EVENT' >> dynamic_events echo '-:NAME SYSTEM/EVENT' >> dynamic_events echo '-:GROUP/NAME SYSTEM/EVENT OPTIONS' >> dynamic_events echo '-:NAME SYSTEM/EVENT OPTIONS' >> dynamic_events Link: https://lkml.kernel.org/r/20211012081925.0e19cc4f@gandalf.local.home Link: https://lkml.kernel.org/r/20211013205533.630722129@goodmis.org Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-11tracing: Fix missing * in comment blockColin Ian King1-1/+1
There is a missing * in a comment block, add it in. Link: https://lkml.kernel.org/r/20211006172830.1025336-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-11tracing: Fix memory leak in eprobe_register()Vamshi K Sthambamkadi1-0/+7
kmemleak report: unreferenced object 0xffff900a70ec7ec0 (size 32): comm "ftracetest", pid 2770, jiffies 4295042510 (age 311.464s) hex dump (first 32 bytes): c8 31 23 45 0a 90 ff ff 40 85 c7 6e 0a 90 ff ff .1#E....@..n.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440 [<0000000088b8124b>] eprobe_register+0x1e3/0x350 [<000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240 [<0000000019109321>] event_enable_write+0x93/0xe0 [<000000007d85b320>] vfs_write+0xb9/0x260 [<00000000b94c5e41>] ksys_write+0x67/0xe0 [<000000005a08c81d>] __x64_sys_write+0x1a/0x20 [<00000000240bf576>] do_syscall_64+0x3b/0xc0 [<0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae unreferenced object 0xffff900a56bbf280 (size 128): comm "ftracetest", pid 2770, jiffies 4295042510 (age 311.464s) hex dump (first 32 bytes): ff ff ff ff ff ff ff ff 00 00 00 00 01 00 00 00 ................ 80 69 3b b2 ff ff ff ff 20 69 3b b2 ff ff ff ff .i;..... i;..... backtrace: [<000000009d3751fd>] kmem_cache_alloc_trace+0x2a2/0x440 [<00000000c4e90fad>] eprobe_register+0x1fc/0x350 [<000000002a9a0517>] __ftrace_event_enable_disable+0x7c/0x240 [<0000000019109321>] event_enable_write+0x93/0xe0 [<000000007d85b320>] vfs_write+0xb9/0x260 [<00000000b94c5e41>] ksys_write+0x67/0xe0 [<000000005a08c81d>] __x64_sys_write+0x1a/0x20 [<00000000240bf576>] do_syscall_64+0x3b/0xc0 [<0000000043d5d9f6>] entry_SYSCALL_64_after_hwframe+0x44/0xae In new_eprobe_trigger(), allocated edata and trigger variables are never freed. To fix, free memory in disable_eprobe(). Link: https://lkml.kernel.org/r/20211008071802.GA2098@cosmos Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-07tracing: Fix missing osnoise tracer on max_latencyJackie Liu1-7/+4
The compiler warns when the data are actually unused: kernel/trace/trace.c:1712:13: error: ‘trace_create_maxlat_file’ defined but not used [-Werror=unused-function] 1712 | static void trace_create_maxlat_file(struct trace_array *tr, | ^~~~~~~~~~~~~~~~~~~~~~~~ [Why] CONFIG_HWLAT_TRACER=n, CONFIG_TRACER_MAX_TRACE=n, CONFIG_OSNOISE_TRACER=y gcc report warns. [How] Now trace_create_maxlat_file will only take effect when CONFIG_HWLAT_TRACER=y or CONFIG_TRACER_MAX_TRACE=y. In fact, after adding osnoise trace, it also needs to take effect. Link: https://lore.kernel.org/all/c1d9e328-ad7c-920b-6c24-9e1598a6421c@infradead.org/ Link: https://lkml.kernel.org/r/20210922025122.3268022-1-liu.yun@linux.dev Fixes: bce29ac9ce0b ("trace: Add osnoise tracer") Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-24blktrace: Fix uaf in blk_trace access after removing by sysfsZhihao Cheng1-0/+8
There is an use-after-free problem triggered by following process: P1(sda) P2(sdb) echo 0 > /sys/block/sdb/trace/enable blk_trace_remove_queue synchronize_rcu blk_trace_free relay_close rcu_read_lock __blk_add_trace trace_note_tsk (Iterate running_trace_list) relay_close_buf relay_destroy_buf kfree(buf) trace_note(sdb's bt) relay_reserve buf->offset <- nullptr deference (use-after-free) !!! rcu_read_unlock [ 502.714379] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 502.715260] #PF: supervisor read access in kernel mode [ 502.715903] #PF: error_code(0x0000) - not-present page [ 502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0 [ 502.717252] Oops: 0000 [#1] SMP [ 502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360 [ 502.732872] Call Trace: [ 502.733193] __blk_add_trace.cold+0x137/0x1a3 [ 502.733734] blk_add_trace_rq+0x7b/0xd0 [ 502.734207] blk_add_trace_rq_issue+0x54/0xa0 [ 502.734755] blk_mq_start_request+0xde/0x1b0 [ 502.735287] scsi_queue_rq+0x528/0x1140 ... [ 502.742704] sg_new_write.isra.0+0x16e/0x3e0 [ 502.747501] sg_ioctl+0x466/0x1100 Reproduce method: ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127]) ioctl(/dev/sda, BLKTRACESTART) ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127]) ioctl(/dev/sdb, BLKTRACESTART) echo 0 > /sys/block/sdb/trace/enable & // Add delay(mdelay/msleep) before kernel enters blk_trace_free() ioctl$SG_IO(/dev/sda, SG_IO, ...) // Enters trace_note_tsk() after blk_trace_free() returned // Use mdelay in rcu region rather than msleep(which may schedule out) Remove blk_trace from running_list before calling blk_trace_free() by sysfs if blk_trace is at Blktrace_running state. Fixes: c71a896154119f ("blktrace: add ftrace plugin") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-09-11Merge tag 'trace-v5.15-3' of ↵Linus Torvalds1-19/+18
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Minor fixes to the processing of the bootconfig tree" * tag 'trace-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: bootconfig: Rename xbc_node_find_child() to xbc_node_find_subkey() tracing/boot: Fix to check the histogram control param is a leaf node tracing/boot: Fix trace_boot_hist_add_array() to check array is value
2021-09-10bootconfig: Rename xbc_node_find_child() to xbc_node_find_subkey()Masami Hiramatsu1-12/+12
Rename xbc_node_find_child() to xbc_node_find_subkey() for clarifying that function returns a key node (no value node). Since there are xbc_node_for_each_child() (loop on all child nodes) and xbc_node_for_each_subkey() (loop on only subkey nodes), this name distinction is necessary to avoid confusing users. Link: https://lkml.kernel.org/r/163119459826.161018.11200274779483115300.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-10tracing/boot: Fix to check the histogram control param is a leaf nodeMasami Hiramatsu1-3/+3
Since xbc_node_find_child() doesn't ensure the returned node is a leaf node (key-value pair or do not have subkeys), use xbc_node_find_value to ensure the histogram control parameter is a leaf node in trace_boot_compose_hist_cmd(). Link: https://lkml.kernel.org/r/163119459059.161018.18341288218424528962.stgit@devnote2 Fixes: e66ed86ca6c5 ("tracing/boot: Add per-event histogram action options") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-10tracing/boot: Fix trace_boot_hist_add_array() to check array is valueMasami Hiramatsu1-4/+3
trace_boot_hist_add_array() uses the combination of xbc_node_find_child() and xbc_node_get_child() to get the child node of the key node. But since it missed to check the child node is data node or not, user can pass the subkey node for the array node (anode). To avoid this issue, check the array node is a data node. Actually, there is xbc_node_find_value(node, key, vnode), which ensures the @vnode is a value node, so use it in trace_boot_hist_add_array() to fix this issue. Link: https://lkml.kernel.org/r/163119458308.161018.1516455973625940212.stgit@devnote2 Fixes: e66ed86ca6c5 ("tracing/boot: Add per-event histogram action options") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-09Merge tag 'trace-v5.15-2' of ↵Linus Torvalds8-21/+50
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: - Add migrate-disable counter to tracing header - Fix error handling in event probes - Fix missed unlock in osnoise in error path - Fix merge issue with tools/bootconfig - Clean up bootconfig data when init memory is removed - Fix bootconfig to loop only on subkeys - Have kernel command lines override bootconfig options - Increase field counts for synthetic events - Have histograms dynamic allocate event elements to save space - Fixes in testing and documentation * tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/boot: Fix to loop on only subkeys selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events tracing: Dynamically allocate the per-elt hist_elt_data array tracing: synth events: increase max fields count tools/bootconfig: Show whole test command for each test case bootconfig: Fix missing return check of xbc_node_compose_key function tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh docs: bootconfig: Add how to use bootconfig for kernel parameters init/bootconfig: Reorder init parameter from bootconfig and cmdline init: bootconfig: Remove all bootconfig data when the init memory is removed tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads() tracing: Fix some alloc_event_probe() error handling bugs tracing: Add migrate-disabled counter to tracing output.
2021-09-08tracing/boot: Fix to loop on only subkeysMasami Hiramatsu1-3/+3
Since the commit e5efaeb8a8f5 ("bootconfig: Support mixing a value and subkeys under a key") allows to co-exist a value node and key nodes under a node, xbc_node_for_each_child() is not only returning key node but also a value node. In the boot-time tracing using xbc_node_for_each_child() to iterate the events, groups and instances, but those must be key nodes. Thus it must use xbc_node_for_each_subkey(). Link: https://lkml.kernel.org/r/163112988361.74896.2267026262061819145.stgit@devnote2 Fixes: e5efaeb8a8f5 ("bootconfig: Support mixing a value and subkeys under a key") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-08tracing: Dynamically allocate the per-elt hist_elt_data arrayTom Zanussi1-2/+12
Setting the hist_elt_data.field_var_str[] array unconditionally to a size of SYNTH_FIELD_MAX elements wastes space unnecessarily. The actual number of elements needed can be calculated at run-time instead. In most cases, this will save a lot of space since it's a per-elt array which isn't normally close to being full. It also allows us to increase SYNTH_FIELD_MAX without worrying about even more wastage when we do that. Link: https://lkml.kernel.org/r/d52ae0ad5e1b59af7c4f54faf3fc098461fd82b3.camel@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-08tracing: synth events: increase max fields countArtem Bityutskiy1-1/+1
Sometimes it is useful to construct larger synthetic trace events. Increase 'SYNTH_FIELDS_MAX' (maximum number of fields in a synthetic event) from 32 to 64. Link: https://lkml.kernel.org/r/20210901135513.3087062-1-dedekind1@gmail.com Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-08tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads()Qiang.Zhang1-3/+3
When start_kthread() return error, the cpus_read_unlock() need to be called. Link: https://lkml.kernel.org/r/20210831022919.27630-1-qiang.zhang@windriver.com Cc: <stable@vger.kernel.org> Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations") Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Qiang.Zhang <qiang.zhang@windriver.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-08tracing: Fix some alloc_event_probe() error handling bugsDan Carpenter1-2/+3
There are two bugs in this code. First, if the kzalloc() fails it leads to a NULL dereference of "ep" on the next line. Second, if the alloc_event_probe() function returns an error then it leads to an error pointer dereference in the caller. Link: https://lkml.kernel.org/r/20210824115150.GI31143@kili Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-07Merge tag 'kgdb-5.15-rc1' of ↵Linus Torvalds1-3/+9
git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb updates from Daniel Thompson: "Changes for kgdb/kdb this cycle are dominated by a change from Sumit that removes as small (256K) private heap from kdb. This is change I've hoped for ever since I discovered how few users of this heap remained in the kernel, so many thanks to Sumit for hunting these down. The other change is an incremental step towards SPDX headers" * tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kernel: debug: Convert to SPDX identifier kdb: Rename members of struct kdbtab_t kdb: Simplify kdb_defcmd macro logic kdb: Get rid of redundant kdb_register_flags() kdb: Rename struct defcmd_set to struct kdb_macro kdb: Get rid of custom debug heap allocator
2021-09-05Merge tag 'trace-v5.15' of ↵Linus Torvalds21-172/+1594
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT - bootconfig can now start histograms - bootconfig supports group/all enabling - histograms now can put values in linear size buckets - execnames can be passed to synthetic events - introduce "event probes" that attach to other events and can retrieve data from pointers of fields, or record fields as different types (a pointer to a string as a string instead of just a hex number) - various fixes and clean ups * tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits) tracing/doc: Fix table format in histogram code selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes selftests/ftrace: Add selftest for testing eprobe events on synthetic events selftests/ftrace: Add test case to test adding and removing of event probe selftests/ftrace: Fix requirement check of README file selftests/ftrace: Add clear_dynamic_events() to test cases tracing: Add a probe that attaches to trace events tracing/probes: Reject events which have the same name of existing one tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs tracing/probe: Change traceprobe_set_print_fmt() to take a type tracing/probes: Use struct_size() instead of defining custom macros tracing/probes: Allow for dot delimiter as well as slash for system names tracing/probe: Have traceprobe_parse_probe_arg() take a const arg tracing: Have dynamic events have a ref counter tracing: Add DYNAMIC flag for dynamic events tracing: Replace deprecated CPU-hotplug functions. MAINTAINERS: Add an entry for os noise/latency tracepoint: Fix kerneldoc comments bootconfig/tracing/ktest: Update ktest example for boot-time tracing tools/bootconfig: Use per-group/all enable option in ftrace2bconf script ...
2021-09-04tracing: Add migrate-disabled counter to tracing output.Thomas Gleixner3-10/+28
migrate_disable() forbids task migration to another CPU. It is available since v5.11 and has already users such as highmem or BPF. It is useful to observe this task state in tracing which already has other states like the preemption counter. Instead of adding the migrate disable counter as a new entry to struct trace_entry, which would extend the whole struct by four bytes, it is squashed into the preempt-disable counter. The lower four bits represent the preemption counter, the upper four bits represent the migrate disable counter. Both counter shouldn't exceed 15 but if they do, there is a safety net which caps the value at 15. Add the migrate-disable counter to the trace entry so it shows up in the trace. Due to the users mentioned above, it is already possible to observe it: | bash-1108 [000] ...21 73.950578: rss_stat: mm_id=2213312838 curr=0 type=MM_ANONPAGES size=8192B | bash-1108 [000] d..31 73.951222: irq_disable: caller=flush_tlb_mm_range+0x115/0x130 parent=ptep_clear_flush+0x42/0x50 | bash-1108 [000] d..31 73.951222: tlb_flush: pages:1 reason:local mm shootdown (3) The last value is the migrate-disable counter. Things that popped up: - trace_print_lat_context() does not print the migrate counter. Not sure if it should. It is used in "verbose" mode and uses 8 digits and I'm not sure ther is something processing the value. - trace_define_common_fields() now defines a different variable. This probably breaks things. No ide what to do in order to preserve the old behaviour. Since this is used as a filter it should be split somehow to be able to match both nibbles here. Link: https://lkml.kernel.org/r/20210810132625.ylssabmsrkygokuv@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: patch description.] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ SDR: Removed change to common_preempt_count field name ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-02Merge tag 'printk-for-5.15' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Optionally, provide an index of possible printk messages via <debugfs>/printk/index/. It can be used when monitoring important kernel messages on a farm of various hosts. The monitor has to be updated when some messages has changed or are not longer available by a newly deployed kernel. - Add printk.console_no_auto_verbose boot parameter. It allows to generate crash dump even with slow consoles in a reasonable time frame. - Remove printk_safe buffers. The messages are always stored directly to the main logbuffer, even in NMI or recursive context. Also it allows to serialize syslog operations by a mutex instead of a spin lock. - Misc clean up and build fixes. * tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/index: Fix -Wunused-function warning lib/nmi_backtrace: Serialize even messages about idle CPUs printk: Add printk.console_no_auto_verbose boot parameter printk: Remove console_silent() lib/test_scanf: Handle n_bits == 0 in random tests printk: syslog: close window between wait and read printk: convert @syslog_lock to mutex printk: remove NMI tracking printk: remove safe buffers printk: track/limit recursion lib/nmi_backtrace: explicitly serialize banner and regs printk: Move the printk() kerneldoc comment to its new home printk/index: Fix warning about missing prototypes MIPS/asm/printk: Fix build failure caused by printk printk: index: Add indexing support to dev_printk printk: Userspace format indexing support printk: Rework parse_prefix into printk_parse_prefix printk: Straighten out log_flags into printk_info_flags string_helpers: Escape double quotes in escape_special printk/console: Check consistent sequence number when handling race in console_unlock()
2021-09-01Merge tag 'net-next-5.15' of ↵Linus Torvalds1-12/+100
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Enable memcg accounting for various networking objects. BPF: - Introduce bpf timers. - Add perf link and opaque bpf_cookie which the program can read out again, to be used in libbpf-based USDT library. - Add bpf_task_pt_regs() helper to access user space pt_regs in kprobes, to help user space stack unwinding. - Add support for UNIX sockets for BPF sockmap. - Extend BPF iterator support for UNIX domain sockets. - Allow BPF TCP congestion control progs and bpf iterators to call bpf_setsockopt(), e.g. to switch to another congestion control algorithm. Protocols: - Support IOAM Pre-allocated Trace with IPv6. - Support Management Component Transport Protocol. - bridge: multicast: add vlan support. - netfilter: add hooks for the SRv6 lightweight tunnel driver. - tcp: - enable mid-stream window clamping (by user space or BPF) - allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD - more accurate DSACK processing for RACK-TLP - mptcp: - add full mesh path manager option - add partial support for MP_FAIL - improve use of backup subflows - optimize option processing - af_unix: add OOB notification support. - ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the router. - mac80211: Target Wake Time support in AP mode. - can: j1939: extend UAPI to notify about RX status. Driver APIs: - Add page frag support in page pool API. - Many improvements to the DSA (distributed switch) APIs. - ethtool: extend IRQ coalesce uAPI with timer reset modes. - devlink: control which auxiliary devices are created. - Support CAN PHYs via the generic PHY subsystem. - Proper cross-chip support for tag_8021q. - Allow TX forwarding for the software bridge data path to be offloaded to capable devices. Drivers: - veth: more flexible channels number configuration. - openvswitch: introduce per-cpu upcall dispatch. - Add internet mix (IMIX) mode to pktgen. - Transparently handle XDP operations in the bonding driver. - Add LiteETH network driver. - Renesas (ravb): - support Gigabit Ethernet IP - NXP Ethernet switch (sja1105): - fast aging support - support for "H" switch topologies - traffic termination for ports under VLAN-aware bridge - Intel 1G Ethernet - support getcrosststamp() with PCIe PTM (Precision Time Measurement) for better time sync - support Credit-Based Shaper (CBS) offload, enabling HW traffic prioritization and bandwidth reservation - Broadcom Ethernet (bnxt) - support pulse-per-second output - support larger Rx rings - Mellanox Ethernet (mlx5) - support ethtool RSS contexts and MQPRIO channel mode - support LAG offload with bridging - support devlink rate limit API - support packet sampling on tunnels - Huawei Ethernet (hns3): - basic devlink support - add extended IRQ coalescing support - report extended link state - Netronome Ethernet (nfp): - add conntrack offload support - Broadcom WiFi (brcmfmac): - add WPA3 Personal with FT to supported cipher suites - support 43752 SDIO device - Intel WiFi (iwlwifi): - support scanning hidden 6GHz networks - support for a new hardware family (Bz) - Xen pv driver: - harden netfront against malicious backends - Qualcomm mobile - ipa: refactor power management and enable automatic suspend - mhi: move MBIM to WWAN subsystem interfaces Refactor: - Ambient BPF run context and cgroup storage cleanup. - Compat rework for ndo_ioctl. Old code removal: - prism54 remove the obsoleted driver, deprecated by the p54 driver. - wan: remove sbni/granch driver" * tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits) net: Add depends on OF_NET for LiteX's LiteETH ipv6: seg6: remove duplicated include net: hns3: remove unnecessary spaces net: hns3: add some required spaces net: hns3: clean up a type mismatch warning net: hns3: refine function hns3_set_default_feature() ipv6: remove duplicated 'net/lwtunnel.h' include net: w5100: check return value after calling platform_get_resource() net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx() net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource() net: mdio-ipq4019: Make use of devm_platform_ioremap_resource() fou: remove sparse errors ipv4: fix endianness issue in inet_rtm_getroute_build_skb() octeontx2-af: Set proper errorcode for IPv4 checksum errors octeontx2-af: Fix static code analyzer reported issues octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg octeontx2-af: Fix loop in free and unmap counter af_unix: fix potential NULL deref in unix_dgram_connect() dpaa2-eth: Replace strlcpy with strscpy octeontx2-af: Use NDC TX for transmit packet data ...
2021-08-31Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-10/+62
Daniel Borkmann says: ==================== bpf-next 2021-08-31 We've added 116 non-merge commits during the last 17 day(s) which contain a total of 126 files changed, 6813 insertions(+), 4027 deletions(-). The main changes are: 1) Add opaque bpf_cookie to perf link which the program can read out again, to be used in libbpf-based USDT library, from Andrii Nakryiko. 2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu. 3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang. 4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch to another congestion control algorithm during init, from Martin KaFai Lau. 5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima. 6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta. 7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT} progs, from Xu Liu and Stanislav Fomichev. 8) Support for __weak typed ksyms in libbpf, from Hao Luo. 9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky. 10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov. 11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann. 12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian, Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others. 13) Another big batch to revamp XDP samples in order to give them consistent look and feel, from Kumar Kartikeya Dwivedi. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits) MAINTAINERS: Remove self from powerpc BPF JIT selftests/bpf: Fix potential unreleased lock samples: bpf: Fix uninitialized variable in xdp_redirect_cpu selftests/bpf: Reduce more flakyness in sockmap_listen bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS bpf: selftests: Add dctcp fallback test bpf: selftests: Add connect_to_fd_opts to network_helpers bpf: selftests: Add sk_state to bpf_tcp_helpers.h bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt selftests: xsk: Preface options with opt selftests: xsk: Make enums lower case selftests: xsk: Generate packets from specification selftests: xsk: Generate packet directly in umem selftests: xsk: Simplify cleanup of ifobjects selftests: xsk: Decrease sending speed selftests: xsk: Validate tx stats on tx thread selftests: xsk: Simplify packet validation in xsk tests selftests: xsk: Rename worker_* functions that are not thread entry points selftests: xsk: Disassociate umem size with packets sent selftests: xsk: Remove end-of-test packet ... ==================== Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30Merge tag 's390-5.15-1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Improve ftrace code patching so that stop_machine is not required anymore. This requires a small common code patch acked by Steven Rostedt: https://lore.kernel.org/linux-s390/20210730220741.4da6fdf6@oasis.local.home/ - Enable KCSAN for s390. This comes with a small common code change to fix a compile warning. Acked by Marco Elver: https://lore.kernel.org/r/20210729142811.1309391-1-hca@linux.ibm.com - Add KFENCE support for s390. This also comes with a minimal x86 patch from Marco Elver who said also this can be carried via the s390 tree: https://lore.kernel.org/linux-s390/YQJdarx6XSUQ1tFZ@elver.google.com/ - More changes to prepare the decompressor for relocation. - Enable DAT also for CPU restart path. - Final set of register asm removal patches; leaving only three locations where needed and sane. - Add NNPA, Vector-Packed-Decimal-Enhancement Facility 2, PCI MIO support to hwcaps flags. - Cleanup hwcaps implementation. - Add new instructions to in-kernel disassembler. - Various QDIO cleanups. - Add SCLP debug feature. - Various other cleanups and improvements all over the place. * tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (105 commits) s390: remove SCHED_CORE from defconfigs s390/smp: do not use nodat_stack for secondary CPU start s390/smp: enable DAT before CPU restart callback is called s390: update defconfigs s390/ap: fix state machine hang after failure to enable irq KVM: s390: generate kvm hypercall functions s390/sclp: add tracing of SCLP interactions s390/debug: add early tracing support s390/debug: fix debug area life cycle s390/debug: keep debug data on resize s390/diag: make restart_part2 a local label s390/mm,pageattr: fix walk_pte_level() early exit s390: fix typo in linker script s390: remove do_signal() prototype and do_notify_resume() function s390/crypto: fix all kernel-doc warnings in vfio_ap_ops.c s390/pci: improve DMA translation init and exit s390/pci: simplify CLP List PCI handling s390/pci: handle FH state mismatch only on disable s390/pci: fix misleading rc in clp_set_pci_fn() s390/boot: factor out offset_vmlinux_info() function ...
2021-08-30Merge branch 'rework/printk_safe-removal' into for-linusPetr Mladek1-2/+0
2021-08-26bpf: Fix bpf-next builds without CONFIG_BPF_EVENTSDaniel Xu1-2/+0
This commit fixes linker errors along the lines of: s390-linux-ld: task_iter.c:(.init.text+0xa4): undefined reference to `btf_task_struct_ids'` Fix by defining btf_task_struct_ids unconditionally in kernel/bpf/btf.c since there exists code that unconditionally uses btf_task_struct_ids. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/05d94748d9f4b3eecedc4fddd6875418a396e23c.1629942444.git.dxu@dxuuu.xyz
2021-08-25bpf: Add bpf_task_pt_regs() helperDaniel Xu1-0/+19
The motivation behind this helper is to access userspace pt_regs in a kprobe handler. uprobe's ctx is the userspace pt_regs. kprobe's ctx is the kernelspace pt_regs. bpf_task_pt_regs() allows accessing userspace pt_regs in a kprobe handler. The final case (kernelspace pt_regs in uprobe) is pretty rare (usermode helper) so I think that can be solved later if necessary. More concretely, this helper is useful in doing BPF-based DWARF stack unwinding. Currently the kernel can only do framepointer based stack unwinds for userspace code. This is because the DWARF state machines are too fragile to be computed in kernelspace [0]. The idea behind DWARF-based stack unwinds w/ BPF is to copy a chunk of the userspace stack (while in prog context) and send it up to userspace for unwinding (probably with libunwind) [1]. This would effectively enable profiling applications with -fomit-frame-pointer using kprobes and uprobes. [0]: https://lkml.org/lkml/2012/2/10/356 [1]: https://github.com/danobi/bpf-dwarf-walk Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/e2718ced2d51ef4268590ab8562962438ab82815.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Extend bpf_base_func_proto helpers with bpf_get_current_task_btf()Daniel Xu1-1/+1
bpf_get_current_task() is already supported so it's natural to also include the _btf() variant for btf-powered helpers. This is required for non-tracing progs to use bpf_task_pt_regs() in the next commit. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/f99870ed5f834c9803d73b3476f8272b1bb987c0.1629772842.git.dxu@dxuuu.xyz
2021-08-25bpf: Consolidate task_struct BTF_ID declarationsDaniel Xu1-2/+2
No need to have it defined 5 times. Once is enough. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/6dcefa5bed26fe1226f26683f36819bb53ec19a2.1629772842.git.dxu@dxuuu.xyz
2021-08-20tracing: Add a probe that attaches to trace eventsTzvetomir Stoyanov (VMware)8-13/+958
A new dynamic event is introduced: event probe. The event is attached to an existing tracepoint and uses its fields as arguments. The user can specify custom format string of the new event, select what tracepoint arguments will be printed and how to print them. An event probe is created by writing configuration string in 'dynamic_events' ftrace file: e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS] - Set an event probe -:SNAME/ENAME - Delete an event probe Where: SNAME - System name, if omitted 'eprobes' is used. ENAME - Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used. SYSTEM - Name of the system, where the tracepoint is defined, mandatory. EVENT - Name of the tracepoint event in SYSTEM, mandatory. FETCHARGS - Arguments: <name>=$<field>[:TYPE] - Fetch given filed of the tracepoint and print it as given TYPE with given name. Supported types are: (u8/u16/u32/u64/s8/s16/s32/s64), basic type (x8/x16/x32/x64), hexadecimal types "string", "ustring" and bitfield. Example, attach an event probe on openat system call and print name of the file that will be opened: echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events A new dynamic event is created in events/esys/eopen/ directory. It can be deleted with: echo "-:esys/eopen" >> dynamic_events Filters, triggers and histograms can be attached to the new event, it can be matched in synthetic events. There is one limitation - an event probe can not be attached to kprobe, uprobe or another event probe. Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-36/+77
drivers/ptp/Kconfig: 55c8fca1dae1 ("ptp_pch: Restore dependency on PCI") e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19tracing/probes: Reject events which have the same name of existing oneMasami Hiramatsu4-2/+36
Since kprobe_events and uprobe_events only check whether the other same-type probe event has the same name or not, if the user gives the same name of the existing tracepoint event (or the other type of probe events), it silently fails to create the tracefs entry (but registered.) as below. /sys/kernel/tracing # ls events/task/task_rename enable filter format hist id trigger /sys/kernel/tracing # echo p:task/task_rename vfs_read >> kprobe_events [ 113.048508] Could not create tracefs 'task_rename' directory /sys/kernel/tracing # cat kprobe_events p:task/task_rename vfs_read To fix this issue, check whether the existing events have the same name or not in trace_probe_register_event_call(). If exists, it rejects to register the new event. Link: https://lkml.kernel.org/r/162936876189.187130.17558311387542061930.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-19tracing/probes: Have process_fetch_insn() take a void * instead of pt_regsSteven Rostedt (VMware)3-5/+7
In preparation to allow event probes to use the process_fetch_insn() callback in trace_probe_tmpl.h, change the data passed to it from a pointer to pt_regs, as the event probe will not be using regs, and make it a void pointer instead. Update the process_fetch_insn() callers for kprobe and uprobe events to have the regs defined in the function and just typecast the void pointer parameter. Link: https://lkml.kernel.org/r/20210819041842.291622924@goodmis.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-19tracing/probe: Change traceprobe_set_print_fmt() to take a typeSteven Rostedt (VMware)4-11/+31
Instead of a boolean "is_return" have traceprobe_set_print_fmt() take a type (currently just PROBE_PRINT_NORMAL and PROBE_PRINT_RETURN). This will simplify adding different types. For example, the development of the event_probe, will need its own type as it prints an event, and not an IP. Link: https://lkml.kernel.org/r/20210819041842.104626301@goodmis.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>