summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)AuthorFilesLines
2021-04-27bpf: Lock bpf_trace_printk's tmp buf before it is written toFlorent Revest1-1/+1
bpf_trace_printk uses a shared static buffer to hold strings before they are printed. A recent refactoring moved the locking of that buffer after it gets filled by mistake. Fixes: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf") Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210427112958.773132-1-revest@chromium.org
2021-04-26Merge tag 'irq-core-2021-04-26' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The usual updates from the irq departement: Core changes: - Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can be cleaned up which either use a seperate mechanism to prevent auto-enable at request time or have a racy mechanism which disables the interrupt right after request. - Get rid of the last usage of irq_create_identity_mapping() and remove the interface. - An overhaul of tasklet_disable(). Most usage sites of tasklet_disable() are in task context and usually in cleanup, teardown code pathes. tasklet_disable() spinwaits for a tasklet which is currently executed. That's not only a problem for PREEMPT_RT where this can lead to a live lock when the disabling task preempts the softirq thread. It's also problematic in context of virtualization when the vCPU which runs the tasklet is scheduled out and the disabling code has to spin wait until it's scheduled back in. There are a few code pathes which invoke tasklet_disable() from non-sleepable context. For these a new disable variant which still spinwaits is provided which allows to switch tasklet_disable() to a sleep wait mechanism. For the atomic use cases this does not solve the live lock issue on PREEMPT_RT. That is mitigated by blocking on the RT specific softirq lock. - The PREEMPT_RT specific implementation of softirq processing and local_bh_disable/enable(). On RT enabled kernels soft interrupt processing happens always in task context and all interrupt handlers, which are not explicitly marked to be invoked in hard interrupt context are forced into task context as well. This allows to protect against softirq processing with a per CPU lock, which in turn allows to make BH disabled regions preemptible. Most of the softirq handling code is still shared. The RT/non-RT specific differences are addressed with a set of inline functions which provide the context specific functionality. The local_bh_disable() / local_bh_enable() mechanism are obviously seperate. - The usual set of small improvements and cleanups Driver changes: - New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt controllers - Extended functionality for MStar, STM32 and SC7280 irq chips - Enhanced robustness for ARM GICv3/4.1 drivers - The usual set of cleanups and improvements all over the place" * tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP irqchip/gic-v3: Do not enable irqs when handling spurious interrups dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller irqchip: Add support for IDT 79rc3243x interrupt controller irqdomain: Drop references to recusive irqdomain setup irqdomain: Get rid of irq_create_strict_mappings() irqchip/jcore-aic: Kill use of irq_create_strict_mappings() ARM: PXA: Kill use of irq_create_strict_mappings() irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection irqchip/tb10x: Use 'fallthrough' to eliminate a warning genirq: Reduce irqdebug cacheline bouncing kernel: Initialize cpumask before parsing irqchip/wpcm450: Drop COMPILE_TEST irqchip/irq-mst: Support polarity configuration irqchip: Add driver for WPCM450 interrupt controller dt-bindings: interrupt-controller: Add nuvoton, wpcm450-aic dt-bindings: qcom,pdc: Add compatible for sc7280 irqchip/stm32: Add usart instances exti direct event support irqchip/gic-v3: Fix OF_BAD_ADDR error handling irqchip/sifive-plic: Mark two global variables __ro_after_init ...
2021-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-334/+39
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-04-23 The following pull-request contains BPF updates for your *net-next* tree. We've added 69 non-merge commits during the last 22 day(s) which contain a total of 69 files changed, 3141 insertions(+), 866 deletions(-). The main changes are: 1) Add BPF static linker support for extern resolution of global, from Andrii. 2) Refine retval for bpf_get_task_stack helper, from Dave. 3) Add a bpf_snprintf helper, from Florent. 4) A bunch of miscellaneous improvements from many developers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-21Merge tag 'trace-v5.12-rc8' of ↵Linus Torvalds1-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix tp_printk command line and trace events Masami added a wrapper to be able to unhash trace event pointers as they are only read by root anyway, and they can also be extracted by the raw trace data buffers. But this wrapper utilized the iterator to have a temporary buffer to manipulate the text with. tp_printk is a kernel command line option that will send the trace output of a trace event to the console on boot up (useful when the system crashes before finishing the boot). But the code used the same wrapper that Masami added, and its iterator did not have a buffer, and this caused the system to crash. Have the wrapper just print the trace event normally if the iterator has no temporary buffer" * tag 'trace-v5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix checking event hash pointer logic when tp_printk is enabled
2021-04-20tracing: Fix checking event hash pointer logic when tp_printk is enabledSteven Rostedt (VMware)1-3/+7
Pointers in events that are printed are unhashed if the flags allow it, and the logic to do so is called before processing the event output from the raw ring buffer. In most cases, this is done when a user reads one of the trace files. But if tp_printk is added on the kernel command line, this logic is done for trace events when they are triggered, and their output goes out via printk. The unhash logic (and even the validation of the output) did not support the tp_printk output, and would crash. Link: https://lore.kernel.org/linux-tegra/9835d9f1-8d3a-3440-c53f-516c2606ad07@nvidia.com/ Fixes: efbbdaa22bb7 ("tracing: Show real address for trace event arguments") Reported-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-20bpf: Add a bpf_snprintf helperFlorent Revest1-0/+2
The implementation takes inspiration from the existing bpf_trace_printk helper but there are a few differences: To allow for a large number of format-specifiers, parameters are provided in an array, like in bpf_seq_printf. Because the output string takes two arguments and the array of parameters also takes two arguments, the format string needs to fit in one argument. Thankfully, ARG_PTR_TO_CONST_STR is guaranteed to point to a zero-terminated read-only map so we don't need a format string length arg. Because the format-string is known at verification time, we also do a first pass of format string validation in the verifier logic. This makes debugging easier. Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210419155243.1632274-4-revest@chromium.org
2021-04-20bpf: Factorize bpf_trace_printk and bpf_seq_printfFlorent Revest1-334/+37
Two helpers (trace_printk and seq_printf) have very similar implementations of format string parsing and a third one is coming (snprintf). To avoid code duplication and make the code easier to maintain, this moves the operations associated with format string parsing (validation and argument sanitization) into one generic function. The implementation of the two existing helpers already drifted quite a bit so unifying them entailed a lot of changes: - bpf_trace_printk always expected fmt[fmt_size] to be the terminating NULL character, this is no longer true, the first 0 is terminating. - bpf_trace_printk now supports %% (which produces the percentage char). - bpf_trace_printk now skips width formating fields. - bpf_trace_printk now supports the X modifier (capital hexadecimal). - bpf_trace_printk now supports %pK, %px, %pB, %pi4, %pI4, %pi6 and %pI6 - argument casting on 32 bit has been simplified into one macro and using an enum instead of obscure int increments. - bpf_seq_printf now uses bpf_trace_copy_string instead of strncpy_from_kernel_nofault and handles the %pks %pus specifiers. - bpf_seq_printf now prints longs correctly on 32 bit architectures. - both were changed to use a global per-cpu tmp buffer instead of one stack buffer for trace_printk and 6 small buffers for seq_printf. - to avoid per-cpu buffer usage conflict, these helpers disable preemption while the per-cpu buffer is in use. - both helpers now support the %ps and %pS specifiers to print symbols. The implementation is also moved from bpf_trace.c to helpers.c because the upcoming bpf_snprintf helper will be made available to all BPF programs and will need it. Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210419155243.1632274-2-revest@chromium.org
2021-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+4
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c - keep the ZC code, drop the code related to reinit net/bridge/netfilter/ebtables.c - fix build after move to net_generic Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-15ftrace: Reuse the output of the function tracer for func_repeatsSteven Rostedt (VMware)1-10/+13
The func_repeats event shows the output of the function tracer followed by a count of the number of repeats the previous function had made, as well as the timestamp of the last function that was repeated. The printing of the function should be the same as is for the function it is displaying. Reuse the code in trace_fn_trace() by making a helper function print_fn_trace() and use it for trace_func_repeats_print(). Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-15tracing: Add "func_no_repeats" option for function tracingYordan Karadzhov (VMware)1-3/+159
If the option is activated the function tracing record gets consolidated in the cases when a single function is called number of times consecutively. Instead of having an identical record for each call of the function we will record only the first call following by event showing the number of repeats. Link: https://lkml.kernel.org/r/20210415181854.147448-7-y.karadz@gmail.com Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-15tracing: Unify the logic for function tracing optionsYordan Karadzhov (VMware)1-27/+38
Currently the logic for dealing with the options for function tracing has two different implementations. One is used when we set the flags (in "static int func_set_flag()") and another used when we initialize the tracer (in "static int function_trace_init()"). Those two implementations are meant to do essentially the same thing and they are both not very convenient for adding new options. In this patch we add a helper function that provides a single implementation of the logic for dealing with the options and we make it such that new options can be easily added. Link: https://lkml.kernel.org/r/20210415181854.147448-6-y.karadz@gmail.com Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-15tracing: Add method for recording "func_repeats" eventsYordan Karadzhov (VMware)2-0/+38
This patch only provides the implementation of the method. Later we will used it in a combination with a new option for function tracing. Link: https://lkml.kernel.org/r/20210415181854.147448-5-y.karadz@gmail.com Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-15tracing: Add "last_func_repeats" to struct trace_arrayYordan Karadzhov (VMware)2-0/+13
The field is used to keep track of the consecutive (on the same CPU) calls of a single function. This information is needed in order to consolidate the function tracing record in the cases when a single function is called number of times. Link: https://lkml.kernel.org/r/20210415181854.147448-4-y.karadz@gmail.com Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-15tracing: Define new ftrace event "func_repeats"Yordan Karadzhov (VMware)3-0/+73
The event aims to consolidate the function tracing record in the cases when a single function is called number of times consecutively. while (cond) do_func(); This may happen in various scenarios (busy waiting for example). The new ftrace event can be used to show repeated function events with a single event and save space on the ring buffer Link: https://lkml.kernel.org/r/20210415181854.147448-3-y.karadz@gmail.com Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-15tracing: Define static void trace_print_time()Yordan Karadzhov (VMware)1-9/+17
The part of the code that prints the time of the trace record in "int trace_print_context()" gets extracted in a static function. This is done as a preparation for a following patch, in which we will define a new ftrace event called "func_repeats". The new static method, defined here, will be used by this new event to print the time of the last repeat of a function that is consecutively called number of times. Link: https://lkml.kernel.org/r/20210415181854.147448-2-y.karadz@gmail.com Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-14Merge tag 'trace-v5.12-rc7' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix a memory link in dyn_event_release(). An error path exited the function before freeing the allocated 'argv' variable" * tag 'trace-v5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/dynevent: Fix a memory leak in an error handling path
2021-04-13tracing/dynevent: Fix a memory leak in an error handling pathChristophe JAILLET1-2/+4
We must free 'argv' before returning, as already done in all the other paths of this function. Link: https://lkml.kernel.org/r/21e3594ccd7fc88c5c162c98450409190f304327.1618136448.git.christophe.jaillet@wanadoo.fr Fixes: d262271d0483 ("tracing/dynevent: Delegate parsing to create function") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-10kernel: Initialize cpumask before parsingTetsuo Handa1-1/+1
KMSAN complains that new_value at cpumask_parse_user() from write_irq_affinity() from irq_affinity_proc_write() is uninitialized. [ 148.133411][ T5509] ===================================================== [ 148.135383][ T5509] BUG: KMSAN: uninit-value in find_next_bit+0x325/0x340 [ 148.137819][ T5509] [ 148.138448][ T5509] Local variable ----new_value.i@irq_affinity_proc_write created at: [ 148.140768][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.142298][ T5509] irq_affinity_proc_write+0xc3/0x3d0 [ 148.143823][ T5509] ===================================================== Since bitmap_parse() from cpumask_parse_user() calls find_next_bit(), any alloc_cpumask_var() + cpumask_parse_user() sequence has possibility that find_next_bit() accesses uninitialized cpu mask variable. Fix this problem by replacing alloc_cpumask_var() with zalloc_cpumask_var(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20210401055823.3929-1-penguin-kernel@I-love.SAKURA.ne.jp
2021-04-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-4/+8
Conflicts: MAINTAINERS - keep Chandrasekar drivers/net/ethernet/mellanox/mlx5/core/en_main.c - simple fix + trust the code re-added to param.c in -next is fine include/linux/bpf.h - trivial include/linux/ethtool.h - trivial, fix kdoc while at it include/linux/skmsg.h - move to relevant place in tcp.c, comment re-wrapped net/core/skmsg.c - add the sk = sk // sk = NULL around calls net/tipc/crypto.c - trivial Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-02Merge tag 'trace-v5.12-rc5-2' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix stack trace entry size to stop showing garbage The macro that creates both the structure and the format displayed to user space for the stack trace event was changed a while ago to fix the parsing by user space tooling. But this change also modified the structure used to store the stack trace event. It changed the caller array field from [0] to [8]. Even though the size in the ring buffer is dynamic and can be something other than 8 (user space knows how to handle this), the 8 extra words was not accounted for when reserving the event on the ring buffer, and added 8 more entries, due to the calculation of "sizeof(*entry) + nr_entries * sizeof(long)", as the sizeof(*entry) now contains 8 entries. The size of the caller field needs to be subtracted from the size of the entry to create the correct allocation size" * tag 'trace-v5.12-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix stack trace event size
2021-04-01ftrace: Simplify the calculation of page number for ftrace_page->records ↵Steven Rostedt (VMware)1-8/+2
some more Commit b40c6eabfcd40 ("ftrace: Simplify the calculation of page number for ftrace_page->records") simplified the calculation of the number of pages needed for each page group without having any empty pages, but it can be simplified even further. Link: https://lore.kernel.org/lkml/CAHk-=wjt9b7kxQ2J=aDNKbR1QBMB3Hiqb_hYcZbKsxGRSEb+gQ@mail.gmail.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-01ftrace: Store the order of pages allocated in ftrace_pageLinus Torvalds1-18/+17
Instead of saving the size of the records field of the ftrace_page, store the order it uses to allocate the pages, as that is what is needed to know in order to free the pages. This simplifies the code. Link: https://lore.kernel.org/lkml/CAHk-=whyMxheOqXAORt9a7JK9gc9eHTgCJ55Pgs4p=X3RrQubQ@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ change log written by Steven Rostedt ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-01tracing: Remove unused argument from "ring_buffer_time_stamp()Yordan Karadzhov (VMware)2-5/+5
The "cpu" parameter is not being used by the function. Link: https://lkml.kernel.org/r/20210329130331.199402-1-y.karadz@gmail.com Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-04-01Merge branch 'trace/ftrace/urgent' into HEADSteven Rostedt (VMware)2-4/+8
Needed to merge trace/ftrace/urgent to get: Commit 59300b36f85 ("ftrace: Check if pages were allocated before calling free_pages()") To clean up the code that is affected by it as well.
2021-04-01tracing: Fix stack trace event sizeSteven Rostedt (VMware)1-1/+2
Commit cbc3b92ce037 fixed an issue to modify the macros of the stack trace event so that user space could parse it properly. Originally the stack trace format to user space showed that the called stack was a dynamic array. But it is not actually a dynamic array, in the way that other dynamic event arrays worked, and this broke user space parsing for it. The update was to make the array look to have 8 entries in it. Helper functions were added to make it parse it correctly, as the stack was dynamic, but was determined by the size of the event stored. Although this fixed user space on how it read the event, it changed the internal structure used for the stack trace event. It changed the array size from [0] to [8] (added 8 entries). This increased the size of the stack trace event by 8 words. The size reserved on the ring buffer was the size of the stack trace event plus the number of stack entries found in the stack trace. That commit caused the amount to be 8 more than what was needed because it did not expect the caller field to have any size. This produced 8 entries of garbage (and reading random data) from the stack trace event: <idle>-0 [002] d... 1976396.837549: <stack trace> => trace_event_raw_event_sched_switch => __traceiter_sched_switch => __schedule => schedule_idle => do_idle => cpu_startup_entry => secondary_startup_64_no_verify => 0xc8c5e150ffff93de => 0xffff93de => 0 => 0 => 0xc8c5e17800000000 => 0x1f30affff93de => 0x00000004 => 0x200000000 Instead, subtract the size of the caller field from the size of the event to make sure that only the amount needed to store the stack trace is reserved. Link: https://lore.kernel.org/lkml/your-ad-here.call-01617191565-ext-9692@work.hours/ Cc: stable@vger.kernel.org Fixes: cbc3b92ce037 ("tracing: Set kernel_stack's caller size properly") Reported-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-31Merge tag 'trace-v5.12-rc5' of ↵Linus Torvalds1-3/+6
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Add check of order < 0 before calling free_pages() The function addresses that are traced by ftrace are stored in pages, and the size is held in a variable. If there's some error in creating them, the allocate ones will be freed. In this case, it is possible that the order of pages to be freed may end up being negative due to a size of zero passed to get_count_order(), and then that negative number will cause free_pages() to free a very large section. Make sure that does not happen" * tag 'trace-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Check if pages were allocated before calling free_pages()
2021-03-30ftrace: Check if pages were allocated before calling free_pages()Steven Rostedt (VMware)1-3/+6
It is possible that on error pg->size can be zero when getting its order, which would return a -1 value. It is dangerous to pass in an order of -1 to free_pages(). Check if order is greater than or equal to zero before calling free_pages(). Link: https://lore.kernel.org/lkml/20210330093916.432697c7@gandalf.local.home/ Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-5/+38
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25tracing: Update create_system_filter() kernel-doc commentQiujun Huang1-2/+3
commit f306cc82a93d ("tracing: Update event filters for multibuffer") added the parameter @tr for create_system_filter(). commit bb9ef1cb7d86 ("tracing: Change apply_subsystem_event_filter() paths to check file->system == dir") changed the parameter from @system to @dir. Link: https://lkml.kernel.org/r/20210325161911.123452-1-hqjagain@gmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-25tracing: A minor cleanup for create_system_filter()Qiujun Huang1-4/+3
The first two parameters should be reduced to one, as @tr is simply @dir->tr. Link: https://lkml.kernel.org/r/20210324205642.65e03248@oasis.local.home Link: https://lkml.kernel.org/r/20210325163752.128407-1-hqjagain@gmail.com Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-25kernel: trace: Mundane typo fixes in the file trace_events_filter.cBhaskar Chowdhury1-1/+1
s/callin/calling/ Link: https://lkml.kernel.org/r/20210317095401.1854544-1-unixbhaskar@gmail.com Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> [ Other fixes already done by Ingo Molnar ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-23tracing: Fix various typos in commentsIngo Molnar19-44/+45
Fix ~59 single-word typos in the tracing code comments, and fix the grammar in a handful of places. Link: https://lore.kernel.org/r/20210322224546.GA1981273@gmail.com Link: https://lkml.kernel.org/r/20210323174935.GA4176821@gmail.com Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18tracing: Add a verifier to check string pointers for trace eventsSteven Rostedt (VMware)4-1/+215
It is a common mistake for someone writing a trace event to save a pointer to a string in the TP_fast_assign() and then display that string pointer in the TP_printk() with %s. The problem is that those two events may happen a long time apart, where the source of the string may no longer exist. The proper way to handle displaying any string that is not guaranteed to be in the kernel core rodata section, is to copy it into the ring buffer via the __string(), __assign_str() and __get_str() helper macros. Add a check at run time while displaying the TP_printk() of events to make sure that every %s referenced is safe to dereference, and if it is not, trigger a warning and only show the address of the pointer, and the dereferenced string if it can be safely retrieved with a strncpy_from_kernel_nofault() call. In order to not have to copy the parsing of vsnprintf() formats, or even exporting its code, the verifier relies on vsnprintf() being able to modify the va_list that is passed to it, and it remains modified after it is called. This is the case for some architectures like x86_64, but other architectures like x86_32 pass the va_list to vsnprintf() as a value not a reference, and the verifier can not use it to parse the non string arguments. Thus, at boot up, it is checked if vsnprintf() modifies the passed in va_list or not, and a static branch will disable the verifier if it's not compatible. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18tracing: Add check of trace event print fmts for dereferencing pointersSteven Rostedt (VMware)1-0/+210
Trace events record data into the ring buffer at the time of the event. The trace event has a printf logic to display the recorded data at a much later time when the user reads the trace file. This makes using dereferencing pointers unsafe if the dereferenced pointer points to the original source. The safe way to handle this is to create an array within the trace event and copy the source into the array. Then the dereference pointer may point to that array. As this is a easy mistake to make, a check is added to examine all trace event print fmts to make sure that they are safe to use. This only checks the various %p* dereferenced pointers like %pB, %pR, etc. It does not handle dereferencing of strings, as there are some use cases that are OK to dereference the source. That will be dealt with differently. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18tracing: Add tracing_event_time_stamp() APISteven Rostedt (VMware)2-0/+9
Add a tracing_event_time_stamp() API that checks if the event passed in is not on the ring buffer but a pointer to the per CPU trace_buffered_event which does not have its time stamp set yet. If it is a pointer to the trace_buffered_event, then just return the current time stamp that the ring buffer would produce. Otherwise, return the time stamp from the event. Link: https://lkml.kernel.org/r/20210316164114.131996180@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18ring-buffer: Add verifier for using ring_buffer_event_time_stamp()Steven Rostedt (VMware)1-4/+52
The ring_buffer_event_time_stamp() must be only called by an event that has not been committed yet, and is on the buffer that is passed in. This was used to help debug converting the histogram logic over to using the new time stamp code, and was proven to be very useful. Add a verifier that can check that this is the case, and extra WARN_ONs to catch unexpected use cases. Link: https://lkml.kernel.org/r/20210316164113.987294354@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18tracing: Use a no_filter_buffering_ref to stop using the filter bufferSteven Rostedt (VMware)3-21/+17
Currently, the trace histograms relies on it using absolute time stamps to trigger the tracing to not use the temp buffer if filters are set. That's because the histograms need the full timestamp that is saved in the ring buffer. That is no longer the case, as the ring_buffer_event_time_stamp() can now return the time stamp for all events without all triggering a full absolute time stamp. Now that the absolute time stamp is an unrelated dependency to not using the filters. There's nothing about having absolute timestamps to keep from using the filter buffer. Instead, change the interface to explicitly state to disable filter buffering that the histogram logic can use. Link: https://lkml.kernel.org/r/20210316164113.847886563@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18ring-buffer: Allow ring_buffer_event_time_stamp() to return time stamp of ↵Steven Rostedt (VMware)2-16/+46
all events Currently, ring_buffer_event_time_stamp() only returns an accurate time stamp of the event if it has an absolute extended time stamp attached to it. To make it more robust, use the event_stamp() in case the event does not have an absolute value attached to it. This will allow ring_buffer_event_time_stamp() to be used in more cases than just histograms, and it will also allow histograms to not require including absolute values all the time. Link: https://lkml.kernel.org/r/20210316164113.704830885@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18tracing: Pass buffer of event to trigger operationsSteven Rostedt (VMware)4-51/+92
The ring_buffer_event_time_stamp() is going to be updated to extract the time stamp for the event without needing it to be set to have absolute values for all events. But to do so, it needs the buffer that the event is on as the buffer saves information for the event before it is committed to the buffer. If the trace buffer is disabled, a temporary buffer is used, and there's no access to this buffer from the current histogram triggers, even though it is passed to the trace event code. Pass the buffer that the event is on all the way down to the histogram triggers. Link: https://lkml.kernel.org/r/20210316164113.542448131@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18ring-buffer: Add a event_stamp to cpu_buffer for each level of nestingSteven Rostedt (VMware)1-1/+10
Add a place to save the current event time stamp for each level of nesting. This will be used to retrieve the time stamp of the current event before it is committed. Link: https://lkml.kernel.org/r/20210316164113.399089673@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-18ring-buffer: Separate out internal use of ring_buffer_event_time_stamp()Steven Rostedt (VMware)1-18/+23
The exported use of ring_buffer_event_time_stamp() is going to become different than how it is used internally. Move the internal logic out into a static function called rb_event_time_stamp(), and have the internal callers call that instead. Link: https://lkml.kernel.org/r/20210316164113.257790481@goodmis.org Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-17ftrace: Fix modify_ftrace_direct.Alexei Starovoitov1-5/+38
The following sequence of commands: register_ftrace_direct(ip, addr1); modify_ftrace_direct(ip, addr1, addr2); unregister_ftrace_direct(ip, addr2); will cause the kernel to warn: [ 30.179191] WARNING: CPU: 2 PID: 1961 at kernel/trace/ftrace.c:5223 unregister_ftrace_direct+0x130/0x150 [ 30.180556] CPU: 2 PID: 1961 Comm: test_progs W O 5.12.0-rc2-00378-g86bc10a0a711-dirty #3246 [ 30.182453] RIP: 0010:unregister_ftrace_direct+0x130/0x150 When modify_ftrace_direct() changes the addr from old to new it should update the addr stored in ftrace_direct_funcs. Otherwise the final unregister_ftrace_direct() won't find the address and will cause the splat. Fixes: 0567d6809182 ("ftrace: Add modify_ftrace_direct()") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20210316195815.34714-1-alexei.starovoitov@gmail.com
2021-03-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-0/+6
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-03-09 The following pull-request contains BPF updates for your *net-next* tree. We've added 90 non-merge commits during the last 17 day(s) which contain a total of 114 files changed, 5158 insertions(+), 1288 deletions(-). The main changes are: 1) Faster bpf_redirect_map(), from Björn. 2) skmsg cleanup, from Cong. 3) Support for floating point types in BTF, from Ilya. 4) Documentation for sys_bpf commands, from Joe. 5) Support for sk_lookup in bpf_prog_test_run, form Lorenz. 6) Enable task local storage for tracing programs, from Song. 7) bpf_for_each_map_elem() helper, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04tracing: Skip selftests if tracing is disabledSteven Rostedt (VMware)1-0/+6
If tracing is disabled for some reason (traceoff_on_warning, command line, etc), the ftrace selftests are guaranteed to fail, as their results are defined by trace data in the ring buffers. If the ring buffers are turned off, the tests will fail, due to lack of data. Because tracing being disabled is for a specific reason (warning, user decided to, etc), it does not make sense to enable tracing to run the self tests, as the test output may corrupt the reason for the tracing to be disabled. Instead, simply skip the self tests and report that they are being skipped due to tracing being disabled. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04tracing: Fix memory leak in __create_synth_event()Vamshi K Sthambamkadi1-1/+3
kmemleak report: unreferenced object 0xc5a6f708 (size 8): comm "ftracetest", pid 1209, jiffies 4294911500 (age 6.816s) hex dump (first 8 bytes): 00 c1 3d 60 14 83 1f 8a ..=`.... backtrace: [<f0aa4ac4>] __kmalloc_track_caller+0x2a6/0x460 [<7d3d60a6>] kstrndup+0x37/0x70 [<45a0e739>] argv_split+0x1c/0x120 [<c17982f8>] __create_synth_event+0x192/0xb00 [<0708b8a3>] create_synth_event+0xbb/0x150 [<3d1941e1>] create_dyn_event+0x5c/0xb0 [<5cf8b9e3>] trace_parse_run_command+0xa7/0x140 [<04deb2ef>] dyn_event_write+0x10/0x20 [<8779ac95>] vfs_write+0xa9/0x3c0 [<ed93722a>] ksys_write+0x89/0xc0 [<b9ca0507>] __ia32_sys_write+0x15/0x20 [<7ce02d85>] __do_fast_syscall_32+0x45/0x80 [<cb0ecb35>] do_fast_syscall_32+0x29/0x60 [<2467454a>] do_SYSENTER_32+0x15/0x20 [<9beaa61d>] entry_SYSENTER_32+0xa9/0xfc unreferenced object 0xc5a6f078 (size 8): comm "ftracetest", pid 1209, jiffies 4294911500 (age 6.816s) hex dump (first 8 bytes): 08 f7 a6 c5 00 00 00 00 ........ backtrace: [<bbac096a>] __kmalloc+0x2b6/0x470 [<aa2624b4>] argv_split+0x82/0x120 [<c17982f8>] __create_synth_event+0x192/0xb00 [<0708b8a3>] create_synth_event+0xbb/0x150 [<3d1941e1>] create_dyn_event+0x5c/0xb0 [<5cf8b9e3>] trace_parse_run_command+0xa7/0x140 [<04deb2ef>] dyn_event_write+0x10/0x20 [<8779ac95>] vfs_write+0xa9/0x3c0 [<ed93722a>] ksys_write+0x89/0xc0 [<b9ca0507>] __ia32_sys_write+0x15/0x20 [<7ce02d85>] __do_fast_syscall_32+0x45/0x80 [<cb0ecb35>] do_fast_syscall_32+0x29/0x60 [<2467454a>] do_SYSENTER_32+0x15/0x20 [<9beaa61d>] entry_SYSENTER_32+0xa9/0xfc In __create_synth_event(), while iterating field/type arguments, the argv_split() will return array of atleast 2 elements even when zero arguments(argc=0) are passed. for e.g. when there is double delimiter or string ends with delimiter To fix call argv_free() even when argc=0. Link: https://lkml.kernel.org/r/20210304094521.GA1826@cosmos Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04ring-buffer: Add a little more information and a WARN when time stamp going ↵Steven Rostedt (VMware)1-3/+7
backwards is detected When the CONFIG_RING_BUFFER_VALIDATE_TIME_DELTAS is enabled, and the time stamps are detected as not being valid, it reports information about the write stamp, but does not show the before_stamp which is still useful information. Also, it should give a warning once, such that tests detect this happening. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04ring-buffer: Force before_stamp and write_stamp to be different on discardSteven Rostedt (VMware)1-0/+11
Part of the logic of the new time stamp code depends on the before_stamp and the write_stamp to be different if the write_stamp does not match the last event on the buffer, as it will be used to calculate the delta of the next event written on the buffer. The discard logic depends on this, as the next event to come in needs to inject a full timestamp as it can not rely on the last event timestamp in the buffer because it is unknown due to events after it being discarded. But by changing the write_stamp back to the time before it, it forces the next event to use a full time stamp, instead of relying on it. The issue came when a full time stamp was used for the event, and rb_time_delta() returns zero in that case. The update to the write_stamp (which subtracts delta) made it not change. Then when the event is removed from the buffer, because the before_stamp and write_stamp still match, the next event written would calculate its delta from the write_stamp, but that would be wrong as the write_stamp is of the time of the event that was discarded. In the case that the delta change being made to write_stamp is zero, set the before_stamp to zero as well, and this will force the next event to inject a full timestamp and not use the current write_stamp. Cc: stable@vger.kernel.org Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04tracing: Fix help text of TRACEPOINT_BENCHMARK in KconfigRolf Eike Beer1-1/+1
It's "cond_resched()" not "cond_sched()". Link: https://lkml.kernel.org/r/1863065.aFVDpXsuPd@devpool47 Signed-off-by: Rolf Eike Beer <eb@emlix.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-04tracing: Remove duplicate declaration from trace.hYordan Karadzhov (VMware)1-1/+0
A declaration of function "int trace_empty(struct trace_iterator *iter)" shows up twice in the header file kernel/trace/trace.h Link: https://lkml.kernel.org/r/20210304092348.208033-1-y.karadz@gmail.com Signed-off-by: Yordan Karadzhov (VMware) <y.karadz@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-28Merge tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-blockLinus Torvalds1-7/+13
Pull more block updates from Jens Axboe: "A few stragglers (and one due to me missing it originally), and fixes for changes in this merge window mostly. In particular: - blktrace cleanups (Chaitanya, Greg) - Kill dead blk_pm_* functions (Bart) - Fixes for the bio alloc changes (Christoph) - Fix for the partition changes (Christoph, Ming) - Fix for turning off iopoll with polled IO inflight (Jeffle) - nbd disconnect fix (Josef) - loop fsync error fix (Mauricio) - kyber update depth fix (Yang) - max_sectors alignment fix (Mikulas) - Add bio_max_segs helper (Matthew)" * tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block: (21 commits) block: Add bio_max_segs blktrace: fix documentation for blk_fill_rw() block: memory allocations in bounce_clone_bio must not fail block: remove the gfp_mask argument to bounce_clone_bio block: fix bounce_clone_bio for passthrough bios block-crypto-fallback: use a bio_set for splitting bios block: fix logging on capacity change blk-settings: align max_sectors on "logical_block_size" boundary block: reopen the device in blkdev_reread_part block: don't skip empty device in in disk_uevent blktrace: remove debugfs file dentries from struct blk_trace nbd: handle device refs for DESTROY_ON_DISCONNECT properly kyber: introduce kyber_depth_updated() loop: fix I/O error on fsync() in detached loop devices block: fix potential IO hang when turning off io_poll block: get rid of the trace rq insert wrapper blktrace: fix blk_rq_merge documentation blktrace: fix blk_rq_issue documentation blktrace: add blk_fill_rwbs documentation comment block: remove superfluous param in blk_fill_rwbs() ...