summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)AuthorFilesLines
2014-01-03tracing: Fix rcu handling of event_trigger_data filter fieldSteven Rostedt (Red Hat)2-4/+6
The filter field of the event_trigger_data structure is protected under RCU sched locks. It was not annotated as such, and after doing so, sparse pointed out several locations that required fix ups. Reported-by: kbuild test robot <fengguang.wu@intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-01-03tracing: Add generic tracing_lseek() functionSteven Rostedt (Red Hat)5-26/+19
Trace event triggers added a lseek that uses the ftrace_filter_lseek() function. Unfortunately, when function tracing is not configured in that function is not defined and the kernel fails to build. This is the second time that function was added to a file ops and it broke the build due to requiring special config dependencies. Make a generic tracing_lseek() that all the tracing utilities may use. Also, modify the old ftrace_filter_lseek() to return 0 instead of 1 on WRONLY. Not sure why it was a 1 as that does not make sense. This also changes the old tracing_seek() to modify the file pos pointer on WRONLY as well. Reported-by: kbuild test robot <fengguang.wu@intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-31Merge tag 'v3.13-rc6' into for-3.14/coreJens Axboe4-41/+46
Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c
2013-12-22tracing: Add and use generic set_trigger_filter() implementationTom Zanussi4-14/+219
Add a generic event_command.set_trigger_filter() op implementation and have the current set of trigger commands use it - this essentially gives them all support for filters. Syntactically, filters are supported by adding 'if <filter>' just after the command, in which case only events matching the filter will invoke the trigger. For example, to add a filter to an enable/disable_event command: echo 'enable_event:system:event if common_pid == 999' > \ .../othersys/otherevent/trigger The above command will only enable the system:event event if the common_pid field in the othersys:otherevent event is 999. As another example, to add a filter to a stacktrace command: echo 'stacktrace if common_pid == 999' > \ .../somesys/someevent/trigger The above command will only trigger a stacktrace if the common_pid field in the event is 999. The filter syntax is the same as that described in the 'Event filtering' section of Documentation/trace/events.txt. Because triggers can now use filters, the trigger-invoking logic needs to be moved in those cases - e.g. for ftrace_raw_event_calls, if a trigger has a filter associated with it, the trigger invocation now needs to happen after the { assign; } part of the call, in order for the trigger condition to be tested. There's still a SOFT_DISABLED-only check at the top of e.g. the ftrace_raw_events function, so when an event is soft disabled but not because of the presence of a trigger, the original SOFT_DISABLED behavior remains unchanged. There's also a bit of trickiness in that some triggers need to avoid being invoked while an event is currently in the process of being logged, since the trigger may itself log data into the trace buffer. Thus we make sure the current event is committed before invoking those triggers. To do that, we split the trigger invocation in two - the first part (event_triggers_call()) checks the filter using the current trace record; if a command has the post_trigger flag set, it sets a bit for itself in the return value, otherwise it directly invoks the trigger. Once all commands have been either invoked or set their return flag, event_triggers_call() returns. The current record is then either committed or discarded; if any commands have deferred their triggers, those commands are finally invoked following the close of the current event by event_triggers_post_call(). To simplify the above and make it more efficient, the TRIGGER_COND bit is introduced, which is set only if a soft-disabled trigger needs to use the log record for filter testing or needs to wait until the current log record is closed. The syscall event invocation code is also changed in analogous ways. Because event triggers need to be able to create and free filters, this also adds a couple external wrappers for the existing create_filter and free_filter functions, which are too generic to be made extern functions themselves. Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-22tracing: Move ftrace_event_file() out of DYNAMIC_FTRACE ifdefSteven Rostedt (Red Hat)1-13/+13
Now that event triggers use ftrace_event_file(), it needs to be outside the #ifdef CONFIG_DYNAMIC_FTRACE, as it can now be used when that is not defined. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-22tracing: Add 'enable_event' and 'disable_event' event trigger commandsTom Zanussi3-1/+364
Add 'enable_event' and 'disable_event' event_command commands. enable_event and disable_event event triggers are added by the user via these commands in a similar way and using practically the same syntax as the analagous 'enable_event' and 'disable_event' ftrace function commands, but instead of writing to the set_ftrace_filter file, the enable_event and disable_event triggers are written to the per-event 'trigger' files: echo 'enable_event:system:event' > .../othersys/otherevent/trigger echo 'disable_event:system:event' > .../othersys/otherevent/trigger The above commands will enable or disable the 'system:event' trace events whenever the othersys:otherevent events are hit. This also adds a 'count' version that limits the number of times the command will be invoked: echo 'enable_event:system:event:N' > .../othersys/otherevent/trigger echo 'disable_event:system:event:N' > .../othersys/otherevent/trigger Where N is the number of times the command will be invoked. The above commands will will enable or disable the 'system:event' trace events whenever the othersys:otherevent events are hit, but only N times. This also makes the find_event_file() helper function extern, since it's useful to use from other places, such as the event triggers code, so make it accessible. Link: http://lkml.kernel.org/r/f825f3048c3f6b026ee37ae5825f9fc373451828.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-22tracing: Add 'stacktrace' event trigger commandTom Zanussi1-0/+79
Add 'stacktrace' event_command. stacktrace event triggers are added by the user via this command in a similar way and using practically the same syntax as the analogous 'stacktrace' ftrace function command, but instead of writing to the set_ftrace_filter file, the stacktrace event trigger is written to the per-event 'trigger' files: echo 'stacktrace' > .../tracing/events/somesys/someevent/trigger The above command will turn on stacktraces for someevent i.e. whenever someevent is hit, a stacktrace will be logged. This also adds a 'count' version that limits the number of times the command will be invoked: echo 'stacktrace:N' > .../tracing/events/somesys/someevent/trigger Where N is the number of times the command will be invoked. The above command will log N stacktraces for someevent i.e. whenever someevent is hit N times, a stacktrace will be logged. Link: http://lkml.kernel.org/r/0c30c008a0828c660aa0e1bbd3255cf179ed5c30.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-22tracing: Add 'snapshot' event trigger commandTom Zanussi3-3/+116
Add 'snapshot' event_command. snapshot event triggers are added by the user via this command in a similar way and using practically the same syntax as the analogous 'snapshot' ftrace function command, but instead of writing to the set_ftrace_filter file, the snapshot event trigger is written to the per-event 'trigger' files: echo 'snapshot' > .../somesys/someevent/trigger The above command will turn on snapshots for someevent i.e. whenever someevent is hit, a snapshot will be done. This also adds a 'count' version that limits the number of times the command will be invoked: echo 'snapshot:N' > .../somesys/someevent/trigger Where N is the number of times the command will be invoked. The above command will snapshot N times for someevent i.e. whenever someevent is hit N times, a snapshot will be done. Also adds a new tracing_alloc_snapshot() function - the existing tracing_snapshot_alloc() function is a special version of tracing_snapshot() that also does the snapshot allocation - the snapshot triggers would like to be able to do just the allocation but not take a snapshot; the existing tracing_snapshot_alloc() in turn now also calls tracing_alloc_snapshot() underneath to do that allocation. Link: http://lkml.kernel.org/r/c9524dd07ce01f9dcbd59011290e0a8d5b47d7ad.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> [ fix up from kbuild test robot <fengguang.wu@intel.com report ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21tracing: Add 'traceon' and 'traceoff' event trigger commandsTom Zanussi1-0/+446
Add 'traceon' and 'traceoff' event_command commands. traceon and traceoff event triggers are added by the user via these commands in a similar way and using practically the same syntax as the analagous 'traceon' and 'traceoff' ftrace function commands, but instead of writing to the set_ftrace_filter file, the traceon and traceoff triggers are written to the per-event 'trigger' files: echo 'traceon' > .../tracing/events/somesys/someevent/trigger echo 'traceoff' > .../tracing/events/somesys/someevent/trigger The above command will turn tracing on or off whenever someevent is hit. This also adds a 'count' version that limits the number of times the command will be invoked: echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger Where N is the number of times the command will be invoked. The above commands will will turn tracing on or off whenever someevent is hit, but only N times. Some common register/unregister_trigger() implementations of the event_command reg()/unreg() callbacks are also provided, which add and remove trigger instances to the per-event list of triggers, and arm/disarm them as appropriate. event_trigger_callback() is a general-purpose event_command func() implementation that orchestrates command parsing and registration for most normal commands. Most event commands will use these, but some will override and possibly reuse them. The event_trigger_init(), event_trigger_free(), and event_trigger_print() functions are meant to be common implementations of the event_trigger_ops init(), free(), and print() ops, respectively. Most trigger_ops implementations will use these, but some will override and possibly reuse them. Link: http://lkml.kernel.org/r/00a52816703b98d2072947478dd6e2d70cde5197.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21tracing: Add basic event trigger frameworkTom Zanussi5-5/+480
Add a 'trigger' file for each trace event, enabling 'trace event triggers' to be set for trace events. 'trace event triggers' are patterned after the existing 'ftrace function triggers' implementation except that triggers are written to per-event 'trigger' files instead of to a single file such as the 'set_ftrace_filter' used for ftrace function triggers. The implementation is meant to be entirely separate from ftrace function triggers, in order to keep the respective implementations relatively simple and to allow them to diverge. The event trigger functionality is built on top of SOFT_DISABLE functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file flags which is checked when any trace event fires. Triggers set for a particular event need to be checked regardless of whether that event is actually enabled or not - getting an event to fire even if it's not enabled is what's already implemented by SOFT_DISABLE mode, so trigger mode directly reuses that. Event trigger essentially inherit the soft disable logic in __ftrace_event_enable_disable() while adding a bit of logic and trigger reference counting via tm_ref on top of that in a new trace_event_trigger_enable_disable() function. Because the base __ftrace_event_enable_disable() code now needs to be invoked from outside trace_events.c, a wrapper is also added for those usages. The triggers for an event are actually invoked via a new function, event_triggers_call(), and code is also added to invoke them for ftrace_raw_event calls as well as syscall events. The main part of the patch creates a new trace_events_trigger.c file to contain the trace event triggers implementation. The standard open, read, and release file operations are implemented here. The open() implementation sets up for the various open modes of the 'trigger' file. It creates and attaches the trigger iterator and sets up the command parser. If opened for reading set up the trigger seq_ops. The read() implementation parses the event trigger written to the 'trigger' file, looks up the trigger command, and passes it along to that event_command's func() implementation for command-specific processing. The release() implementation does whatever cleanup is needed to release the 'trigger' file, like releasing the parser and trigger iterator, etc. A couple of functions for event command registration and unregistration are added, along with a list to add them to and a mutex to protect them, as well as an (initially empty) registration function to add the set of commands that will be added by future commits, and call to it from the trace event initialization code. also added are a couple trigger-specific data structures needed for these implementations such as a trigger iterator and a struct for trigger-specific data. A couple structs consisting mostly of function meant to be implemented in command-specific ways, event_command and event_trigger_ops, are used by the generic event trigger command implementations. They're being put into trace.h alongside the other trace_event data structures and functions, in the expectation that they'll be needed in several trace_event-related files such as trace_events_trigger.c and trace_events.c. The event_command.func() function is meant to be called by the trigger parsing code in order to add a trigger instance to the corresponding event. It essentially coordinates adding a live trigger instance to the event, and arming the triggering the event. Every event_command func() implementation essentially does the same thing for any command: - choose ops - use the value of param to choose either a number or count version of event_trigger_ops specific to the command - do the register or unregister of those ops - associate a filter, if specified, with the triggering event The reg() and unreg() ops allow command-specific implementations for event_trigger_op registration and unregistration, and the get_trigger_ops() op allows command-specific event_trigger_ops selection to be parameterized. When a trigger instance is added, the reg() op essentially adds that trigger to the triggering event and arms it, while unreg() does the opposite. The set_filter() function is used to associate a filter with the trigger - if the command doesn't specify a set_filter() implementation, the command will ignore filters. Each command has an associated trigger_type, which serves double duty, both as a unique identifier for the command as well as a value that can be used for setting a trigger mode bit during trigger invocation. The signature of func() adds a pointer to the event_command struct, used to invoke those functions, along with a command_data param that can be passed to the reg/unreg functions. This allows func() implementations to use command-specific blobs and supports code re-use. The event_trigger_ops.func() command corrsponds to the trigger 'probe' function that gets called when the triggering event is actually invoked. The other functions are used to list the trigger when needed, along with a couple mundane book-keeping functions. This also moves event_file_data() into trace.h so it can be used outside of trace_events.c. Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Idea-by: Steve Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-20Merge tag 'trace-fixes-v3.13-rc2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "This fixes a long standing bug in the ftrace profiler. The problem is that the profiler only initializes the online CPUs, and not possible CPUs. This causes issues if the user takes CPUs online or offline while the profiler is running. If we online a CPU after starting the profiler, we lose all the trace information on the CPU going online. If we offline a CPU after running a test and start a new test, it will not clear the old data from that CPU. This bug causes incorrect data to be reported to the user if they online or offline CPUs during the profiling" * tag 'trace-fixes-v3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Initialize the ftrace profiler for each possible cpu
2013-12-16ftrace: Initialize the ftrace profiler for each possible cpuMiao Xie1-1/+1
Ftrace currently initializes only the online CPUs. This implementation has two problems: - If we online a CPU after we enable the function profile, and then run the test, we will lose the trace information on that CPU. Steps to reproduce: # echo 0 > /sys/devices/system/cpu/cpu1/online # cd <debugfs>/tracing/ # echo <some function name> >> set_ftrace_filter # echo 1 > function_profile_enabled # echo 1 > /sys/devices/system/cpu/cpu1/online # run test - If we offline a CPU before we enable the function profile, we will not clear the trace information when we enable the function profile. It will trouble the users. Steps to reproduce: # cd <debugfs>/tracing/ # echo <some function name> >> set_ftrace_filter # echo 1 > function_profile_enabled # run test # cat trace_stat/function* # echo 0 > /sys/devices/system/cpu/cpu1/online # echo 0 > function_profile_enabled # echo 1 > function_profile_enabled # cat trace_stat/function* # run test # cat trace_stat/function* So it is better that we initialize the ftrace profiler for each possible cpu every time we enable the function profile instead of just the online ones. Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com Cc: stable@vger.kernel.org # 2.6.31+ Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-06Merge tag 'trace-fixes-3.13-rc2' of ↵Linus Torvalds2-10/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "A regression showed up that there's a large delay when enabling all events. This was prevalent when FTRACE_SELFTEST was enabled which enables all events several times, and caused the system bootup to pause for over a minute. This was tracked down to an addition of a synchronize_sched() performed when system call tracepoints are unregistered. The synchronize_sched() is needed between the unregistering of the system call tracepoint and a deletion of a tracing instance buffer. But placing the synchronize_sched() in the unreg of *every* system call tracepoint is a bit overboard. A single synchronize_sched() before the deletion of the instance is sufficient" * tag 'trace-fixes-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Only run synchronize_sched() at instance deletion time
2013-12-05tracing: Only run synchronize_sched() at instance deletion timeSteven Rostedt2-10/+3
It has been reported that boot up with FTRACE_SELFTEST enabled can take a very long time. There can be stalls of over a minute. This was tracked down to the synchronize_sched() called when a system call event is disabled. As the self tests enable and disable thousands of events, this makes the synchronize_sched() get called thousands of times. The synchornize_sched() was added with d562aff93bfb53 "tracing: Add support for SOFT_DISABLE to syscall events" which caused this regression (added in 3.13-rc1). The synchronize_sched() is to protect against the events being accessed when a tracer instance is being deleted. When an instance is being deleted all the events associated to it are unregistered. The synchronize_sched() makes sure that no more users are running when it finishes. Instead of calling synchronize_sched() for all syscall events, we only need to call it once, after the events are unregistered and before the instance is deleted. The event_mutex is held during this action to prevent new users from enabling events. Link: http://lkml.kernel.org/r/20131203124120.427b9661@gandalf.local.home Reported-by: Petr Mladek <pmladek@suse.cz> Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Petr Mladek <pmladek@suse.cz> Tested-by: Petr Mladek <pmladek@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-02Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc kernel and tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools lib traceevent: Fix conversion of pointer to integer of different size perf/trace: Properly use u64 to hold event_id perf: Remove fragile swevent hlist optimization ftrace, perf: Avoid infinite event generation loop tools lib traceevent: Fix use of multiple options in processing field perf header: Fix possible memory leaks in process_group_desc() perf header: Fix bogus group name perf tools: Tag thread comm as overriden
2013-11-26ftrace: Fix function graph with loading of modulesSteven Rostedt (Red Hat)1-29/+35
Commit 8c4f3c3fa9681 "ftrace: Check module functions being traced on reload" fixed module loading and unloading with respect to function tracing, but it missed the function graph tracer. If you perform the following # cd /sys/kernel/debug/tracing # echo function_graph > current_tracer # modprobe nfsd # echo nop > current_tracer You'll get the following oops message: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9() Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000 0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668 ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000 Call Trace: [<ffffffff814fe193>] dump_stack+0x4f/0x7c [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9 [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9 [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364 [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78 [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10 [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e [<ffffffff81122aed>] vfs_write+0xab/0xf6 [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85 [<ffffffff81122dbd>] SyS_write+0x59/0x82 [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85 [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b ---[ end trace 940358030751eafb ]--- The above mentioned commit didn't go far enough. Well, it covered the function tracer by adding checks in __register_ftrace_function(). The problem is that the function graph tracer circumvents that (for a slight efficiency gain when function graph trace is running with a function tracer. The gain was not worth this). The problem came with ftrace_startup() which should always be called after __register_ftrace_function(), if you want this bug to be completely fixed. Anyway, this solution moves __register_ftrace_function() inside of ftrace_startup() and removes the need to call them both. Reported-by: Dave Wysochanski <dwysocha@redhat.com> Fixes: ed926f9b35cd ("ftrace: Use counters to enable functions to trace") Cc: stable@vger.kernel.org # 3.0+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-24block: Abstract out bvec iteratorKent Overstreet1-7/+8
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
2013-11-19perf/trace: Properly use u64 to hold event_idVince Weaver1-1/+1
The 64-bit attr.config value for perf trace events was being copied into an "int" before doing a comparison, meaning the top 32 bits were being truncated. As far as I can tell this didn't cause any errors, but it did mean it was possible to create valid aliases for all the tracepoint ids which I don't think was intended. (For example, 0xffffffff00000018 and 0x18 both enable the same tracepoint). Signed-off-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1311151236100.11932@vincent-weaver-1.um.maine.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-19ftrace, perf: Avoid infinite event generation loopPeter Zijlstra1-0/+6
Vince's perf-trinity fuzzer found yet another 'interesting' problem. When we sample the irq_work_exit tracepoint with period==1 (or PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an infinite event generation loop: ,-> <IPI> | irq_work_exit() -> | trace_irq_work_exit() -> | ... | __perf_event_overflow() -> (due to fasync) | irq_work_queue() -> (irq_work_list must be empty) '--------- arch_irq_work_raise() Similar things can happen due to regular poll() wakeups if we exceed the ring-buffer wakeup watermark, or have an event_limit. To avoid this, dis-allow sampling this particular tracepoint. In order to achieve this, create a special perf_perm function pointer for each event and call this (when set) on trying to create a tracepoint perf event. [ roasted: use expr... to allow for ',' in your expression ] Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Jones <davej@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-17Merge tag 'trace-3.13' of ↵Linus Torvalds14-209/+518
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing update from Steven Rostedt: "This batch of changes is mostly clean ups and small bug fixes. The only real feature that was added this release is from Namhyung Kim, who introduced "set_graph_notrace" filter that lets you run the function graph tracer and not trace particular functions and their call chain. Tom Zanussi added some updates to the ftrace multibuffer tracing that made it more consistent with the top level tracing. One of the fixes for perf function tracing required an API change in RCU; the addition of "rcu_is_watching()". As Paul McKenney is pushing that change in this release too, he gave me a branch that included all the changes to get that working, and I pulled that into my tree in order to complete the perf function tracing fix" * tag 'trace-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Add rcu annotation for syscall trace descriptors tracing: Do not use signed enums with unsigned long long in fgragh output tracing: Remove unused function ftrace_off_permanent() tracing: Do not assign filp->private_data to freed memory tracing: Add helper function tracing_is_disabled() tracing: Open tracer when ftrace_dump_on_oops is used tracing: Add support for SOFT_DISABLE to syscall events tracing: Make register/unregister_ftrace_command __init tracing: Update event filters for multibuffer recordmcount.pl: Add support for __fentry__ ftrace: Have control op function callback only trace when RCU is watching rcu: Do not trace rcu_is_watching() functions ftrace/x86: skip over the breakpoint for ftrace caller trace/trace_stat: use rbtree postorder iteration helper instead of opencoding ftrace: Add set_graph_notrace filter ftrace: Narrow down the protected area of graph_lock ftrace: Introduce struct ftrace_graph_data ftrace: Get rid of ftrace_graph_filter_enabled tracing: Fix potential out-of-bounds in trace_get_user() tracing: Show more exact help information about snapshot
2013-11-14Merge branch 'for-3.13/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-8/+28
Pull block IO core updates from Jens Axboe: "This is the pull request for the core changes in the block layer for 3.13. It contains: - The new blk-mq request interface. This is a new and more scalable queueing model that marries the best part of the request based interface we currently have (which is fully featured, but scales poorly) and the bio based "interface" which the new drivers for high IOPS devices end up using because it's much faster than the request based one. The bio interface has no block layer support, since it taps into the stack much earlier. This means that drivers end up having to implement a lot of functionality on their own, like tagging, timeout handling, requeue, etc. The blk-mq interface provides all these. Some drivers even provide a switch to select bio or rq and has code to handle both, since things like merging only works in the rq model and hence is faster for some workloads. This is a huge mess. Conversion of these drivers nets us a substantial code reduction. Initial results on converting SCSI to this model even shows an 8x improvement on single queue devices. So while the model was intended to work on the newer multiqueue devices, it has substantial improvements for "classic" hardware as well. This code has gone through extensive testing and development, it's now ready to go. A pull request is coming to convert virtio-blk to this model will be will be coming as well, with more drivers scheduled for 3.14 conversion. - Two blktrace fixes from Jan and Chen Gang. - A plug merge fix from Alireza Haghdoost. - Conversion of __get_cpu_var() from Christoph Lameter. - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven. - A fix for a race between request completion and the timeout handling from Jeff Moyer. This is what caused the merge conflict with blk-mq/core, in case you are looking at that. - A dm stacking fix from Mike Snitzer. - A code consolidation fix and duplicated code removal from Kent Overstreet. - A handful of block bug fixes from Mikulas Patocka, fixing a loop crash and memory corruption on blk cg. - Elevator switch bug fix from Tomoki Sekiyama. A heads-up that I had to rebase this branch. Initially the immutable bio_vecs had been queued up for inclusion, but a week later, it became clear that it wasn't fully cooked yet. So the decision was made to pull this out and postpone it until 3.14. It was a straight forward rebase, just pruning out the immutable series and the later fixes of problems with it. The rest of the patches applied directly and no further changes were made" * 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits) block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO block: Do not call sector_div() with a 64-bit divisor kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup() block: Consolidate duplicated bio_trim() implementations block: Use rw_copy_check_uvector() block: Enable sysfs nomerge control for I/O requests in the plug list block: properly stack underlying max_segment_size to DM device elevator: acquire q->sysfs_lock in elevator_change() elevator: Fix a race in elevator switching and md device initialization block: Replace __get_cpu_var uses bdi: test bdi_init failure block: fix a probe argument to blk_register_region loop: fix crash if blk_alloc_queue fails blk-core: Fix memory corruption if blkcg_init_queue fails block: fix race between request completion and timeout handling blktrace: Send BLK_TN_PROCESS events to all running traces blk-mq: don't disallow request merges for req->special being set blk-mq: mq plug list breakage blk-mq: fix for flush deadlock ...
2013-11-12Merge branch 'sched-core-for-linus' of ↵Linus Torvalds3-3/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this cycle are: - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik van Riel, Peter Zijlstra et al. Yay! - optimize preemption counter handling: merge the NEED_RESCHED flag into the preempt_count variable, by Peter Zijlstra. - wait.h fixes and code reorganization from Peter Zijlstra - cfs_bandwidth fixes from Ben Segall - SMP load-balancer cleanups from Peter Zijstra - idle balancer improvements from Jason Low - other fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits) ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED stop_machine: Fix race between stop_two_cpus() and stop_cpus() sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus sched: Fix asymmetric scheduling for POWER7 sched: Move completion code from core.c to completion.c sched: Move wait code from core.c to wait.c sched: Move wait.c into kernel/sched/ sched/wait: Fix __wait_event_interruptible_lock_irq_timeout() sched: Avoid throttle_cfs_rq() racing with period_timer stopping sched: Guarantee new group-entities always have weight sched: Fix hrtimer_cancel()/rq->lock deadlock sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining sched: Fix race on toggling cfs_bandwidth_used sched: Remove extra put_online_cpus() inside sched_setaffinity() sched/rt: Fix task_tick_rt() comment sched/wait: Fix build breakage sched/wait: Introduce prepare_to_wait_event() sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too sched: Remove get_online_cpus() usage sched: Fix race in migrate_swap_stop() ...
2013-11-11tracing: Add rcu annotation for syscall trace descriptorsSteven Rostedt (Red Hat)1-2/+2
sparse complains about the enter/exit_sysycall_files[] variables being dereferenced with rcu_dereference_sched(). The fields need to be annotated with __rcu. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-11ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHEDPeter Zijlstra3-3/+20
Since the introduction of PREEMPT_NEED_RESCHED in: f27dde8deef3 ("sched: Add NEED_RESCHED to the preempt_count") we need to be able to look at both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED to understand the full preemption behaviour. Add it to the trace output. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com> Link: http://lkml.kernel.org/r/20131004152826.GP3081@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-08kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup()Chen Gang1-2/+1
do_blk_trace_setup() will fully initialize 'buts.name', so can remove the related memcpy(). And also use BLKTRACE_BDEV_SIZE and ARRAY_SIZE instead of hard code number '32'. Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-08blktrace: Send BLK_TN_PROCESS events to all running tracesJan Kara1-6/+27
Currently each task sends BLK_TN_PROCESS event to the first traced device it interacts with after a new trace is started. When there are several traced devices and the task accesses more devices, this logic can result in BLK_TN_PROCESS being sent several times to some devices while it is never sent to other devices. Thus blkparse doesn't display command name when parsing some blktrace files. Fix the problem by sending BLK_TN_PROCESS event to all traced devices when a task interacts with any of them. Signed-off-by: Jan Kara <jack@suse.cz> Review-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-11-07tracing: Do not use signed enums with unsigned long long in fgragh outputSteven Rostedt (Red Hat)2-11/+13
The duration field of print_graph_duration() can also be used to do the space filling by passing an enum in it: DURATION_FILL_FULL DURATION_FILL_START DURATION_FILL_END The problem is that these are enums and defined as negative, but the duration field is unsigned long long. Most archs are fine with this but blackfin fails to compile because of it: kernel/built-in.o: In function `print_graph_duration': kernel/trace/trace_functions_graph.c:782: undefined reference to `__ucmpdi2' Overloading a unsigned long long with an signed enum is just bad in principle. We can accomplish the same thing by using part of the flags field instead. Cc: Mike Frysinger <vapier@gentoo.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-07tracing: Remove unused function ftrace_off_permanent()Steven Rostedt (Red Hat)1-15/+0
In the past, ftrace_off_permanent() was called if something strange was detected. But the ftrace_bug() now handles all the anomolies that can happen with ftrace (function tracing), and there are no uses of ftrace_off_permanent(). Get rid of it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-07tracing: Do not assign filp->private_data to freed memoryGeyslan G. Bem1-1/+8
In system_tr_open(), the filp->private_data can be assigned the 'dir' variable even if it was freed. This is on the error path, and is harmless because the error return code will prevent filp->private_data from being used. But for correctness, we should not assign it to a recently freed variable, as that can cause static tools to give false warnings. Also have both subsystem_open() and system_tr_open() return -ENODEV if tracing has been disabled. Link: http://lkml.kernel.org/r/1383764571-7318-1-git-send-email-geyslan@gmail.com Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06perf/ftrace: Fix paranoid level for enabling function tracerSteven Rostedt1-1/+1
The current default perf paranoid level is "1" which has "perf_paranoid_kernel()" return false, and giving any operations that use it, access to normal users. Unfortunately, this includes function tracing and normal users should not be allowed to enable function tracing by default. The proper level is defined at "-1" (full perf access), which "perf_paranoid_tracepoint_raw()" will only give access to. Use that check instead for enabling function tracing. Reported-by: Dave Jones <davej@redhat.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org # 3.4+ CVE: CVE-2013-2930 Fixes: ced39002f5ea ("ftrace, perf: Add support to use function tracepoint in perf") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06tracing: Add helper function tracing_is_disabled()Geyslan G. Bem2-0/+6
This patch creates the function 'tracing_is_disabled', which can be used outside of trace.c. Link: http://lkml.kernel.org/r/1382141754-12155-1-git-send-email-geyslan@gmail.com Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06tracing: Open tracer when ftrace_dump_on_oops is usedCody P Schafer1-0/+11
With ftrace_dump_on_oops, we previously did not open the tracer in question, sometimes causing the trace output to be useless. For example, the function_graph tracer with tracing_thresh set dumped via ftrace_dump_on_oops would show a series of '}' indented at different levels, but no function names. call trace->open() (and do a few other fixups copied from the normal dump path) to make the output more intelligible. Link: http://lkml.kernel.org/r/1382554197-16961-1-git-send-email-cody@linux.vnet.ibm.com Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06tracing: Add support for SOFT_DISABLE to syscall eventsTom Zanussi2-12/+34
The original SOFT_DISABLE patches didn't add support for soft disable of syscall events; this adds it. Add an array of ftrace_event_file pointers indexed by syscall number to the trace array and remove the existing enabled bitmaps, which as a result are now redundant. The ftrace_event_file structs in turn contain the soft disable flags we need for per-syscall soft disable accounting. Adding ftrace_event_files also means we can remove the USE_CALL_FILTER bit, thus enabling multibuffer filter support for syscall events. Link: http://lkml.kernel.org/r/6e72b566e85d8df8042f133efbc6c30e21fb017e.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06tracing: Make register/unregister_ftrace_command __initTom Zanussi2-4/+12
register/unregister_ftrace_command() are only ever called from __init functions, so can themselves be made __init. Also make register_snapshot_cmd() __init for the same reason. Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06tracing: Update event filters for multibufferTom Zanussi12-91/+239
The trace event filters are still tied to event calls rather than event files, which means you don't get what you'd expect when using filters in the multibuffer case: Before: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 2048 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 Setting the filter in tracing/instances/test1/events shouldn't affect the same event in tracing/events as it does above. After: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 We'd like to just move the filter directly from ftrace_event_call to ftrace_event_file, but there are a couple cases that don't yet have multibuffer support and therefore have to continue using the current event_call-based filters. For those cases, a new USE_CALL_FILTER bit is added to the event_call flags, whose main purpose is to keep the old behavior for those cases until they can be updated with multibuffer support; at that point, the USE_CALL_FILTER flag (and the new associated call_filter_check_discard() function) can go away. The multibuffer support also made filter_current_check_discard() redundant, so this change removes that function as well and replaces it with filter_check_discard() (or call_filter_check_discard() as appropriate). Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06ftrace: Have control op function callback only trace when RCU is watchingSteven Rostedt (Red Hat)1-0/+9
Dave Jones reported that trinity would be able to trigger the following back trace: =============================== [ INFO: suspicious RCU usage. ] 3.10.0-rc2+ #38 Not tainted ------------------------------- include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 1 lock held by trinity-child1/18786: #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310 stack backtrace: CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38 0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8 ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8 0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48 Call Trace: [<ffffffff816e2f6b>] dump_stack+0x19/0x1b [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310 [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310 [<ffffffff81309289>] ? __const_udelay+0x29/0x30 [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0 [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0 [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90 [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0 [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0 [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20 [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0 [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210 [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20 [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470 [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210 [<ffffffff816f4000>] ftrace_call+0x5/0x2f [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210 [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40 [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40 [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470 [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0 [<ffffffff81103673>] rcu_user_enter+0x13/0x20 [<ffffffff8114541a>] user_enter+0x6a/0xd0 [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140 [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d ------------[ cut here ]------------ Perf uses rcu_read_lock() but as the function tracer can trace functions even when RCU is not currently active, this makes the rcu_read_lock() used by perf ineffective. As perf is currently the only user of the ftrace_ops_control_func() and perf is also the only function callback that actively uses rcu_read_lock(), the quick fix is to prevent the ftrace_ops_control_func() from calling its callbacks if RCU is not active. With Paul's new "rcu_is_watching()" we can tell if RCU is active or not. Reported-by: Dave Jones <davej@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06trace/trace_stat: use rbtree postorder iteration helper instead of opencodingCody P Schafer1-36/+5
Use rbtree_postorder_for_each_entry_safe() to destroy the rbtree instead of opencoding an alternate postorder iteration that modifies the tree Link: http://lkml.kernel.org/r/1383345566-25087-2-git-send-email-cody@linux.vnet.ibm.com Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-19ftrace: Add set_graph_notrace filterNamhyung Kim3-3/+108
The set_graph_notrace filter is analogous to set_ftrace_notrace and can be used for eliminating uninteresting part of function graph trace output. It also works with set_graph_function nicely. # cd /sys/kernel/debug/tracing/ # echo do_page_fault > set_graph_function # perf ftrace live true 2) | do_page_fault() { 2) | __do_page_fault() { 2) 0.381 us | down_read_trylock(); 2) 0.055 us | __might_sleep(); 2) 0.696 us | find_vma(); 2) | handle_mm_fault() { 2) | handle_pte_fault() { 2) | __do_fault() { 2) | filemap_fault() { 2) | find_get_page() { 2) 0.033 us | __rcu_read_lock(); 2) 0.035 us | __rcu_read_unlock(); 2) 1.696 us | } 2) 0.031 us | __might_sleep(); 2) 2.831 us | } 2) | _raw_spin_lock() { 2) 0.046 us | add_preempt_count(); 2) 0.841 us | } 2) 0.033 us | page_add_file_rmap(); 2) | _raw_spin_unlock() { 2) 0.057 us | sub_preempt_count(); 2) 0.568 us | } 2) | unlock_page() { 2) 0.084 us | page_waitqueue(); 2) 0.126 us | __wake_up_bit(); 2) 1.117 us | } 2) 7.729 us | } 2) 8.397 us | } 2) 8.956 us | } 2) 0.085 us | up_read(); 2) + 12.745 us | } 2) + 13.401 us | } ... # echo handle_mm_fault > set_graph_notrace # perf ftrace live true 1) | do_page_fault() { 1) | __do_page_fault() { 1) 0.205 us | down_read_trylock(); 1) 0.041 us | __might_sleep(); 1) 0.344 us | find_vma(); 1) 0.069 us | up_read(); 1) 4.692 us | } 1) 5.311 us | } ... Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-19ftrace: Narrow down the protected area of graph_lockNamhyung Kim1-13/+9
The parser set up is just a generic utility that uses local variables allocated by the function. There's no need to hold the graph_lock for this set up. This also makes the code simpler. Link: http://lkml.kernel.org/r/1381739066-7531-4-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-19ftrace: Introduce struct ftrace_graph_dataNamhyung Kim1-19/+62
The struct ftrace_graph_data is for generalizing the access to set_graph_function file. This is a preparation for adding support to set_graph_notrace. Link: http://lkml.kernel.org/r/1381739066-7531-3-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-19ftrace: Get rid of ftrace_graph_filter_enabledNamhyung Kim2-7/+2
The ftrace_graph_filter_enabled means that user sets function filter and it always has same meaning of ftrace_graph_count > 0. Link: http://lkml.kernel.org/r/1381739066-7531-2-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-19tracing: Fix potential out-of-bounds in trace_get_user()Steven Rostedt1-1/+4
Andrey reported the following report: ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3 ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3) Accessed by thread T13003: #0 ffffffff810dd2da (asan_report_error+0x32a/0x440) #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40) #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20) #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260) #4 ffffffff812a1065 (__fput+0x155/0x360) #5 ffffffff812a12de (____fput+0x1e/0x30) #6 ffffffff8111708d (task_work_run+0x10d/0x140) #7 ffffffff810ea043 (do_exit+0x433/0x11f0) #8 ffffffff810eaee4 (do_group_exit+0x84/0x130) #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30) #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Allocated by thread T5167: #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0) #1 ffffffff8128337c (__kmalloc+0xbc/0x500) #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90) #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0) #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40) #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430) #6 ffffffff8129b668 (finish_open+0x68/0xa0) #7 ffffffff812b66ac (do_last+0xb8c/0x1710) #8 ffffffff812b7350 (path_openat+0x120/0xb50) #9 ffffffff812b8884 (do_filp_open+0x54/0xb0) #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0) #11 ffffffff8129d4b7 (SyS_open+0x37/0x50) #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Shadow bytes around the buggy address: ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap redzone: fa Heap kmalloc redzone: fb Freed heap region: fd Shadow gap: fe The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;' Although the crash happened in ftrace_regex_open() the real bug occurred in trace_get_user() where there's an incrementation to parser->idx without a check against the size. The way it is triggered is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop that reads the last character stores it and then breaks out because there is no more characters. Then the last character is read to determine what to do next, and the index is incremented without checking size. Then the caller of trace_get_user() usually nulls out the last character with a zero, but since the index is equal to the size, it writes a nul character after the allocated space, which can corrupt memory. Luckily, only root user has write access to this file. Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-10tracing: Show more exact help information about snapshotWang YanQing1-1/+1
The current "help" that comes out of the snapshot file when it is not allocated looks like this: # * Snapshot is freed * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') Echo 2 says that it does not allocate the buffer, which is correct, but to be more consistent with "echo 0" it should also state that it does not free. Link: http://lkml.kernel.org/r/20130914045916.GA4243@udknight Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-09-10Merge tag 'trace-3.12' of ↵Linus Torvalds5-208/+64
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Not much changes for the 3.12 merge window. The major tracing changes are still in flux, and will have to wait for 3.13. The changes for 3.12 are mostly clean ups and minor fixes. H Peter Anvin added a check to x86_32 static function tracing that helps a small segment of the kernel community. Oleg Nesterov had a few changes from 3.11, but were mostly clean ups and not worth pushing in the -rc time frame. Li Zefan had small clean up with annotating a raw_init with __init. I fixed a slight race in updating function callbacks, but the race is so small and the bug that happens when it occurs is so minor it's not even worth pushing to stable. The only real enhancement is from Alexander Z Lam that made the tracing_cpumask work for trace buffer instances, instead of them all sharing a global cpumask" * tag 'trace-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/rcu: Do not trace debug_lockdep_rcu_enabled() x86-32, ftrace: Fix static ftrace when early microcode is enabled ftrace: Fix a slight race in modifying what function callback gets traced tracing: Make tracing_cpumask available for all instances tracing: Kill the !CONFIG_MODULES code in trace_events.c tracing: Don't pass file_operations array to event_create_dir() tracing: Kill trace_create_file_ops() and friends tracing/syscalls: Annotate raw_init function with __init
2013-09-04ftrace: Fix a slight race in modifying what function callback gets tracedSteven Rostedt (Red Hat)1-1/+16
There's a slight race when going from a list function to a non list function. That is, when only one callback is registered to the function tracer, it gets called directly by the mcount trampoline. But if this function has filters, it may be called by the wrong functions. As the list ops callback that handles multiple callbacks that are registered to ftrace, it also handles what functions they call. While the transaction is taking place, use the list function always, and after all the updates are finished (only the functions that should be traced are being traced), then we can update the trampoline to call the function directly. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-09-03Merge branch 'rcu/next' of ↵Ingo Molnar2-0/+22
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: " * Update RCU documentation. These were posted to LKML at https://lkml.org/lkml/2013/8/19/611. * Miscellaneous fixes. These were posted to LKML at https://lkml.org/lkml/2013/8/19/619. * Full-system idle detection. This is for use by Frederic Weisbecker's adaptive-ticks mechanism. Its purpose is to allow the timekeeping CPU to shut off its tick when all other CPUs are idle. These were posted to LKML at https://lkml.org/lkml/2013/8/19/648. * Improve rcutorture test coverage. These were posted to LKML at https://lkml.org/lkml/2013/8/19/675. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-22tracing: Make tracing_cpumask available for all instancesAlexander Z Lam2-17/+21
Allow tracer instances to disable tracing by cpu by moving the static global tracing_cpumask into trace_array. Link: http://lkml.kernel.org/r/921622317f239bfc2283cac2242647801ef584f2.1375980149.git.azl@google.com Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-22tracing: Kill the !CONFIG_MODULES code in trace_events.cOleg Nesterov1-12/+6
Move trace_module_nb under CONFIG_MODULES and kill the dummy trace_module_notify(). Imho it doesn't make sense to define "struct notifier_block" and its .notifier_call just to avoid "ifdef" in event_trace_init(), and all other !CONFIG_MODULES code has already gone away. Link: http://lkml.kernel.org/r/20130731173137.GA31043@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-22tracing: Don't pass file_operations array to event_create_dir()Oleg Nesterov1-34/+12
Now that event_create_dir() and __trace_add_new_event() always use the same file_operations we can kill these arguments and simplify the code. Link: http://lkml.kernel.org/r/20130731173135.GA31040@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-22tracing: Kill trace_create_file_ops() and friendsOleg Nesterov1-144/+9
trace_create_file_ops() allocates the copy of id/filter/format/enable file_operations to set "f_op->owner = mod" for fops_get(). However after the recent changes there is no reason to prevent rmmod even if one of these files is opened. A file operation can do nothing but fail after remove_event_file_dir() clears ->i_private for every file removed by trace_module_remove_events(). Kill "struct ftrace_module_file_ops" and fix the compilation errors. Link: http://lkml.kernel.org/r/20130731173132.GA31033@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>