diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-10-10 22:20:55 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-10-10 22:20:55 +0300 |
commit | cdf072acb5baa18e5b05bdf3f13d6481f62396fc (patch) | |
tree | 9e6e6dc4c5adb79b1babab8d4c08cc86f1fe222b /Documentation/trace | |
parent | dc55342867c9cf2295f4330420461361a9c8117d (diff) | |
parent | 4f881a696484a7d31a4d1b12547615b1a3ee5771 (diff) | |
download | linux-cdf072acb5baa18e5b05bdf3f13d6481f62396fc.tar.xz |
Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
"Major changes:
- Changed location of tracing repo from personal git repo to:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
- Added Masami Hiramatsu as co-maintainer
- Updated MAINTAINERS file to separate out FTRACE as it is more than
just TRACING.
Minor changes:
- Added Mark Rutland as FTRACE reviewer
- Updated user_events to make it on its way to remove the BROKEN tag.
The changes should now be acceptable but will run it through a
cycle and hopefully we can remove the BROKEN tag next release.
- Added filtering to eprobes
- Added a delta time to the benchmark trace event
- Have the histogram and filter callbacks called via a switch
statement instead of indirect functions. This speeds it up to avoid
retpolines.
- Add a way to wake up ring buffer waiters waiting for the ring
buffer to fill up to its watermark.
- New ioctl() on the trace_pipe_raw file to wake up ring buffer
waiters.
- Wake up waiters when the ring buffer is disabled. A reader may
block when the ring buffer is disabled, but if it was blocked when
the ring buffer is disabled it should then wake up.
Fixes:
- Allow splice to read partially read ring buffer pages. This fixes
splice never moving forward.
- Fix inverted compare that made the "shortest" ring buffer wait
queue actually the longest.
- Fix a race in the ring buffer between resetting a page when a
writer goes to another page, and the reader.
- Fix ftrace accounting bug when function hooks are added at boot up
before the weak functions are set to "disabled".
- Fix bug that freed a user allocated snapshot buffer when enabling a
tracer.
- Fix possible recursive locks in osnoise tracer
- Fix recursive locking direct functions
- Other minor clean ups and fixes"
* tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
ftrace: Create separate entry in MAINTAINERS for function hooks
tracing: Update MAINTAINERS to reflect new tracing git repo
tracing: Do not free snapshot if tracer is on cmdline
ftrace: Still disable enabled records marked as disabled
tracing/user_events: Move pages/locks into groups to prepare for namespaces
tracing: Add Masami Hiramatsu as co-maintainer
tracing: Remove unused variable 'dups'
MAINTAINERS: add myself as a tracing reviewer
ring-buffer: Fix race between reset page and reading page
tracing/user_events: Update ABI documentation to align to bits vs bytes
tracing/user_events: Use bits vs bytes for enabled status page data
tracing/user_events: Use refcount instead of atomic for ref tracking
tracing/user_events: Ensure user provided strings are safely formatted
tracing/user_events: Use WRITE instead of READ for io vector import
tracing/user_events: Use NULL for strstr checks
tracing: Fix spelling mistake "preapre" -> "prepare"
tracing: Wake up waiters when tracing is disabled
tracing: Add ioctl() to force ring buffer waiters to wake up
tracing: Wake up ring buffer waiters on closing of the file
ring-buffer: Add ring_buffer_wake_waiters()
...
Diffstat (limited to 'Documentation/trace')
-rw-r--r-- | Documentation/trace/user_events.rst | 86 |
1 files changed, 58 insertions, 28 deletions
diff --git a/Documentation/trace/user_events.rst b/Documentation/trace/user_events.rst index c180936f49fc..9f181f342a70 100644 --- a/Documentation/trace/user_events.rst +++ b/Documentation/trace/user_events.rst @@ -20,14 +20,14 @@ dynamic_events is the same as the ioctl with the u: prefix applied. Typically programs will register a set of events that they wish to expose to tools that can read trace_events (such as ftrace and perf). The registration -process gives back two ints to the program for each event. The first int is the -status index. This index describes which byte in the +process gives back two ints to the program for each event. The first int is +the status bit. This describes which bit in little-endian format in the /sys/kernel/debug/tracing/user_events_status file represents this event. The -second int is the write index. This index describes the data when a write() or +second int is the write index which describes the data when a write() or writev() is called on the /sys/kernel/debug/tracing/user_events_data file. -The structures referenced in this document are contained with the -/include/uap/linux/user_events.h file in the source tree. +The structures referenced in this document are contained within the +/include/uapi/linux/user_events.h file in the source tree. **NOTE:** *Both user_events_status and user_events_data are under the tracefs filesystem and may be mounted at different paths than above.* @@ -38,18 +38,18 @@ Registering within a user process is done via ioctl() out to the /sys/kernel/debug/tracing/user_events_data file. The command to issue is DIAG_IOCSREG. -This command takes a struct user_reg as an argument:: +This command takes a packed struct user_reg as an argument:: struct user_reg { u32 size; u64 name_args; - u32 status_index; + u32 status_bit; u32 write_index; }; The struct user_reg requires two inputs, the first is the size of the structure to ensure forward and backward compatibility. The second is the command string -to issue for registering. Upon success two outputs are set, the status index +to issue for registering. Upon success two outputs are set, the status bit and the write index. User based events show up under tracefs like any other event under the @@ -111,15 +111,56 @@ in realtime. This allows user programs to only incur the cost of the write() or writev() calls when something is actively attached to the event. User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to -check the status for each event that is registered. The byte to check in the -file is given back after the register ioctl() via user_reg.status_index. +check the status for each event that is registered. The bit to check in the +file is given back after the register ioctl() via user_reg.status_bit. The bit +is always in little-endian format. Programs can check if the bit is set either +using a byte-wise index with a mask or a long-wise index with a little-endian +mask. + Currently the size of user_events_status is a single page, however, custom kernel configurations can change this size to allow more user based events. In all cases the size of the file is a multiple of a page size. -For example, if the register ioctl() gives back a status_index of 3 you would -check byte 3 of the returned mmap data to see if anything is attached to that -event. +For example, if the register ioctl() gives back a status_bit of 3 you would +check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8 +(1 << (3 % 8)) to see if anything is attached to that event. + +A byte-wise index check is performed as follows:: + + int index, mask; + char *status_page; + + index = status_bit / 8; + mask = 1 << (status_bit % 8); + + ... + + if (status_page[index] & mask) { + /* Enabled */ + } + +A long-wise index check is performed as follows:: + + #include <asm/bitsperlong.h> + #include <endian.h> + + #if __BITS_PER_LONG == 64 + #define endian_swap(x) htole64(x) + #else + #define endian_swap(x) htole32(x) + #endif + + long index, mask, *status_page; + + index = status_bit / __BITS_PER_LONG; + mask = 1L << (status_bit % __BITS_PER_LONG); + mask = endian_swap(mask); + + ... + + if (status_page[index] & mask) { + /* Enabled */ + } Administrators can easily check the status of all registered events by reading the user_events_status file directly via a terminal. The output is as follows:: @@ -137,7 +178,7 @@ For example, on a system that has a single event the output looks like this:: Active: 1 Busy: 0 - Max: 4096 + Max: 32768 If a user enables the user event via ftrace, the output would change to this:: @@ -145,21 +186,10 @@ If a user enables the user event via ftrace, the output would change to this:: Active: 1 Busy: 1 - Max: 4096 - -**NOTE:** *A status index of 0 will never be returned. This allows user -programs to have an index that can be used on error cases.* - -Status Bits -^^^^^^^^^^^ -The byte being checked will be non-zero if anything is attached. Programs can -check specific bits in the byte to see what mechanism has been attached. - -The following values are defined to aid in checking what has been attached: - -**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0). + Max: 32768 -**EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1). +**NOTE:** *A status bit of 0 will never be returned. This allows user programs +to have a bit that can be used on error cases.* Writing Data ------------ |