diff options
Diffstat (limited to 'Documentation/trace/rv')
-rw-r--r-- | Documentation/trace/rv/da_monitor_synthesis.rst | 147 | ||||
-rw-r--r-- | Documentation/trace/rv/index.rst | 5 | ||||
-rw-r--r-- | Documentation/trace/rv/linear_temporal_logic.rst | 134 | ||||
-rw-r--r-- | Documentation/trace/rv/monitor_rtapp.rst | 133 | ||||
-rw-r--r-- | Documentation/trace/rv/monitor_sched.rst | 402 | ||||
-rw-r--r-- | Documentation/trace/rv/monitor_synthesis.rst | 271 | ||||
-rw-r--r-- | Documentation/trace/rv/runtime-verification.rst | 4 |
7 files changed, 946 insertions, 150 deletions
diff --git a/Documentation/trace/rv/da_monitor_synthesis.rst b/Documentation/trace/rv/da_monitor_synthesis.rst deleted file mode 100644 index 0a92729c8a9b..000000000000 --- a/Documentation/trace/rv/da_monitor_synthesis.rst +++ /dev/null @@ -1,147 +0,0 @@ -Deterministic Automata Monitor Synthesis -======================================== - -The starting point for the application of runtime verification (RV) techniques -is the *specification* or *modeling* of the desired (or undesired) behavior -of the system under scrutiny. - -The formal representation needs to be then *synthesized* into a *monitor* -that can then be used in the analysis of the trace of the system. The -*monitor* connects to the system via an *instrumentation* that converts -the events from the *system* to the events of the *specification*. - - -In Linux terms, the runtime verification monitors are encapsulated inside -the *RV monitor* abstraction. The RV monitor includes a set of instances -of the monitor (per-cpu monitor, per-task monitor, and so on), the helper -functions that glue the monitor to the system reference model, and the -trace output as a reaction to event parsing and exceptions, as depicted -below:: - - Linux +----- RV Monitor ----------------------------------+ Formal - Realm | | Realm - +-------------------+ +----------------+ +-----------------+ - | Linux kernel | | Monitor | | Reference | - | Tracing | -> | Instance(s) | <- | Model | - | (instrumentation) | | (verification) | | (specification) | - +-------------------+ +----------------+ +-----------------+ - | | | - | V | - | +----------+ | - | | Reaction | | - | +--+--+--+-+ | - | | | | | - | | | +-> trace output ? | - +------------------------|--|----------------------+ - | +----> panic ? - +-------> <user-specified> - -DA monitor synthesis --------------------- - -The synthesis of automata-based models into the Linux *RV monitor* abstraction -is automated by the dot2k tool and the rv/da_monitor.h header file that -contains a set of macros that automatically generate the monitor's code. - -dot2k ------ - -The dot2k utility leverages dot2c by converting an automaton model in -the DOT format into the C representation [1] and creating the skeleton of -a kernel monitor in C. - -For example, it is possible to transform the wip.dot model present in -[1] into a per-cpu monitor with the following command:: - - $ dot2k -d wip.dot -t per_cpu - -This will create a directory named wip/ with the following files: - -- wip.h: the wip model in C -- wip.c: the RV monitor - -The wip.c file contains the monitor declaration and the starting point for -the system instrumentation. - -Monitor macros --------------- - -The rv/da_monitor.h enables automatic code generation for the *Monitor -Instance(s)* using C macros. - -The benefits of the usage of macro for monitor synthesis are 3-fold as it: - -- Reduces the code duplication; -- Facilitates the bug fix/improvement; -- Avoids the case of developers changing the core of the monitor code - to manipulate the model in a (let's say) non-standard way. - -This initial implementation presents three different types of monitor instances: - -- ``#define DECLARE_DA_MON_GLOBAL(name, type)`` -- ``#define DECLARE_DA_MON_PER_CPU(name, type)`` -- ``#define DECLARE_DA_MON_PER_TASK(name, type)`` - -The first declares the functions for a global deterministic automata monitor, -the second for monitors with per-cpu instances, and the third with per-task -instances. - -In all cases, the 'name' argument is a string that identifies the monitor, and -the 'type' argument is the data type used by dot2k on the representation of -the model in C. - -For example, the wip model with two states and three events can be -stored in an 'unsigned char' type. Considering that the preemption control -is a per-cpu behavior, the monitor declaration in the 'wip.c' file is:: - - DECLARE_DA_MON_PER_CPU(wip, unsigned char); - -The monitor is executed by sending events to be processed via the functions -presented below:: - - da_handle_event_$(MONITOR_NAME)($(event from event enum)); - da_handle_start_event_$(MONITOR_NAME)($(event from event enum)); - da_handle_start_run_event_$(MONITOR_NAME)($(event from event enum)); - -The function ``da_handle_event_$(MONITOR_NAME)()`` is the regular case where -the event will be processed if the monitor is processing events. - -When a monitor is enabled, it is placed in the initial state of the automata. -However, the monitor does not know if the system is in the *initial state*. - -The ``da_handle_start_event_$(MONITOR_NAME)()`` function is used to notify the -monitor that the system is returning to the initial state, so the monitor can -start monitoring the next event. - -The ``da_handle_start_run_event_$(MONITOR_NAME)()`` function is used to notify -the monitor that the system is known to be in the initial state, so the -monitor can start monitoring and monitor the current event. - -Using the wip model as example, the events "preempt_disable" and -"sched_waking" should be sent to monitor, respectively, via [2]:: - - da_handle_event_wip(preempt_disable_wip); - da_handle_event_wip(sched_waking_wip); - -While the event "preempt_enabled" will use:: - - da_handle_start_event_wip(preempt_enable_wip); - -To notify the monitor that the system will be returning to the initial state, -so the system and the monitor should be in sync. - -Final remarks -------------- - -With the monitor synthesis in place using the rv/da_monitor.h and -dot2k, the developer's work should be limited to the instrumentation -of the system, increasing the confidence in the overall approach. - -[1] For details about deterministic automata format and the translation -from one representation to another, see:: - - Documentation/trace/rv/deterministic_automata.rst - -[2] dot2k appends the monitor's name suffix to the events enums to -avoid conflicting variables when exporting the global vmlinux.h -use by BPF programs. diff --git a/Documentation/trace/rv/index.rst b/Documentation/trace/rv/index.rst index 15fa966102c0..a2812ac5cfeb 100644 --- a/Documentation/trace/rv/index.rst +++ b/Documentation/trace/rv/index.rst @@ -8,7 +8,10 @@ Runtime Verification runtime-verification.rst deterministic_automata.rst - da_monitor_synthesis.rst + linear_temporal_logic.rst + monitor_synthesis.rst da_monitor_instrumentation.rst monitor_wip.rst monitor_wwnr.rst + monitor_sched.rst + monitor_rtapp.rst diff --git a/Documentation/trace/rv/linear_temporal_logic.rst b/Documentation/trace/rv/linear_temporal_logic.rst new file mode 100644 index 000000000000..9eee09d9cacf --- /dev/null +++ b/Documentation/trace/rv/linear_temporal_logic.rst @@ -0,0 +1,134 @@ +Linear temporal logic +===================== + +Introduction +------------ + +Runtime verification monitor is a verification technique which checks that the +kernel follows a specification. It does so by using tracepoints to monitor the +kernel's execution trace, and verifying that the execution trace sastifies the +specification. + +Initially, the specification can only be written in the form of deterministic +automaton (DA). However, while attempting to implement DA monitors for some +complex specifications, deterministic automaton is found to be inappropriate as +the specification language. The automaton is complicated, hard to understand, +and error-prone. + +Thus, RV monitors based on linear temporal logic (LTL) are introduced. This type +of monitor uses LTL as specification instead of DA. For some cases, writing the +specification as LTL is more concise and intuitive. + +Many materials explain LTL in details. One book is:: + + Christel Baier and Joost-Pieter Katoen: Principles of Model Checking, The MIT + Press, 2008. + +Grammar +------- + +Unlike some existing syntax, kernel's implementation of LTL is more verbose. +This is motivated by considering that the people who read the LTL specifications +may not be well-versed in LTL. + +Grammar: + ltl ::= opd | ( ltl ) | ltl binop ltl | unop ltl + +Operands (opd): + true, false, user-defined names consisting of upper-case characters, digits, + and underscore. + +Unary Operators (unop): + always + eventually + next + not + +Binary Operators (binop): + until + and + or + imply + equivalent + +This grammar is ambiguous: operator precedence is not defined. Parentheses must +be used. + +Example linear temporal logic +----------------------------- +.. code-block:: + + RAIN imply (GO_OUTSIDE imply HAVE_UMBRELLA) + +means: if it is raining, going outside means having an umbrella. + +.. code-block:: + + RAIN imply (WET until not RAIN) + +means: if it is raining, it is going to be wet until the rain stops. + +.. code-block:: + + RAIN imply eventually not RAIN + +means: if it is raining, rain will eventually stop. + +The above examples are referring to the current time instance only. For kernel +verification, the `always` operator is usually desirable, to specify that +something is always true at the present and for all future. For example:: + + always (RAIN imply eventually not RAIN) + +means: *all* rain eventually stops. + +In the above examples, `RAIN`, `GO_OUTSIDE`, `HAVE_UMBRELLA` and `WET` are the +"atomic propositions". + +Monitor synthesis +----------------- + +To synthesize an LTL into a kernel monitor, the `rvgen` tool can be used: +`tools/verification/rvgen`. The specification needs to be provided as a file, +and it must have a "RULE = LTL" assignment. For example:: + + RULE = always (ACQUIRE imply ((not KILLED and not CRASHED) until RELEASE)) + +which says: if `ACQUIRE`, then `RELEASE` must happen before `KILLED` or +`CRASHED`. + +The LTL can be broken down using sub-expressions. The above is equivalent to: + + .. code-block:: + + RULE = always (ACQUIRE imply (ALIVE until RELEASE)) + ALIVE = not KILLED and not CRASHED + +From this specification, `rvgen` generates the C implementation of a Buchi +automaton - a non-deterministic state machine which checks the satisfiability of +the LTL. See Documentation/trace/rv/monitor_synthesis.rst for details on using +`rvgen`. + +References +---------- + +One book covering model checking and linear temporal logic is:: + + Christel Baier and Joost-Pieter Katoen: Principles of Model Checking, The MIT + Press, 2008. + +For an example of using linear temporal logic in software testing, see:: + + Ruijie Meng, Zhen Dong, Jialin Li, Ivan Beschastnikh, and Abhik Roychoudhury. + 2022. Linear-time temporal logic guided greybox fuzzing. In Proceedings of the + 44th International Conference on Software Engineering (ICSE '22). Association + for Computing Machinery, New York, NY, USA, 1343–1355. + https://doi.org/10.1145/3510003.3510082 + +The kernel's LTL monitor implementation is based on:: + + Gerth, R., Peled, D., Vardi, M.Y., Wolper, P. (1996). Simple On-the-fly + Automatic Verification of Linear Temporal Logic. In: Dembiński, P., Średniawa, + M. (eds) Protocol Specification, Testing and Verification XV. PSTV 1995. IFIP + Advances in Information and Communication Technology. Springer, Boston, MA. + https://doi.org/10.1007/978-0-387-34892-6_1 diff --git a/Documentation/trace/rv/monitor_rtapp.rst b/Documentation/trace/rv/monitor_rtapp.rst new file mode 100644 index 000000000000..c8104eda924a --- /dev/null +++ b/Documentation/trace/rv/monitor_rtapp.rst @@ -0,0 +1,133 @@ +Real-time application monitors +============================== + +- Name: rtapp +- Type: container for multiple monitors +- Author: Nam Cao <namcao@linutronix.de> + +Description +----------- + +Real-time applications may have design flaws such that they experience +unexpected latency and fail to meet their time requirements. Often, these flaws +follow a few patterns: + + - Page faults: A real-time thread may access memory that does not have a + mapped physical backing or must first be copied (such as for copy-on-write). + Thus a page fault is raised and the kernel must first perform the expensive + action. This causes significant delays to the real-time thread + - Priority inversion: A real-time thread blocks waiting for a lower-priority + thread. This causes the real-time thread to effectively take on the + scheduling priority of the lower-priority thread. For example, the real-time + thread needs to access a shared resource that is protected by a + non-pi-mutex, but the mutex is currently owned by a non-real-time thread. + +The `rtapp` monitor detects these patterns. It aids developers to identify +reasons for unexpected latency with real-time applications. It is a container of +multiple sub-monitors described in the following sections. + +Monitor pagefault ++++++++++++++++++ + +The `pagefault` monitor reports real-time tasks raising page faults. Its +specification is:: + + RULE = always (RT imply not PAGEFAULT) + +To fix warnings reported by this monitor, `mlockall()` or `mlock()` can be used +to ensure physical backing for memory. + +This monitor may have false negatives because the pages used by the real-time +threads may just happen to be directly available during testing. To minimize +this, the system can be put under memory pressure (e.g. invoking the OOM killer +using a program that does `ptr = malloc(SIZE_OF_RAM); memset(ptr, 0, +SIZE_OF_RAM);`) so that the kernel executes aggressive strategies to recycle as +much physical memory as possible. + +Monitor sleep ++++++++++++++ + +The `sleep` monitor reports real-time threads sleeping in a manner that may +cause undesirable latency. Real-time applications should only put a real-time +thread to sleep for one of the following reasons: + + - Cyclic work: real-time thread sleeps waiting for the next cycle. For this + case, only the `clock_nanosleep` syscall should be used with `TIMER_ABSTIME` + (to avoid time drift) and `CLOCK_MONOTONIC` (to avoid the clock being + changed). No other method is safe for real-time. For example, threads + waiting for timerfd can be woken by softirq which provides no real-time + guarantee. + - Real-time thread waiting for something to happen (e.g. another thread + releasing shared resources, or a completion signal from another thread). In + this case, only futexes (FUTEX_LOCK_PI, FUTEX_LOCK_PI2 or one of + FUTEX_WAIT_*) should be used. Applications usually do not use futexes + directly, but use PI mutexes and PI condition variables which are built on + top of futexes. Be aware that the C library might not implement conditional + variables as safe for real-time. As an alternative, the librtpi library + exists to provide a conditional variable implementation that is correct for + real-time applications in Linux. + +Beside the reason for sleeping, the eventual waker should also be +real-time-safe. Namely, one of: + + - An equal-or-higher-priority thread + - Hard interrupt handler + - Non-maskable interrupt handler + +This monitor's warning usually means one of the following: + + - Real-time thread is blocked by a non-real-time thread (e.g. due to + contention on a mutex without priority inheritance). This is priority + inversion. + - Time-critical work waits for something which is not safe for real-time (e.g. + timerfd). + - The work executed by the real-time thread does not need to run at real-time + priority at all. This is not a problem for the real-time thread itself, but + it is potentially taking the CPU away from other important real-time work. + +Application developers may purposely choose to have their real-time application +sleep in a way that is not safe for real-time. It is debatable whether that is a +problem. Application developers must analyze the warnings to make a proper +assessment. + +The monitor's specification is:: + + RULE = always ((RT and SLEEP) imply (RT_FRIENDLY_SLEEP or ALLOWLIST)) + + RT_FRIENDLY_SLEEP = (RT_VALID_SLEEP_REASON or KERNEL_THREAD) + and ((not WAKE) until RT_FRIENDLY_WAKE) + + RT_VALID_SLEEP_REASON = FUTEX_WAIT + or RT_FRIENDLY_NANOSLEEP + + RT_FRIENDLY_NANOSLEEP = CLOCK_NANOSLEEP + and NANOSLEEP_TIMER_ABSTIME + and NANOSLEEP_CLOCK_MONOTONIC + + RT_FRIENDLY_WAKE = WOKEN_BY_EQUAL_OR_HIGHER_PRIO + or WOKEN_BY_HARDIRQ + or WOKEN_BY_NMI + or KTHREAD_SHOULD_STOP + + ALLOWLIST = BLOCK_ON_RT_MUTEX + or FUTEX_LOCK_PI + or TASK_IS_RCU + or TASK_IS_MIGRATION + +Beside the scenarios described above, this specification also handle some +special cases: + + - `KERNEL_THREAD`: kernel tasks do not have any pattern that can be recognized + as valid real-time sleeping reasons. Therefore sleeping reason is not + checked for kernel tasks. + - `KTHREAD_SHOULD_STOP`: a non-real-time thread may stop a real-time kernel + thread by waking it and waiting for it to exit (`kthread_stop()`). This + wakeup is safe for real-time. + - `ALLOWLIST`: to handle known false positives with the kernel. + - `BLOCK_ON_RT_MUTEX` is included in the allowlist due to its implementation. + In the release path of rt_mutex, a boosted task is de-boosted before waking + the rt_mutex's waiter. Consequently, the monitor may see a real-time-unsafe + wakeup (e.g. non-real-time task waking real-time task). This is actually + real-time-safe because preemption is disabled for the duration. + - `FUTEX_LOCK_PI` is included in the allowlist for the same reason as + `BLOCK_ON_RT_MUTEX`. diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst new file mode 100644 index 000000000000..3f8381ad9ec7 --- /dev/null +++ b/Documentation/trace/rv/monitor_sched.rst @@ -0,0 +1,402 @@ +Scheduler monitors +================== + +- Name: sched +- Type: container for multiple monitors +- Author: Gabriele Monaco <gmonaco@redhat.com>, Daniel Bristot de Oliveira <bristot@kernel.org> + +Description +----------- + +Monitors describing complex systems, such as the scheduler, can easily grow to +the point where they are just hard to understand because of the many possible +state transitions. +Often it is possible to break such descriptions into smaller monitors, +sharing some or all events. Enabling those smaller monitors concurrently is, +in fact, testing the system as if we had one single larger monitor. +Splitting models into multiple specification is not only easier to +understand, but gives some more clues when we see errors. + +The sched monitor is a set of specifications to describe the scheduler behaviour. +It includes several per-cpu and per-task monitors that work independently to verify +different specifications the scheduler should follow. + +To make this system as straightforward as possible, sched specifications are *nested* +monitors, whereas sched itself is a *container*. +From the interface perspective, sched includes other monitors as sub-directories, +enabling/disabling or setting reactors to sched, propagates the change to all monitors, +however single monitors can be used independently as well. + +It is important that future modules are built after their container (sched, in +this case), otherwise the linker would not respect the order and the nesting +wouldn't work as expected. +To do so, simply add them after sched in the Makefile. + +Specifications +-------------- + +The specifications included in sched are currently a work in progress, adapting the ones +defined in by Daniel Bristot in [1]. + +Currently we included the following: + +Monitor sco +~~~~~~~~~~~ + +The scheduling context operations (sco) monitor ensures changes in a task state +happen only in thread context:: + + + | + | + v + sched_set_state +------------------+ + +------------------ | | + | | thread_context | + +-----------------> | | <+ + +------------------+ | + | | + | schedule_entry | schedule_exit + v | + | + scheduling_context -+ + +Monitor snroc +~~~~~~~~~~~~~ + +The set non runnable on its own context (snroc) monitor ensures changes in a +task state happens only in the respective task's context. This is a per-task +monitor:: + + | + | + v + +------------------+ + | other_context | <+ + +------------------+ | + | | + | sched_switch_in | sched_switch_out + v | + sched_set_state | + +------------------ | + | own_context | + +-----------------> -+ + +Monitor scpd +~~~~~~~~~~~~ + +The schedule called with preemption disabled (scpd) monitor ensures schedule is +called with preemption disabled:: + + | + | + v + +------------------+ + | cant_sched | <+ + +------------------+ | + | | + | preempt_disable | preempt_enable + v | + schedule_entry | + schedule_exit | + +----------------- can_sched | + | | + +----------------> -+ + +Monitor snep +~~~~~~~~~~~~ + +The schedule does not enable preempt (snep) monitor ensures a schedule call +does not enable preemption:: + + | + | + v + preempt_disable +------------------------+ + preempt_enable | | + +------------------ | non_scheduling_context | + | | | + +-----------------> | | <+ + +------------------------+ | + | | + | schedule_entry | schedule_exit + v | + | + scheduling_contex -+ + +Monitor sts +~~~~~~~~~~~ + +The schedule implies task switch (sts) monitor ensures a task switch happens +only in scheduling context and up to once, as well as scheduling occurs with +interrupts enabled but no task switch can happen before interrupts are +disabled. When the next task picked for execution is the same as the previously +running one, no real task switch occurs but interrupts are disabled nonetheless:: + + irq_entry | + +----+ | + v | v + +------------+ irq_enable #===================# irq_disable + | | ------------> H H irq_entry + | cant_sched | <------------ H H irq_enable + | | irq_disable H can_sched H --------------+ + +------------+ H H | + H H | + +---------------> H H <-------------+ + | #===================# + | | + schedule_exit | schedule_entry + | v + | +-------------------+ irq_enable + | | scheduling | <---------------+ + | +-------------------+ | + | | | + | | irq_disable +--------+ irq_entry + | v | | --------+ + | +-------------------+ irq_entry | in_irq | | + | | | -----------> | | <-------+ + | | disable_to_switch | +--------+ + | | | --+ + | +-------------------+ | + | | | + | | sched_switch | + | v | + | +-------------------+ | + | | switching | | irq_enable + | +-------------------+ | + | | | + | | irq_enable | + | v | + | +-------------------+ | + +-- | enable_to_exit | <-+ + +-------------------+ + ^ | irq_disable + | | irq_entry + +---------------+ irq_enable + +Monitor nrp +----------- + +The need resched preempts (nrp) monitor ensures preemption requires +``need_resched``. Only kernel preemption is considered, since preemption +while returning to userspace, for this monitor, is indistinguishable from +``sched_switch_yield`` (described in the sssw monitor). +A kernel preemption is whenever ``__schedule`` is called with the preemption +flag set to true (e.g. from preempt_enable or exiting from interrupts). This +type of preemption occurs after the need for ``rescheduling`` has been set. +This is not valid for the *lazy* variant of the flag, which causes only +userspace preemption. +A ``schedule_entry_preempt`` may involve a task switch or not, in the latter +case, a task goes through the scheduler from a preemption context but it is +picked as the next task to run. Since the scheduler runs, this clears the need +to reschedule. The ``any_thread_running`` state does not imply the monitored +task is not running as this monitor does not track the outcome of scheduling. + +In theory, a preemption can only occur after the ``need_resched`` flag is set. In +practice, however, it is possible to see a preemption where the flag is not +set. This can happen in one specific condition:: + + need_resched + preempt_schedule() + preempt_schedule_irq() + __schedule() + !need_resched + __schedule() + +In the situation above, standard preemption starts (e.g. from preempt_enable +when the flag is set), an interrupt occurs before scheduling and, on its exit +path, it schedules, which clears the ``need_resched`` flag. +When the preempted task runs again, the standard preemption started earlier +resumes, although the flag is no longer set. The monitor considers this a +``nested_preemption``, this allows another preemption without re-setting the +flag. This condition relaxes the monitor constraints and may catch false +negatives (i.e. no real ``nested_preemptions``) but makes the monitor more +robust and able to validate other scenarios. +For simplicity, the monitor starts in ``preempt_irq``, although no interrupt +occurred, as the situation above is hard to pinpoint:: + + schedule_entry + irq_entry #===========================================# + +-------------------------- H H + | H H + +-------------------------> H any_thread_running H + H H + +-------------------------> H H + | #===========================================# + | schedule_entry | ^ + | schedule_entry_preempt | sched_need_resched | schedule_entry + | | schedule_entry_preempt + | v | + | +----------------------+ | + | +--- | | | + | sched_need_resched | | rescheduling | -+ + | +--> | | + | +----------------------+ + | | irq_entry + | v + | +----------------------+ + | | | ---+ + | ---> | | | sched_need_resched + | | preempt_irq | | irq_entry + | | | <--+ + | | | <--+ + | +----------------------+ | + | | schedule_entry | sched_need_resched + | | schedule_entry_preempt | + | v | + | +-----------------------+ | + +-------------------------- | nested_preempt | --+ + +-----------------------+ + ^ irq_entry | + +-------------------+ + +Due to how the ``need_resched`` flag on the preemption count works on arm64, +this monitor is unstable on that architecture, as it often records preemption +when the flag is not set, even in presence of the workaround above. +For the time being, the monitor is disabled by default on arm64. + +Monitor sssw +------------ + +The set state sleep and wakeup (sssw) monitor ensures ``set_state`` to +sleepable leads to sleeping and sleeping tasks require wakeup. It includes the +following types of switch: + +* ``switch_suspend``: + a task puts itself to sleep, this can happen only after explicitly setting + the task to ``sleepable``. After a task is suspended, it needs to be woken up + (``waking`` state) before being switched in again. + Setting the task's state to ``sleepable`` can be reverted before switching if it + is woken up or set to ``runnable``. +* ``switch_blocking``: + a special case of a ``switch_suspend`` where the task is waiting on a + sleeping RT lock (``PREEMPT_RT`` only), it is common to see wakeup and set + state events racing with each other and this leads the model to perceive this + type of switch when the task is not set to sleepable. This is a limitation of + the model in SMP system and workarounds may slow down the system. +* ``switch_preempt``: + a task switch as a result of kernel preemption (``schedule_entry_preempt`` in + the nrp model). +* ``switch_yield``: + a task explicitly calls the scheduler or is preempted while returning to + userspace. It can happen after a ``yield`` system call, from the idle task or + if the ``need_resched`` flag is set. By definition, a task cannot yield while + ``sleepable`` as that would be a suspension. A special case of a yield occurs + when a task in ``TASK_INTERRUPTIBLE`` calls the scheduler while a signal is + pending. The task doesn't go through the usual blocking/waking and is set + back to runnable, the resulting switch (if there) looks like a yield to the + ``signal_wakeup`` state and is followed by the signal delivery. From this + state, the monitor expects a signal even if it sees a wakeup event, although + not necessary, to rule out false negatives. + +This monitor doesn't include a running state, ``sleepable`` and ``runnable`` +are only referring to the task's desired state, which could be scheduled out +(e.g. due to preemption). However, it does include the event +``sched_switch_in`` to represent when a task is allowed to become running. This +can be triggered also by preemption, but cannot occur after the task got to +``sleeping`` before a ``wakeup`` occurs:: + + +--------------------------------------------------------------------------+ + | | + | | + | switch_suspend | | + | switch_blocking | | + v v | + +----------+ #==========================# set_state_runnable | + | | H H wakeup | + | | H H switch_in | + | | H H switch_yield | + | sleeping | H H switch_preempt | + | | H H signal_deliver | + | | switch_ H H ------+ | + | | _blocking H runnable H | | + | | <----------- H H <-----+ | + +----------+ H H | + | wakeup H H | + +---------------------> H H | + H H | + +---------> H H | + | #==========================# | + | | ^ | + | | | set_state_runnable | + | | | wakeup | + | set_state_sleepable | +------------------------+ + | v | | + | +--------------------------+ set_state_sleepable + | | | switch_in + | | | switch_preempt + signal_deliver | sleepable | signal_deliver + | | | ------+ + | | | | + | | | <-----+ + | +--------------------------+ + | | ^ + | switch_yield | set_state_sleepable + | v | + | +---------------+ | + +---------- | signal_wakeup | -+ + +---------------+ + ^ | switch_in + | | switch_preempt + | | switch_yield + +-----------+ wakeup + +Monitor opid +------------ + +The operations with preemption and irq disabled (opid) monitor ensures +operations like ``wakeup`` and ``need_resched`` occur with interrupts and +preemption disabled or during interrupt context, in such case preemption may +not be disabled explicitly. +``need_resched`` can be set by some RCU internals functions, in which case it +doesn't match a task wakeup and might occur with only interrupts disabled:: + + | sched_need_resched + | sched_waking + | irq_entry + | +--------------------+ + v v | + +------------------------------------------------------+ + +----------- | disabled | <+ + | +------------------------------------------------------+ | + | | ^ | + | | preempt_disable sched_need_resched | + | preempt_enable | +--------------------+ | + | v | v | | + | +------------------------------------------------------+ | + | | irq_disabled | | + | +------------------------------------------------------+ | + | | | ^ | + | irq_entry irq_entry | | | + | sched_need_resched v | irq_disable | + | sched_waking +--------------+ | | | + | +----- | | irq_enable | | + | | | in_irq | | | | + | +----> | | | | | + | +--------------+ | | irq_disable + | | | | | + | irq_enable | irq_enable | | | + | v v | | + | #======================================================# | + | H enabled H | + | #======================================================# | + | | ^ ^ preempt_enable | | + | preempt_disable preempt_enable +--------------------+ | + | v | | + | +------------------+ | | + +----------> | preempt_disabled | -+ | + +------------------+ | + | | + +-------------------------------------------------------+ + +This monitor is designed to work on ``PREEMPT_RT`` kernels, the special case of +events occurring in interrupt context is a shortcut to identify valid scenarios +where the preemption tracepoints might not be visible, during interrupts +preemption is always disabled. On non- ``PREEMPT_RT`` kernels, the interrupts +might invoke a softirq to set ``need_resched`` and wake up a task. This is +another special case that is currently not supported by the monitor. + +References +---------- + +[1] - https://bristot.me/linux-task-model diff --git a/Documentation/trace/rv/monitor_synthesis.rst b/Documentation/trace/rv/monitor_synthesis.rst new file mode 100644 index 000000000000..ac808a7554f5 --- /dev/null +++ b/Documentation/trace/rv/monitor_synthesis.rst @@ -0,0 +1,271 @@ +Runtime Verification Monitor Synthesis +====================================== + +The starting point for the application of runtime verification (RV) techniques +is the *specification* or *modeling* of the desired (or undesired) behavior +of the system under scrutiny. + +The formal representation needs to be then *synthesized* into a *monitor* +that can then be used in the analysis of the trace of the system. The +*monitor* connects to the system via an *instrumentation* that converts +the events from the *system* to the events of the *specification*. + + +In Linux terms, the runtime verification monitors are encapsulated inside +the *RV monitor* abstraction. The RV monitor includes a set of instances +of the monitor (per-cpu monitor, per-task monitor, and so on), the helper +functions that glue the monitor to the system reference model, and the +trace output as a reaction to event parsing and exceptions, as depicted +below:: + + Linux +----- RV Monitor ----------------------------------+ Formal + Realm | | Realm + +-------------------+ +----------------+ +-----------------+ + | Linux kernel | | Monitor | | Reference | + | Tracing | -> | Instance(s) | <- | Model | + | (instrumentation) | | (verification) | | (specification) | + +-------------------+ +----------------+ +-----------------+ + | | | + | V | + | +----------+ | + | | Reaction | | + | +--+--+--+-+ | + | | | | | + | | | +-> trace output ? | + +------------------------|--|----------------------+ + | +----> panic ? + +-------> <user-specified> + +RV monitor synthesis +-------------------- + +The synthesis of a specification into the Linux *RV monitor* abstraction is +automated by the rvgen tool and the header file containing common code for +creating monitors. The header files are: + + * rv/da_monitor.h for deterministic automaton monitor. + * rv/ltl_monitor.h for linear temporal logic monitor. + +rvgen +----- + +The rvgen utility converts a specification into the C presentation and creating +the skeleton of a kernel monitor in C. + +For example, it is possible to transform the wip.dot model present in +[1] into a per-cpu monitor with the following command:: + + $ rvgen monitor -c da -s wip.dot -t per_cpu + +This will create a directory named wip/ with the following files: + +- wip.h: the wip model in C +- wip.c: the RV monitor + +The wip.c file contains the monitor declaration and the starting point for +the system instrumentation. + +Similarly, a linear temporal logic monitor can be generated with the following +command:: + + $ rvgen monitor -c ltl -s pagefault.ltl -t per_task + +This generates pagefault/ directory with: + +- pagefault.h: The Buchi automaton (the non-deterministic state machine to + verify the specification) +- pagefault.c: The skeleton for the RV monitor + +Monitor header files +-------------------- + +The header files: + +- `rv/da_monitor.h` for deterministic automaton monitor +- `rv/ltl_monitor` for linear temporal logic monitor + +include common macros and static functions for implementing *Monitor +Instance(s)*. + +The benefits of having all common functionalities in a single header file are +3-fold: + + - Reduce the code duplication; + - Facilitate the bug fix/improvement; + - Avoid the case of developers changing the core of the monitor code to + manipulate the model in a (let's say) non-standard way. + +rv/da_monitor.h ++++++++++++++++ + +This initial implementation presents three different types of monitor instances: + +- ``#define DECLARE_DA_MON_GLOBAL(name, type)`` +- ``#define DECLARE_DA_MON_PER_CPU(name, type)`` +- ``#define DECLARE_DA_MON_PER_TASK(name, type)`` + +The first declares the functions for a global deterministic automata monitor, +the second for monitors with per-cpu instances, and the third with per-task +instances. + +In all cases, the 'name' argument is a string that identifies the monitor, and +the 'type' argument is the data type used by rvgen on the representation of +the model in C. + +For example, the wip model with two states and three events can be +stored in an 'unsigned char' type. Considering that the preemption control +is a per-cpu behavior, the monitor declaration in the 'wip.c' file is:: + + DECLARE_DA_MON_PER_CPU(wip, unsigned char); + +The monitor is executed by sending events to be processed via the functions +presented below:: + + da_handle_event_$(MONITOR_NAME)($(event from event enum)); + da_handle_start_event_$(MONITOR_NAME)($(event from event enum)); + da_handle_start_run_event_$(MONITOR_NAME)($(event from event enum)); + +The function ``da_handle_event_$(MONITOR_NAME)()`` is the regular case where +the event will be processed if the monitor is processing events. + +When a monitor is enabled, it is placed in the initial state of the automata. +However, the monitor does not know if the system is in the *initial state*. + +The ``da_handle_start_event_$(MONITOR_NAME)()`` function is used to notify the +monitor that the system is returning to the initial state, so the monitor can +start monitoring the next event. + +The ``da_handle_start_run_event_$(MONITOR_NAME)()`` function is used to notify +the monitor that the system is known to be in the initial state, so the +monitor can start monitoring and monitor the current event. + +Using the wip model as example, the events "preempt_disable" and +"sched_waking" should be sent to monitor, respectively, via [2]:: + + da_handle_event_wip(preempt_disable_wip); + da_handle_event_wip(sched_waking_wip); + +While the event "preempt_enabled" will use:: + + da_handle_start_event_wip(preempt_enable_wip); + +To notify the monitor that the system will be returning to the initial state, +so the system and the monitor should be in sync. + +rv/ltl_monitor.h +++++++++++++++++ +This file must be combined with the $(MODEL_NAME).h file (generated by `rvgen`) +to be complete. For example, for the `pagefault` monitor, the `pagefault.c` +source file must include:: + + #include "pagefault.h" + #include <rv/ltl_monitor.h> + +(the skeleton monitor file generated by `rvgen` already does this). + +`$(MODEL_NAME).h` (`pagefault.h` in the above example) includes the +implementation of the Buchi automaton - a non-deterministic state machine that +verifies the LTL specification. While `rv/ltl_monitor.h` includes the common +helper functions to interact with the Buchi automaton and to implement an RV +monitor. An important definition in `$(MODEL_NAME).h` is:: + + enum ltl_atom { + LTL_$(FIRST_ATOMIC_PROPOSITION), + LTL_$(SECOND_ATOMIC_PROPOSITION), + ... + LTL_NUM_ATOM + }; + +which is the list of atomic propositions present in the LTL specification +(prefixed with "LTL\_" to avoid name collision). This `enum` is passed to the +functions interacting with the Buchi automaton. + +While generating code, `rvgen` cannot understand the meaning of the atomic +propositions. Thus, that task is left for manual work. The recommended pratice +is adding tracepoints to places where the atomic propositions change; and in the +tracepoints' handlers: the Buchi automaton is executed using:: + + void ltl_atom_update(struct task_struct *task, enum ltl_atom atom, bool value) + +which tells the Buchi automaton that the atomic proposition `atom` is now +`value`. The Buchi automaton checks whether the LTL specification is still +satisfied, and invokes the monitor's error tracepoint and the reactor if +violation is detected. + +Tracepoints and `ltl_atom_update()` should be used whenever possible. However, +it is sometimes not the most convenient. For some atomic propositions which are +changed in multiple places in the kernel, it is cumbersome to trace all those +places. Furthermore, it may not be important that the atomic propositions are +updated at precise times. For example, considering the following linear temporal +logic:: + + RULE = always (RT imply not PAGEFAULT) + +This LTL states that a real-time task does not raise page faults. For this +specification, it is not important when `RT` changes, as long as it has the +correct value when `PAGEFAULT` is true. Motivated by this case, another +function is introduced:: + + void ltl_atom_fetch(struct task_struct *task, struct ltl_monitor *mon) + +This function is called whenever the Buchi automaton is triggered. Therefore, it +can be manually implemented to "fetch" `RT`:: + + void ltl_atom_fetch(struct task_struct *task, struct ltl_monitor *mon) + { + ltl_atom_set(mon, LTL_RT, rt_task(task)); + } + +Effectively, whenever `PAGEFAULT` is updated with a call to `ltl_atom_update()`, +`RT` is also fetched. Thus, the LTL specification can be verified without +tracing `RT` everywhere. + +For atomic propositions which act like events, they usually need to be set (or +cleared) and then immediately cleared (or set). A convenient function is +provided:: + + void ltl_atom_pulse(struct task_struct *task, enum ltl_atom atom, bool value) + +which is equivalent to:: + + ltl_atom_update(task, atom, value); + ltl_atom_update(task, atom, !value); + +To initialize the atomic propositions, the following function must be +implemented:: + + ltl_atoms_init(struct task_struct *task, struct ltl_monitor *mon, bool task_creation) + +This function is called for all running tasks when the monitor is enabled. It is +also called for new tasks created after the enabling the monitor. It should +initialize as many atomic propositions as possible, for example:: + + void ltl_atom_init(struct task_struct *task, struct ltl_monitor *mon, bool task_creation) + { + ltl_atom_set(mon, LTL_RT, rt_task(task)); + if (task_creation) + ltl_atom_set(mon, LTL_PAGEFAULT, false); + } + +Atomic propositions not initialized by `ltl_atom_init()` will stay in the +unknown state until relevant tracepoints are hit, which can take some time. As +monitoring for a task cannot be done until all atomic propositions is known for +the task, the monitor may need some time to start validating tasks which have +been running before the monitor is enabled. Therefore, it is recommended to +start the tasks of interest after enabling the monitor. + +Final remarks +------------- + +With the monitor synthesis in place using the header files and +rvgen, the developer's work should be limited to the instrumentation +of the system, increasing the confidence in the overall approach. + +[1] For details about deterministic automata format and the translation +from one representation to another, see:: + + Documentation/trace/rv/deterministic_automata.rst + +[2] rvgen appends the monitor's name suffix to the events enums to +avoid conflicting variables when exporting the global vmlinux.h +use by BPF programs. diff --git a/Documentation/trace/rv/runtime-verification.rst b/Documentation/trace/rv/runtime-verification.rst index dae78dfa7cdc..c700dde9259c 100644 --- a/Documentation/trace/rv/runtime-verification.rst +++ b/Documentation/trace/rv/runtime-verification.rst @@ -8,14 +8,14 @@ checking* and *theorem proving*) with a more practical approach for complex systems. Instead of relying on a fine-grained model of a system (e.g., a -re-implementation a instruction level), RV works by analyzing the trace of the +re-implementation at instruction level), RV works by analyzing the trace of the system's actual execution, comparing it against a formal specification of the system behavior. The main advantage is that RV can give precise information on the runtime behavior of the monitored system, without the pitfalls of developing models that require a re-implementation of the entire system in a modeling language. -Moreover, given an efficient monitoring method, it is possible execute an +Moreover, given an efficient monitoring method, it is possible to execute an *online* verification of a system, enabling the *reaction* for unexpected events, avoiding, for example, the propagation of a failure on safety-critical systems. |