summaryrefslogtreecommitdiff
path: root/Documentation/core-api/real-time/theory.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/core-api/real-time/theory.rst')
-rw-r--r--Documentation/core-api/real-time/theory.rst116
1 files changed, 116 insertions, 0 deletions
diff --git a/Documentation/core-api/real-time/theory.rst b/Documentation/core-api/real-time/theory.rst
new file mode 100644
index 000000000000..43d0120737f8
--- /dev/null
+++ b/Documentation/core-api/real-time/theory.rst
@@ -0,0 +1,116 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Theory of operation
+=====================
+
+:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+
+Preface
+=======
+
+PREEMPT_RT transforms the Linux kernel into a real-time kernel. It achieves
+this by replacing locking primitives, such as spinlock_t, with a preemptible
+and priority-inheritance aware implementation known as rtmutex, and by enforcing
+the use of threaded interrupts. As a result, the kernel becomes fully
+preemptible, with the exception of a few critical code paths, including entry
+code, the scheduler, and low-level interrupt handling routines.
+
+This transformation places the majority of kernel execution contexts under the
+control of the scheduler and significantly increasing the number of preemption
+points. Consequently, it reduces the latency between a high-priority task
+becoming runnable and its actual execution on the CPU.
+
+Scheduling
+==========
+
+The core principles of Linux scheduling and the associated user-space API are
+documented in the man page sched(7)
+`sched(7) <https://man7.org/linux/man-pages/man7/sched.7.html>`_.
+By default, the Linux kernel uses the SCHED_OTHER scheduling policy. Under
+this policy, a task is preempted when the scheduler determines that it has
+consumed a fair share of CPU time relative to other runnable tasks. However,
+the policy does not guarantee immediate preemption when a new SCHED_OTHER task
+becomes runnable. The currently running task may continue executing.
+
+This behavior differs from that of real-time scheduling policies such as
+SCHED_FIFO. When a task with a real-time policy becomes runnable, the
+scheduler immediately selects it for execution if it has a higher priority than
+the currently running task. The task continues to run until it voluntarily
+yields the CPU, typically by blocking on an event.
+
+Sleeping spin locks
+===================
+
+The various lock types and their behavior under real-time configurations are
+described in detail in Documentation/locking/locktypes.rst.
+In a non-PREEMPT_RT configuration, a spinlock_t is acquired by first disabling
+preemption and then actively spinning until the lock becomes available. Once
+the lock is released, preemption is enabled. From a real-time perspective,
+this approach is undesirable because disabling preemption prevents the
+scheduler from switching to a higher-priority task, potentially increasing
+latency.
+
+To address this, PREEMPT_RT replaces spinning locks with sleeping spin locks
+that do not disable preemption. On PREEMPT_RT, spinlock_t is implemented using
+rtmutex. Instead of spinning, a task attempting to acquire a contended lock
+disables CPU migration, donates its priority to the lock owner (priority
+inheritance), and voluntarily schedules out while waiting for the lock to
+become available.
+
+Disabling CPU migration provides the same effect as disabling preemption, while
+still allowing preemption and ensuring that the task continues to run on the
+same CPU while holding a sleeping lock.
+
+Priority inheritance
+====================
+
+Lock types such as spinlock_t and mutex_t in a PREEMPT_RT enabled kernel are
+implemented on top of rtmutex, which provides support for priority inheritance
+(PI). When a task blocks on such a lock, the PI mechanism temporarily
+propagates the blocked task’s scheduling parameters to the lock owner.
+
+For example, if a SCHED_FIFO task A blocks on a lock currently held by a
+SCHED_OTHER task B, task A’s scheduling policy and priority are temporarily
+inherited by task B. After this inheritance, task A is put to sleep while
+waiting for the lock, and task B effectively becomes the highest-priority task
+in the system. This allows B to continue executing, make progress, and
+eventually release the lock.
+
+Once B releases the lock, it reverts to its original scheduling parameters, and
+task A can resume execution.
+
+Threaded interrupts
+===================
+
+Interrupt handlers are another source of code that executes with preemption
+disabled and outside the control of the scheduler. To bring interrupt handling
+under scheduler control, PREEMPT_RT enforces threaded interrupt handlers.
+
+With forced threading, interrupt handling is split into two stages. The first
+stage, the primary handler, is executed in IRQ context with interrupts disabled.
+Its sole responsibility is to wake the associated threaded handler. The second
+stage, the threaded handler, is the function passed to request_irq() as the
+interrupt handler. It runs in process context, scheduled by the kernel.
+
+From waking the interrupt thread until threaded handling is completed, the
+interrupt source is masked in the interrupt controller. This ensures that the
+device interrupt remains pending but does not retrigger the CPU, allowing the
+system to exit IRQ context and handle the interrupt in a scheduled thread.
+
+By default, the threaded handler executes with the SCHED_FIFO scheduling policy
+and a priority of 50 (MAX_RT_PRIO / 2), which is midway between the minimum and
+maximum real-time priorities.
+
+If the threaded interrupt handler raises any soft interrupts during its
+execution, those soft interrupt routines are invoked after the threaded handler
+completes, within the same thread. Preemption remains enabled during the
+execution of the soft interrupt handler.
+
+Summary
+=======
+
+By using sleeping locks and forced-threaded interrupts, PREEMPT_RT
+significantly reduces sections of code where interrupts or preemption is
+disabled, allowing the scheduler to preempt the current execution context and
+switch to a higher-priority task.