summaryrefslogtreecommitdiff
path: root/Documentation/scheduler/sched-ext.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/scheduler/sched-ext.rst')
-rw-r--r--Documentation/scheduler/sched-ext.rst132
1 files changed, 84 insertions, 48 deletions
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 7b59bbd2e564..a1869c38046e 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -1,3 +1,5 @@
+.. _sched-ext:
+
==========================
Extensible Scheduler Class
==========================
@@ -16,12 +18,12 @@ programs - the BPF scheduler.
* The system integrity is maintained no matter what the BPF scheduler does.
The default scheduling behavior is restored anytime an error is detected,
a runnable task stalls, or on invoking the SysRq key sequence
- :kbd:`SysRq-S`.
+ `SysRq-S`.
* When the BPF scheduler triggers an error, debug information is dumped to
aid debugging. The debug dump is passed to and printed out by the
scheduler binary. The debug dump can also be accessed through the
- `sched_ext_dump` tracepoint. The SysRq key sequence :kbd:`SysRq-D`
+ `sched_ext_dump` tracepoint. The SysRq key sequence `SysRq-D`
triggers a debug dump. This doesn't terminate the BPF scheduler and can
only be read through the tracepoint.
@@ -47,8 +49,8 @@ options should be enabled to use sched_ext:
sched_ext is used only when the BPF scheduler is loaded and running.
If a task explicitly sets its scheduling policy to ``SCHED_EXT``, it will be
-treated as ``SCHED_NORMAL`` and scheduled by CFS until the BPF scheduler is
-loaded.
+treated as ``SCHED_NORMAL`` and scheduled by the fair-class scheduler until the
+BPF scheduler is loaded.
When the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is not set
in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
@@ -57,11 +59,11 @@ in ``ops->flags``, all ``SCHED_NORMAL``, ``SCHED_BATCH``, ``SCHED_IDLE``, and
However, when the BPF scheduler is loaded and ``SCX_OPS_SWITCH_PARTIAL`` is
set in ``ops->flags``, only tasks with the ``SCHED_EXT`` policy are scheduled
by sched_ext, while tasks with ``SCHED_NORMAL``, ``SCHED_BATCH`` and
-``SCHED_IDLE`` policies are scheduled by CFS.
+``SCHED_IDLE`` policies are scheduled by the fair-class scheduler.
-Terminating the sched_ext scheduler program, triggering :kbd:`SysRq-S`, or
+Terminating the sched_ext scheduler program, triggering `SysRq-S`, or
detection of any internal error including stalled runnable tasks aborts the
-BPF scheduler and reverts all tasks back to CFS.
+BPF scheduler and reverts all tasks back to the fair-class scheduler.
.. code-block:: none
@@ -107,8 +109,7 @@ detailed information:
nr_rejected : 0
enable_seq : 1
-If ``CONFIG_SCHED_DEBUG`` is set, whether a given task is on sched_ext can
-be determined as follows:
+Whether a given task is on sched_ext can be determined as follows:
.. code-block:: none
@@ -130,7 +131,7 @@ optional. The following modified excerpt is from
* Decide which CPU a task should be migrated to before being
* enqueued (either at wakeup, fork time, or exec time). If an
* idle core is found by the default ops.select_cpu() implementation,
- * then dispatch the task directly to SCX_DSQ_LOCAL and skip the
+ * then insert the task directly into SCX_DSQ_LOCAL and skip the
* ops.enqueue() callback.
*
* Note that this implementation has exactly the same behavior as the
@@ -148,15 +149,15 @@ optional. The following modified excerpt is from
cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &direct);
if (direct)
- scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
+ scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
return cpu;
}
/*
- * Do a direct dispatch of a task to the global DSQ. This ops.enqueue()
- * callback will only be invoked if we failed to find a core to dispatch
- * to in ops.select_cpu() above.
+ * Do a direct insertion of a task to the global DSQ. This ops.enqueue()
+ * callback will only be invoked if we failed to find a core to insert
+ * into in ops.select_cpu() above.
*
* Note that this implementation has exactly the same behavior as the
* default ops.enqueue implementation, which just dispatches the task
@@ -166,7 +167,7 @@ optional. The following modified excerpt is from
*/
void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
{
- scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
+ scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
}
s32 BPF_STRUCT_OPS_SLEEPABLE(simple_init)
@@ -198,18 +199,17 @@ Dispatch Queues
To match the impedance between the scheduler core and the BPF scheduler,
sched_ext uses DSQs (dispatch queues) which can operate as both a FIFO and a
priority queue. By default, there is one global FIFO (``SCX_DSQ_GLOBAL``),
-and one local dsq per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
-an arbitrary number of dsq's using ``scx_bpf_create_dsq()`` and
+and one local DSQ per CPU (``SCX_DSQ_LOCAL``). The BPF scheduler can manage
+an arbitrary number of DSQs using ``scx_bpf_create_dsq()`` and
``scx_bpf_destroy_dsq()``.
-A CPU always executes a task from its local DSQ. A task is "dispatched" to a
-DSQ. A non-local DSQ is "consumed" to transfer a task to the consuming CPU's
-local DSQ.
+A CPU always executes a task from its local DSQ. A task is "inserted" into a
+DSQ. A task in a non-local DSQ is "move"d into the target CPU's local DSQ.
When a CPU is looking for the next task to run, if the local DSQ is not
-empty, the first task is picked. Otherwise, the CPU tries to consume the
-global DSQ. If that doesn't yield a runnable task either, ``ops.dispatch()``
-is invoked.
+empty, the first task is picked. Otherwise, the CPU tries to move a task
+from the global DSQ. If that doesn't yield a runnable task either,
+``ops.dispatch()`` is invoked.
Scheduling Cycle
----------------
@@ -229,26 +229,26 @@ The following briefly shows how a waking task is scheduled and executed.
scheduler can wake up any cpu using the ``scx_bpf_kick_cpu()`` helper,
using ``ops.select_cpu()`` judiciously can be simpler and more efficient.
- A task can be immediately dispatched to a DSQ from ``ops.select_cpu()`` by
- calling ``scx_bpf_dispatch()``. If the task is dispatched to
- ``SCX_DSQ_LOCAL`` from ``ops.select_cpu()``, it will be dispatched to the
+ A task can be immediately inserted into a DSQ from ``ops.select_cpu()``
+ by calling ``scx_bpf_dsq_insert()``. If the task is inserted into
+ ``SCX_DSQ_LOCAL`` from ``ops.select_cpu()``, it will be inserted into the
local DSQ of whichever CPU is returned from ``ops.select_cpu()``.
- Additionally, dispatching directly from ``ops.select_cpu()`` will cause the
+ Additionally, inserting directly from ``ops.select_cpu()`` will cause the
``ops.enqueue()`` callback to be skipped.
Note that the scheduler core will ignore an invalid CPU selection, for
example, if it's outside the allowed cpumask of the task.
2. Once the target CPU is selected, ``ops.enqueue()`` is invoked (unless the
- task was dispatched directly from ``ops.select_cpu()``). ``ops.enqueue()``
+ task was inserted directly from ``ops.select_cpu()``). ``ops.enqueue()``
can make one of the following decisions:
- * Immediately dispatch the task to either the global or local DSQ by
- calling ``scx_bpf_dispatch()`` with ``SCX_DSQ_GLOBAL`` or
- ``SCX_DSQ_LOCAL``, respectively.
+ * Immediately insert the task into either the global or a local DSQ by
+ calling ``scx_bpf_dsq_insert()`` with one of the following options:
+ ``SCX_DSQ_GLOBAL``, ``SCX_DSQ_LOCAL``, or ``SCX_DSQ_LOCAL_ON | cpu``.
- * Immediately dispatch the task to a custom DSQ by calling
- ``scx_bpf_dispatch()`` with a DSQ ID which is smaller than 2^63.
+ * Immediately insert the task into a custom DSQ by calling
+ ``scx_bpf_dsq_insert()`` with a DSQ ID which is smaller than 2^63.
* Queue the task on the BPF side.
@@ -257,23 +257,23 @@ The following briefly shows how a waking task is scheduled and executed.
run, ``ops.dispatch()`` is invoked which can use the following two
functions to populate the local DSQ.
- * ``scx_bpf_dispatch()`` dispatches a task to a DSQ. Any target DSQ can
- be used - ``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``,
- ``SCX_DSQ_GLOBAL`` or a custom DSQ. While ``scx_bpf_dispatch()``
+ * ``scx_bpf_dsq_insert()`` inserts a task to a DSQ. Any target DSQ can be
+ used - ``SCX_DSQ_LOCAL``, ``SCX_DSQ_LOCAL_ON | cpu``,
+ ``SCX_DSQ_GLOBAL`` or a custom DSQ. While ``scx_bpf_dsq_insert()``
currently can't be called with BPF locks held, this is being worked on
- and will be supported. ``scx_bpf_dispatch()`` schedules dispatching
+ and will be supported. ``scx_bpf_dsq_insert()`` schedules insertion
rather than performing them immediately. There can be up to
``ops.dispatch_max_batch`` pending tasks.
- * ``scx_bpf_consume()`` tranfers a task from the specified non-local DSQ
- to the dispatching DSQ. This function cannot be called with any BPF
- locks held. ``scx_bpf_consume()`` flushes the pending dispatched tasks
- before trying to consume the specified DSQ.
+ * ``scx_bpf_move_to_local()`` moves a task from the specified non-local
+ DSQ to the dispatching DSQ. This function cannot be called with any BPF
+ locks held. ``scx_bpf_move_to_local()`` flushes the pending insertions
+ tasks before trying to move from the specified DSQ.
4. After ``ops.dispatch()`` returns, if there are tasks in the local DSQ,
the CPU runs the first one. If empty, the following steps are taken:
- * Try to consume the global DSQ. If successful, run the task.
+ * Try to move from the global DSQ. If successful, run the task.
* If ``ops.dispatch()`` has dispatched any tasks, retry #3.
@@ -286,14 +286,50 @@ Note that the BPF scheduler can always choose to dispatch tasks immediately
in ``ops.enqueue()`` as illustrated in the above simple example. If only the
built-in DSQs are used, there is no need to implement ``ops.dispatch()`` as
a task is never queued on the BPF scheduler and both the local and global
-DSQs are consumed automatically.
+DSQs are executed automatically.
-``scx_bpf_dispatch()`` queues the task on the FIFO of the target DSQ. Use
-``scx_bpf_dispatch_vtime()`` for the priority queue. Internal DSQs such as
+``scx_bpf_dsq_insert()`` inserts the task on the FIFO of the target DSQ. Use
+``scx_bpf_dsq_insert_vtime()`` for the priority queue. Internal DSQs such as
``SCX_DSQ_LOCAL`` and ``SCX_DSQ_GLOBAL`` do not support priority-queue
-dispatching, and must be dispatched to with ``scx_bpf_dispatch()``. See the
-function documentation and usage in ``tools/sched_ext/scx_simple.bpf.c`` for
-more information.
+dispatching, and must be dispatched to with ``scx_bpf_dsq_insert()``. See
+the function documentation and usage in ``tools/sched_ext/scx_simple.bpf.c``
+for more information.
+
+Task Lifecycle
+--------------
+
+The following pseudo-code summarizes the entire lifecycle of a task managed
+by a sched_ext scheduler:
+
+.. code-block:: c
+
+ ops.init_task(); /* A new task is created */
+ ops.enable(); /* Enable BPF scheduling for the task */
+
+ while (task in SCHED_EXT) {
+ if (task can migrate)
+ ops.select_cpu(); /* Called on wakeup (optimization) */
+
+ ops.runnable(); /* Task becomes ready to run */
+
+ while (task is runnable) {
+ if (task is not in a DSQ) {
+ ops.enqueue(); /* Task can be added to a DSQ */
+
+ /* A CPU becomes available */
+
+ ops.dispatch(); /* Task is moved to a local DSQ */
+ }
+ ops.running(); /* Task starts running on its assigned CPU */
+ ops.tick(); /* Called every 1/HZ seconds */
+ ops.stopping(); /* Task stops running (time slice expires or wait) */
+ }
+
+ ops.quiescent(); /* Task releases its assigned CPU (wait) */
+ }
+
+ ops.disable(); /* Disable BPF scheduling for the task */
+ ops.exit_task(); /* Task is destroyed */
Where to Look
=============