<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/sched/ext.h, branch v7.2-rc1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.2-rc1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-05-10T22:51:09+00:00</updated>
<entry>
<title>Merge branch 'for-7.1-fixes' into for-7.2</title>
<updated>2026-05-10T22:51:09+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-05-10T22:51:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b5646c627652241f248f0a4ff31b1d32316b0068'/>
<id>urn:sha1:b5646c627652241f248f0a4ff31b1d32316b0068</id>
<content type='text'>
Conflict between:

 [1] 41e3312861ea ("sched_ext: add p-&gt;scx.tid and SCX_OPS_TID_TO_TASK lookup")
 [2] c941d7391f25 ("sched_ext: Close root-enable vs sched_ext_dead() race with SCX_TASK_INIT_BEGIN")

in scx_root_enable_workfn()'s post-init block. [1] added a tid hash
insertion under a scoped_guard() for scx_tasks_lock; [2] wraps the same
region in task_rq_lock() for a DEAD recheck. A naive merge would invert the
iter's outer/inner order.

 [3] f25ad1e3cbaa ("sched_ext: Add scx_task_iter_relock() and use it in scx_root_enable_workfn()")

was added to for-7.2 for a clean resolution: scx_task_iter_relock(iter, p)
takes both scx_tasks_lock and @p's rq lock in iter order.

Resolved by routing both sides through [3]'s dual-lock helper: the post-init
region runs under a single scx_task_iter_relock() acquisition, with [2]'s
state machine and [1]'s hash insert in sequence inside it.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Close root-enable vs sched_ext_dead() race with SCX_TASK_INIT_BEGIN</title>
<updated>2026-05-10T20:08:16+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-05-10T20:08:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c941d7391f258d5d06e0f7e962a52f99a547a83e'/>
<id>urn:sha1:c941d7391f258d5d06e0f7e962a52f99a547a83e</id>
<content type='text'>
scx_root_enable_workfn() drops the iter rq lock for ops.init_task() and a
TASK_DEAD @p can fall through sched_ext_dead() in that window. The race hits
when sched_ext_dead() observes SCX_TASK_INIT (the intermediate state before
@p-&gt;scx.sched is published) and dereferences NULL via SCX_HAS_OP(NULL,
exit_task), or observes SCX_TASK_NONE during the unlocked init window and
skips cleanup so exit_task() never runs.

Add SCX_TASK_INIT_BEGIN. The enable path writes NONE -&gt; INIT_BEGIN under the
iter rq lock, then takes the rq lock again after init to walk INIT_BEGIN -&gt;
INIT -&gt; READY. sched_ext_dead() that wins the rq-lock race observes
INIT_BEGIN and sets DEAD without calling into ops; the post-init recheck
unwinds via scx_sub_init_cancel_task().

scx_fork() runs single-threaded against sched_ext_dead() (the task is not on
scx_tasks until scx_post_fork() adds it) so its INIT_BEGIN -&gt; INIT walk
needs no rq-lock pairing; it rolls back to NONE on ops.init_task() failure.

The validation matrix grows the INIT_BEGIN row and the INIT_BEGIN -&gt; DEAD
edge; INIT now requires INIT_BEGIN as the predecessor. scx_sub_disable()'s
migration writes INIT_BEGIN as a synthetic predecessor to satisfy the
tightened verification.

The sub-sched paths still race with sched_ext_dead() during the unlocked
init window. This will be fixed by the next patch.

Reported-by: zhidao su &lt;suzhidao@xiaomi.com&gt;
Link: https://lore.kernel.org/all/20260429133155.3825247-1-suzhidao@xiaomi.com/
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
<entry>
<title>sched_ext: Replace SCX_TASK_OFF_TASKS flag with SCX_TASK_DEAD state</title>
<updated>2026-05-10T20:08:16+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-05-10T20:08:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cceb8fa9cb2cf98e31d81ecf6353b6ba5ac57744'/>
<id>urn:sha1:cceb8fa9cb2cf98e31d81ecf6353b6ba5ac57744</id>
<content type='text'>
SCX_TASK_OFF_TASKS marked tasks already through sched_ext_dead() so cgroup
task iteration would skip them. This can be expressed better with a task
state. Replace the flag with SCX_TASK_DEAD.

scx_disable_and_exit_task() resets state to NONE on its way out, so
sched_ext_dead() now sets DEAD after the wrapper returns. The validation
matrix grows NONE -&gt; DEAD, warns on DEAD -&gt; NONE, and tightens READY's
predecessor to INIT or ENABLED so the new DEAD value cannot silently
transition to READY.

Prepares for the following enable vs dead race fix.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
<entry>
<title>sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()</title>
<updated>2026-05-04T19:06:03+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-04-28T00:16:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ff9eda4ea906b1f02fc260ddc42d2d9bd736a49c'/>
<id>urn:sha1:ff9eda4ea906b1f02fc260ddc42d2d9bd736a49c</id>
<content type='text'>
scx_task_iter's cgroup-scoped mode can return tasks whose
sched_ext_dead() has already completed: cgroup_task_dead() removes
from cset-&gt;tasks after sched_ext_dead() in finish_task_switch() and is
irq-work deferred on PREEMPT_RT. The global mode is fine -
sched_ext_dead() removes from scx_tasks via list_del_init() first.

Callers (sub-sched enable prep/abort/apply, scx_sub_disable(),
scx_fail_parent()) assume returned tasks are still on @sch and trip
WARN_ON_ONCE() or operate on torn-down state otherwise.

Set %SCX_TASK_OFF_TASKS in sched_ext_dead() under @p's rq lock and
have scx_task_iter_next_locked() skip flagged tasks under the same
lock. Setter and reader serialize on the per-task rq lock - no race.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: add p-&gt;scx.tid and SCX_OPS_TID_TO_TASK lookup</title>
<updated>2026-04-20T16:55:33+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-04-19T18:36:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=41e3312861eafba171d9620150aaf2e99165d044'/>
<id>urn:sha1:41e3312861eafba171d9620150aaf2e99165d044</id>
<content type='text'>
BPF schedulers that can't hold task_struct pointers (arena-backed ones in
particular) key tasks by pid. During exit, pid is released before the
task finishes passing through scheduler callbacks, so a dying task
becomes invisible to the BPF side mid-schedule. scx_qmap hits this: an
exiting task's dispatch callback can't recover its queue entry, stalling
dispatch until SCX_EXIT_ERROR_STALL.

Add a unique non-zero u64 p-&gt;scx.tid assigned at fork that survives the
full task lifetime including exit. scx_bpf_tid_to_task() looks up the
task; unlike bpf_task_from_pid(), it handles exiting tasks.

The lookup costs an rhashtable insert/remove under scx_tasks_lock, so
root schedulers opt in via SCX_OPS_TID_TO_TASK. Sub-schedulers that set
the flag to declare a dependency are rejected at attach if root didn't
opt in.

scx_qmap converted: keys tasks by tid and enables SCX_OPS_ENQ_EXITING.
Pre-patch it stalls within seconds under a non-leader-exec workload;
with the patch it runs cleanly.

v3: Warn on rhashtable_lookup_insert_fast() failure via new
    scx_tid_hash_insert() helper (Cheng-Yang Chou).

v2: Guard scx_root deref in scx_bpf_tid_to_task() error path. The kfunc
    is registered via scx_kfunc_set_any and reachable from tracing and
    syscall programs when no scheduler is attached (Cheng-Yang Chou).

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Cheng-Yang Chou &lt;yphbchou0911@gmail.com&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
<entry>
<title>sched_ext: Remove runtime kfunc mask enforcement</title>
<updated>2026-04-10T17:54:06+00:00</updated>
<author>
<name>Cheng-Yang Chou</name>
<email>yphbchou0911@gmail.com</email>
</author>
<published>2026-04-10T17:54:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7cd9a5d7d4b75802b97aa89f6f53375a6d84d1d5'/>
<id>urn:sha1:7cd9a5d7d4b75802b97aa89f6f53375a6d84d1d5</id>
<content type='text'>
Now that scx_kfunc_context_filter enforces context-sensitive kfunc
restrictions at BPF load time, the per-task runtime enforcement via
scx_kf_mask is redundant. Remove it entirely:

 - Delete enum scx_kf_mask, the kf_mask field on sched_ext_entity, and
   the scx_kf_allow()/scx_kf_disallow()/scx_kf_allowed() helpers along
   with the higher_bits()/highest_bit() helpers they used.
 - Strip the @mask parameter (and the BUILD_BUG_ON checks) from the
   SCX_CALL_OP[_RET]/SCX_CALL_OP_TASK[_RET]/SCX_CALL_OP_2TASKS_RET
   macros and update every call site. Reflow call sites that were
   wrapped only to fit the old 5-arg form and now collapse onto a single
   line under ~100 cols.
 - Remove the in-kfunc scx_kf_allowed() runtime checks from
   scx_dsq_insert_preamble(), scx_dsq_move(), scx_bpf_dispatch_nr_slots(),
   scx_bpf_dispatch_cancel(), scx_bpf_dsq_move_to_local___v2(),
   scx_bpf_sub_dispatch(), scx_bpf_reenqueue_local(), and the per-call
   guard inside select_cpu_from_kfunc().

scx_bpf_task_cgroup() and scx_kf_allowed_on_arg_tasks() were already
cleaned up in the "drop redundant rq-locked check" patch.
scx_kf_allowed_if_unlocked() was rewritten in the preceding "decouple"
patch. No further changes to those helpers here.

Co-developed-by: Juntong Deng &lt;juntong.deng@outlook.com&gt;
Signed-off-by: Juntong Deng &lt;juntong.deng@outlook.com&gt;
Signed-off-by: Cheng-Yang Chou &lt;yphbchou0911@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
<entry>
<title>sched_ext: Implement SCX_ENQ_IMMED</title>
<updated>2026-03-13T19:43:22+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-13T19:43:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=98d709cba3193f0bec54da4cd76ef499ea2f1ef7'/>
<id>urn:sha1:98d709cba3193f0bec54da4cd76ef499ea2f1ef7</id>
<content type='text'>
Add SCX_ENQ_IMMED enqueue flag for local DSQ insertions. Once a task is
dispatched with IMMED, it either gets on the CPU immediately and stays on it,
or gets reenqueued back to the BPF scheduler. It will never linger on a local
DSQ behind other tasks or on a CPU taken by a higher-priority class.

rq_is_open() uses rq-&gt;next_class to determine whether the rq is available,
and wakeup_preempt_scx() triggers reenqueue when a higher-priority class task
arrives. These capture all higher class preemptions. Combined with reenqueue
points in the dispatch path, all cases where an IMMED task would not execute
immediately are covered.

SCX_TASK_IMMED persists in p-&gt;scx.flags until the next fresh enqueue, so the
guarantee survives SAVE/RESTORE cycles. If preempted while running,
put_prev_task_scx() reenqueues through ops.enqueue() with
SCX_TASK_REENQ_PREEMPTED instead of silently placing the task back on the
local DSQ.

This enables tighter scheduling latency control by preventing tasks from
piling up on local DSQs. It also enables opportunistic CPU sharing across
sub-schedulers - without this, a sub-scheduler can stuff the local DSQ of a
shared CPU, making it difficult for others to use.

v2: - Rewrite is_curr_done() as rq_is_open() using rq-&gt;next_class and
      implement wakeup_preempt_scx() to achieve complete coverage of all
      cases where IMMED tasks could get stranded.
    - Track IMMED persistently in p-&gt;scx.flags and reenqueue
      preempted-while-running tasks through ops.enqueue().
    - Bound deferred reenq cycles (SCX_REENQ_LOCAL_MAX_REPEAT).
    - Misc renames, documentation.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
<entry>
<title>sched_ext: Add SCX_TASK_REENQ_REASON flags</title>
<updated>2026-03-07T15:29:50+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-07T15:29:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ce897abc21b2d5e74981ff2b848f3a08a580d50a'/>
<id>urn:sha1:ce897abc21b2d5e74981ff2b848f3a08a580d50a</id>
<content type='text'>
SCX_ENQ_REENQ indicates that a task is being re-enqueued but doesn't tell the
BPF scheduler why. Add SCX_TASK_REENQ_REASON flags using bits 12-13 of
p-&gt;scx.flags to communicate the reason during ops.enqueue():

- NONE: Not being reenqueued
- KFUNC: Reenqueued by scx_bpf_dsq_reenq() and friends

More reasons will be added.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
<entry>
<title>sched_ext: Simplify task state handling</title>
<updated>2026-03-07T15:29:50+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-07T15:29:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7203d77d6e04f83f7b78838eed099d9cac31700b'/>
<id>urn:sha1:7203d77d6e04f83f7b78838eed099d9cac31700b</id>
<content type='text'>
Task states (NONE, INIT, READY, ENABLED) were defined in a separate enum with
unshifted values and then shifted when stored in scx_entity.flags. Simplify by
defining them as pre-shifted values directly in scx_ent_flags and removing the
separate scx_task_state enum. This removes the need for shifting when
reading/writing state values.

scx_get_task_state() now returns the masked flags value directly.
scx_set_task_state() accepts the pre-shifted state value. scx_dump_task()
shifts down for display to maintain readable output.

No functional changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
<entry>
<title>sched_ext: Implement scx_bpf_dsq_reenq() for user DSQs</title>
<updated>2026-03-07T15:29:50+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-07T15:29:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=84b1a0ea0b7c23dec240783a592e480780efe459'/>
<id>urn:sha1:84b1a0ea0b7c23dec240783a592e480780efe459</id>
<content type='text'>
scx_bpf_dsq_reenq() currently only supports local DSQs. Extend it to support
user-defined DSQs by adding a deferred re-enqueue mechanism similar to the
local DSQ handling.

Add per-cpu deferred_reenq_user_node/flags to scx_dsq_pcpu and
deferred_reenq_users list to scx_rq. When scx_bpf_dsq_reenq() is called on a
user DSQ, the DSQ's per-cpu node is added to the current rq's deferred list.
process_deferred_reenq_users() then iterates the DSQ using the cursor helpers
and re-enqueues each task.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
</content>
</entry>
</feed>
