From 41e3312861eafba171d9620150aaf2e99165d044 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sun, 19 Apr 2026 08:36:45 -1000 Subject: sched_ext: add p->scx.tid and SCX_OPS_TID_TO_TASK lookup BPF schedulers that can't hold task_struct pointers (arena-backed ones in particular) key tasks by pid. During exit, pid is released before the task finishes passing through scheduler callbacks, so a dying task becomes invisible to the BPF side mid-schedule. scx_qmap hits this: an exiting task's dispatch callback can't recover its queue entry, stalling dispatch until SCX_EXIT_ERROR_STALL. Add a unique non-zero u64 p->scx.tid assigned at fork that survives the full task lifetime including exit. scx_bpf_tid_to_task() looks up the task; unlike bpf_task_from_pid(), it handles exiting tasks. The lookup costs an rhashtable insert/remove under scx_tasks_lock, so root schedulers opt in via SCX_OPS_TID_TO_TASK. Sub-schedulers that set the flag to declare a dependency are rejected at attach if root didn't opt in. scx_qmap converted: keys tasks by tid and enables SCX_OPS_ENQ_EXITING. Pre-patch it stalls within seconds under a non-leader-exec workload; with the patch it runs cleanly. v3: Warn on rhashtable_lookup_insert_fast() failure via new scx_tid_hash_insert() helper (Cheng-Yang Chou). v2: Guard scx_root deref in scx_bpf_tid_to_task() error path. The kfunc is registered via scx_kfunc_set_any and reachable from tracing and syscall programs when no scheduler is attached (Cheng-Yang Chou). Signed-off-by: Tejun Heo Reviewed-by: Cheng-Yang Chou Reviewed-by: Andrea Righi --- include/linux/sched/ext.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h index 1a3af2ea2a79..d05efcac794d 100644 --- a/include/linux/sched/ext.h +++ b/include/linux/sched/ext.h @@ -203,6 +203,15 @@ struct sched_ext_entity { u64 core_sched_at; /* see scx_prio_less() */ #endif + /* + * Unique non-zero task ID assigned at fork. Persists across exec and + * is never reused. Lets BPF schedulers identify tasks without storing + * kernel pointers - arena-backed schedulers being one example. See + * scx_bpf_tid_to_task(). + */ + u64 tid; + struct rhash_head tid_hash_node; /* see SCX_OPS_TID_TO_TASK */ + /* BPF scheduler modifiable fields */ /* -- cgit v1.2.3