sched_ext: Fix ops->priv clobber on concurrent attach/detach - kernel/linux.git

diff options

author	Andrea Righi <arighi@nvidia.com>	2026-05-11 09:18:12 +0300
committer	Tejun Heo <tj@kernel.org>	2026-05-11 10:40:03 +0300
commit	bbf30b383cf6e87f2fe57c292fbd640b1d88b4c3 (patch)
tree	a72cae4440a56e13d74dde49ca56023bd17a1eee /tools/lib/python/__init__.py
parent	3788e32516530dee66cf9186f846480a16799b05 (diff)
download	linux-bbf30b383cf6e87f2fe57c292fbd640b1d88b4c3.tar.xz

sched_ext: Fix ops->priv clobber on concurrent attach/detach

Under heavy concurrent attach/detach operations, scx_claim_exit() can trigger a NULL pointer dereference. This can be reproduced running the reload_loop kselftests inside a virtme-ng session: $ vng -v -- ./tools/testing/selftests/sched_ext/runner -t reload_loop ... BUG: kernel NULL pointer dereference, address: 0000000000000400 RIP: 0010:scx_claim_exit+0x3b/0x120 Call Trace: <TASK> bpf_scx_unreg+0x45/0xb0 bpf_struct_ops_map_link_dealloc+0x39/0x50 bpf_link_release+0x18/0x20 __fput+0x10b/0x2e0 __x64_sys_close+0x47/0xa0 The underlying race (diagnosed by Tejun Heo) is a stomp of @ops->priv, not a missing NULL check: T2 unreg(K) T1 reg(K) ----------- --------- sch = ops->priv = sch_b800 scx_disable; flush_disable_work [scx_root_disable: scx_root=NULL, mutex_unlock, state=DISABLED] mutex_lock; state ok scx_alloc_and_add_sched: ops->priv = sch_a800 scx_root = sch_a800; init=0 state=ENABLED; mutex_unlock [flush returns] RCU_INIT_POINTER(ops->priv, NULL) <-- clobbers sch_a800 kobject_put(sch_b800) T1 acquires scx_enable_mutex inside scx_root_disable()'s mutex_unlock window and starts a fresh attach on the same kdata, assigning sch_a800 to @ops->priv. T2 then continues out of scx_disable()/flush_disable_work and clobbers @ops->priv to NULL, leaking sch_a800; the bpf_link is gone but state stays SCX_ENABLED, so all future attaches fail with -EBUSY permanently. The next bpf_scx_unreg() on that kdata then reads NULL @ops->priv and dereferences it in scx_claim_exit(). Make @ops->priv the lifecycle binding: in scx_root_enable_workfn() and scx_sub_enable_workfn(), after the existing state check and still under scx_enable_mutex, refuse with -EBUSY if @ops->priv is non-NULL. This rejects an attempt to reuse a kdata that is still bound to a previous scheduler instance, closing the race without changing the unreg side. Fixes: 105dcd005be2 ("sched_ext: Introduce scx_prog_sched()") Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>

Diffstat (limited to 'tools/lib/python/__init__.py')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: