<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/sched/ext.h, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-09-23T19:03:26+00:00</updated>
<entry>
<title>sched_ext: Improve SCX_KF_DISPATCH comment</title>
<updated>2025-09-23T19:03:26+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-09-23T19:03:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=edf005fa274a0c224e550a52726aa7a426384e36'/>
<id>urn:sha1:edf005fa274a0c224e550a52726aa7a426384e36</id>
<content type='text'>
The comment for SCX_KF_DISPATCH was incomplete and didn't explain that
ops.dispatch() may temporarily release the rq lock, allowing ENQUEUE and
SELECT_CPU operations to be nested inside DISPATCH contexts.

Update the comment to clarify this nesting behavior and provide better
context for when these operations can occur within dispatch.

Acked-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Drop kfuncs marked for removal in 6.15</title>
<updated>2025-06-25T23:13:20+00:00</updated>
<author>
<name>Jake Hillion</name>
<email>jake@hillion.co.uk</email>
</author>
<published>2025-06-25T17:05:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4ecf83741401c70d4420588ee1f3b1ca04ef58d5'/>
<id>urn:sha1:4ecf83741401c70d4420588ee1f3b1ca04ef58d5</id>
<content type='text'>
sched_ext performed a kfunc renaming pass in 6.13 and kept the old names
around for compatibility with old binaries. These were scheduled for
cleanup in 6.15 but were missed. Submitting for cleanup in for-next.

Removed the kfuncs, their flags, and any references I could find to them
in doc comments. Left the entries in include/scx/compat.bpf.h as they're
still useful to make new binaries compatible with old kernels.

Tested by applying to my kernel. It builds and a modern version of
scx_lavd loads fine.

Signed-off-by: Jake Hillion &lt;jake@hillion.co.uk&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic</title>
<updated>2025-06-24T23:05:26+00:00</updated>
<author>
<name>David Dai</name>
<email>david.dai@linux.dev</email>
</author>
<published>2025-06-24T22:49:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cb444006a625c60e6d4dd3753863c3c74f96aac3'/>
<id>urn:sha1:cb444006a625c60e6d4dd3753863c3c74f96aac3</id>
<content type='text'>
For systems using a sched_ext scheduler and has panic_on_rcu_stall
enabled, try kicking out the current scheduler before issuing a panic.

While there are numerous reasons for RCU CPU stalls that are not
directly attributed to the scheduler, deferring the panic gives
sched_ext an opportunity to provide additional debug info when ejecting
the current scheduler. Also, handling the event more gracefully allows
us to potentially recover the system instead of incurring additional
down time.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: David Dai &lt;david.dai@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Add support for cgroup bandwidth control interface</title>
<updated>2025-06-21T03:03:51+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-06-14T01:34:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ddceadce63d9cb752c2472e220ded05cabaf7971'/>
<id>urn:sha1:ddceadce63d9cb752c2472e220ded05cabaf7971</id>
<content type='text'>
From 077814f57f8acce13f91dc34bbd2b7e4911fbf25 Mon Sep 17 00:00:00 2001
From: Tejun Heo &lt;tj@kernel.org&gt;
Date: Fri, 13 Jun 2025 15:06:47 -1000

- Add CONFIG_GROUP_SCHED_BANDWIDTH which is selected by both
  CONFIG_CFS_BANDWIDTH and EXT_GROUP_SCHED.

- Put bandwidth control interface files for both cgroup v1 and v2 under
  CONFIG_GROUP_SCHED_BANDWIDTH.

- Update tg_bandwidth() to fetch configuration parameters from fair if
  CONFIG_CFS_BANDWIDTH, SCX otherwise.

- Update tg_set_bandwidth() to update the parameters for both fair and SCX.

- Add bandwidth control parameters to struct scx_cgroup_init_args.

- Add sched_ext_ops.cgroup_set_bandwidth() which is invoked on bandwidth
  control parameter updates.

- Update scx_qmap and maximal selftest to test the new feature.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext, sched/core: Factor out struct scx_task_group</title>
<updated>2025-06-21T03:03:27+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-06-14T01:33:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6e6558a6bc418f1478c5dc8609d03805364e0cb9'/>
<id>urn:sha1:6e6558a6bc418f1478c5dc8609d03805364e0cb9</id>
<content type='text'>
More sched_ext fields will be added to struct task_group. In preparation,
factor out sched_ext fields into struct scx_task_group to reduce clutter in
the common header. No functional changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Add an event, SCX_EV_SELECT_CPU_FALLBACK</title>
<updated>2025-02-02T17:23:18+00:00</updated>
<author>
<name>Changwoo Min</name>
<email>changwoo@igalia.com</email>
</author>
<published>2025-01-31T07:09:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f7f6142107f0e0dd4a2b041116461a049ca18cb0'/>
<id>urn:sha1:f7f6142107f0e0dd4a2b041116461a049ca18cb0</id>
<content type='text'>
Add a core event, SCX_EV_SELECT_CPU_FALLBACK, which represents how many times
ops.select_cpu() returns a CPU that the task can't use.

__scx_add_event() is used since the caller holds an rq lock,
so the preemption has already been disabled.

Signed-off-by: Changwoo Min &lt;changwoo@igalia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2024-11-20T18:08:00+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-20T18:08:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8f7c8b88bda4988f44e595a760438febf51c92c8'/>
<id>urn:sha1:8f7c8b88bda4988f44e595a760438febf51c92c8</id>
<content type='text'>
Pull sched_ext updates from Tejun Heo:

 - Improve the default select_cpu() implementation making it topology
   aware and handle WAKE_SYNC better.

 - set_arg_maybe_null() was used to inform the verifier which ops args
   could be NULL in a rather hackish way. Use the new __nullable CFI
   stub tags instead.

 - On Sapphire Rapids multi-socket systems, a BPF scheduler, by
   hammering on the same queue across sockets, could live-lock the
   system to the point where the system couldn't make reasonable forward
   progress.

   This could lead to soft-lockup triggered resets or stalling out
   bypass mode switch and thus BPF scheduler ejection for tens of
   minutes if not hours. After trying a number of mitigations, the
   following set worked reliably:

     - Injecting artificial cpu_relax() loops in two places while
       sched_ext is trying to turn on the bypass mode.

     - Triggering scheduler ejection when soft-lockup detection is
       imminent (a quarter of threshold left).

   While not the prettiest, the impact both in terms of code complexity
   and overhead is minimal.

 - A common complaint on the API is the overuse of the word "dispatch"
   and the confusion around "consume". This is due to how the dispatch
   queues became more generic over time. Rename the affected kfuncs for
   clarity. Thanks to BPF's compatibility features, this change can be
   made in a way that's both forward and backward compatible. The
   compatibility code will be dropped in a few releases.

 - Other misc changes

* tag 'sched_ext-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (21 commits)
  sched_ext: Replace scx_next_task_picked() with switch_class() in comment
  sched_ext: Rename scx_bpf_dispatch[_vtime]_from_dsq*() -&gt; scx_bpf_dsq_move[_vtime]*()
  sched_ext: Rename scx_bpf_consume() to scx_bpf_dsq_move_to_local()
  sched_ext: Rename scx_bpf_dispatch[_vtime]() to scx_bpf_dsq_insert[_vtime]()
  sched_ext: scx_bpf_dispatch_from_dsq_set_*() are allowed from unlocked context
  sched_ext: add a missing rcu_read_lock/unlock pair at scx_select_cpu_dfl()
  sched_ext: Clarify sched_ext_ops table for userland scheduler
  sched_ext: Enable the ops breather and eject BPF scheduler on softlockup
  sched_ext: Avoid live-locking bypass mode switching
  sched_ext: Fix incorrect use of bitwise AND
  sched_ext: Do not enable LLC/NUMA optimizations when domains overlap
  sched_ext: Introduce NUMA awareness to the default idle selection policy
  sched_ext: Replace set_arg_maybe_null() with __nullable CFI stub tags
  sched_ext: Rename CFI stubs to names that are recognized by BPF
  sched_ext: Introduce LLC awareness to the default idle selection policy
  sched_ext: Clarify ops.select_cpu() for single-CPU tasks
  sched_ext: improve WAKE_SYNC behavior for default idle CPU selection
  sched_ext: Use btf_ids to resolve task_struct
  sched/ext: Use tg_cgroup() to elieminate duplicate code
  sched/ext: Fix unmatch trailing comment of CONFIG_EXT_GROUP_SCHED
  ...
</content>
</entry>
<entry>
<title>sched_ext: Enable the ops breather and eject BPF scheduler on softlockup</title>
<updated>2024-11-08T20:42:22+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-11-05T21:49:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e32c260195e6ff72940ab7826e38e0a0066fc58f'/>
<id>urn:sha1:e32c260195e6ff72940ab7826e38e0a0066fc58f</id>
<content type='text'>
On 2 x Intel Sapphire Rapids machines with 224 logical CPUs, a poorly
behaving BPF scheduler can live-lock the system by making multiple CPUs bang
on the same DSQ to the point where soft-lockup detection triggers before
SCX's own watchdog can take action. It also seems possible that the machine
can be live-locked enough to prevent scx_ops_helper, which is an RT task,
from running in a timely manner.

Implement scx_softlockup() which is called when three quarters of
soft-lockup threshold has passed. The function immediately enables the ops
breather and triggers an ops error to initiate ejection of the BPF
scheduler.

The previous and this patch combined enable the kernel to reliably recover
the system from live-lock conditions that can be triggered by a poorly
behaving BPF scheduler on Intel dual socket systems.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/ext: Remove sched_fork() hack</title>
<updated>2024-11-05T11:55:37+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2024-10-28T13:20:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0f0d1b8e5010bfe1feeb4d78d137e41946a5370d'/>
<id>urn:sha1:0f0d1b8e5010bfe1feeb4d78d137e41946a5370d</id>
<content type='text'>
Instead of solving the underlying problem of the double invocation of
__sched_fork() for idle tasks, sched-ext decided to hack around the issue
by partially clearing out the entity struct to preserve the already
enqueued node. A provided analysis and solution has been ignored for four
months.

Now that someone else has taken care of cleaning it up, remove the
disgusting hack and clear out the full structure. Remove the comment in the
structure declaration as well, as there is no requirement for @node being
the last element anymore.

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lore.kernel.org/r/87ldy82wkc.ffs@tglx
</content>
</entry>
<entry>
<title>sched_ext: Compact struct bpf_iter_scx_dsq_kern</title>
<updated>2024-09-09T23:42:47+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-09-09T23:42:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6462dd53a26088a433f90a5c15822196f201037c'/>
<id>urn:sha1:6462dd53a26088a433f90a5c15822196f201037c</id>
<content type='text'>
struct scx_iter_scx_dsq is defined as 6 u64's and scx_dsq_iter_kern was
using 5 of them. We want to add two more u64 fields but it's better if we do
so while staying within scx_iter_scx_dsq to maintain binary compatibility.

The way scx_iter_scx_dsq_kern is laid out is rather inefficient - the node
field takes up three u64's but only one bit of the last u64 is used. Turn
the bool into u32 flags and only use the lower 16 bits freeing up 48 bits -
16 bits for flags, 32 bits for a u32 - for use by struct
bpf_iter_scx_dsq_kern.

This allows moving the dsq_seq and flags fields of bpf_iter_scx_dsq_kern
into the cursor field reducing the struct size by a full u64.

No behavior changes intended.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
