<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/tools/testing/selftests/cgroup, branch v7.0-rc7</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.0-rc7'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-03-24T20:21:57+00:00</updated>
<entry>
<title>selftests/cgroup: Don't require synchronous populated update on task exit</title>
<updated>2026-03-24T20:21:57+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-24T20:21:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6680c162b4850976ee52b57372eddc4450c1d074'/>
<id>urn:sha1:6680c162b4850976ee52b57372eddc4450c1d074</id>
<content type='text'>
test_cgcore_populated (test_core) and test_cgkill_{simple,tree,forkbomb}
(test_kill) check cgroup.events "populated 0" immediately after reaping
child tasks with waitpid(). This used to work because cgroup_task_exit() in
do_exit() unlinked tasks from css_sets before exit_notify() woke up
waitpid().

d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done
switching out") moved the unlink to cgroup_task_dead() in
finish_task_switch(), which runs after exit_notify(). The populated counter
is now decremented after the parent's waitpid() can return, so there is no
longer a synchronous ordering guarantee. On PREEMPT_RT, where
cgroup_task_dead() is further deferred through lazy irq_work, the race
window is even larger.

The synchronous populated transition was never part of the cgroup interface
contract - it was an implementation artifact. Use cg_read_strcmp_wait() which
retries for up to 1 second, matching what these tests actually need to
verify: that the cgroup eventually becomes unpopulated after all tasks exit.

Fixes: d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out")
Reported-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: cgroups@vger.kernel.org
</content>
</entry>
<entry>
<title>cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue</title>
<updated>2026-02-23T20:42:11+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2026-02-21T18:54:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6df415aa46ec10d607da5063d88492a7c7762074'/>
<id>urn:sha1:6df415aa46ec10d607da5063d88492a7c7762074</id>
<content type='text'>
The cpuset_handle_hotplug() may need to invoke housekeeping_update(),
for instance, when an isolated partition is invalidated because its
last active CPU has been put offline.

As we are going to enable dynamic update to the nozh_full housekeeping
cpumask (HK_TYPE_KERNEL_NOISE) soon with the help of CPU hotplug,
allowing the CPU hotplug path to call into housekeeping_update() directly
from update_isolation_cpumasks() will likely cause deadlock. So we
have to defer any call to housekeeping_update() after the CPU hotplug
operation has finished. This is now done via the workqueue where
the update_hk_sched_domains() function will be invoked via the
hk_sd_workfn().

An concurrent cpuset control file write may have executed the required
update_hk_sched_domains() function before the work function is called. So
the work function call may become a no-op when it is invoked.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command</title>
<updated>2026-02-23T20:42:05+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2026-02-21T18:54:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5e6aac573cc4cc8e62aebf880b824e219a5ff557'/>
<id>urn:sha1:5e6aac573cc4cc8e62aebf880b824e219a5ff557</id>
<content type='text'>
The "S+" command is used in the test matrix to enable the cpuset
controller. However this can be done automatically and we never use the
"S-" command to disable cpuset controller. Simplify the test matrix and
reduce clutter by removing the command and doing that automatically.
There is no functional change to the test cases.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Move the v1 empty cpus/mems check to cpuset1_validate_change()</title>
<updated>2026-01-12T19:03:22+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2026-01-12T16:00:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=272bd8183376a9e20fe08bacbaa44003d7c8acaa'/>
<id>urn:sha1:272bd8183376a9e20fe08bacbaa44003d7c8acaa</id>
<content type='text'>
As stated in commit 1c09b195d37f ("cpuset: fix a regression in validating
config change"), it is not allowed to clear masks of a cpuset if
there're tasks in it. This is specific to v1 since empty "cpuset.cpus"
or "cpuset.mems" will cause the v2 cpuset to inherit the effective CPUs
or memory nodes from its parent. So it is OK to have empty cpus or mems
even if there are tasks in the cpuset.

Move this empty cpus/mems check in validate_change() to
cpuset1_validate_change() to allow more flexibility in setting
cpus or mems in v2. cpuset_is_populated() needs to be moved into
cpuset-internal.h as it is needed by the empty cpus/mems checking code.

Also add a test case to test_cpuset_prs.sh to verify that.

Reported-by: Chen Ridong &lt;chenridong@huaweicloud.com&gt;
Closes: https://lore.kernel.org/lkml/7a3ec392-2e86-4693-aa9f-1e668a668b9c@huaweicloud.com/
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Reviewed-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict</title>
<updated>2026-01-12T19:02:46+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2026-01-12T16:00:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2a3602030d800b6600ef55c31e21bc54611f7770'/>
<id>urn:sha1:2a3602030d800b6600ef55c31e21bc54611f7770</id>
<content type='text'>
Currently, when setting a cpuset's cpuset.cpus to a value that conflicts
with the cpuset.cpus/cpuset.cpus.exclusive of a sibling partition,
the sibling's partition state becomes invalid. This is overly harsh and
is probably not necessary.

The cpuset.cpus.exclusive control file, if set, will override the
cpuset.cpus of the same cpuset when creating a cpuset partition.
So cpuset.cpus has less priority than cpuset.cpus.exclusive in setting up
a partition.  However, it cannot override a conflicting cpuset.cpus file
in a sibling cpuset and the partition creation process will fail. This
is inconsistent.  That will also make using cpuset.cpus.exclusive less
valuable as a tool to set up cpuset partitions as the users have to
check if such a cpuset.cpus conflict exists or not.

Fix these problems by making sure that once a cpuset.cpus.exclusive
is set without failure, it will always be allowed to form a valid
partition as long as at least one CPU can be granted from its parent
irrespective of the state of the siblings' cpuset.cpus values. Of
course, setting cpuset.cpus.exclusive will fail if it conflicts with
the cpuset.cpus.exclusive or the cpuset.cpus.exclusive.effective value
of a sibling.

Partition can still be created by setting only cpuset.cpus without
setting cpuset.cpus.exclusive. However, any conflicting CPUs in sibling's
cpuset.cpus.exclusive.effective and cpuset.cpus.exclusive values will
be removed from its cpuset.cpus.exclusive.effective as long as there
is still one or more CPUs left and can be granted from its parent. This
CPU stripping is currently done in rm_siblings_excl_cpus().

The new code will now try its best to enable the creation of new
partitions with only cpuset.cpus set without invalidating existing ones.
However it is not guaranteed that all the CPUs requested in cpuset.cpus
will be used in the new partition even when all these CPUs can be
granted from the parent.

This is similar to the fact that cpuset.cpus.effective may not be
able to include all the CPUs requested in cpuset.cpus. In this case,
the parent may not able to grant all the exclusive CPUs requested in
cpuset.cpus to cpuset.cpus.exclusive.effective if some of them have
already been granted to other partitions earlier.

With the creation of multiple sibling partitions by setting
only cpuset.cpus, this does have the side effect that their exact
cpuset.cpus.exclusive.effective settings will depend on the order of
partition creation if there are conflicts. Due to the exclusive nature
of the CPUs in a partition, it is not easy to make it fair other than
the old behavior of invalidating all the conflicting partitions.

For example,
  # echo "0-2" &gt; A1/cpuset.cpus
  # echo "root" &gt; A1/cpuset.cpus.partition
  # cat A1/cpuset.cpus.partition
  root
  # cat A1/cpuset.cpus.exclusive.effective
  0-2
  # echo "2-4" &gt; B1/cpuset.cpus
  # echo "root" &gt; B1/cpuset.cpus.partition
  # cat B1/cpuset.cpus.partition
  root
  # cat B1/cpuset.cpus.exclusive.effective
  3-4
  # cat B1/cpuset.cpus.effective
  3-4

For users who want to be sure that they can get most of the CPUs they
want, cpuset.cpus.exclusive should be used instead if they can set
it successfully without failure. Setting cpuset.cpus.exclusive will
guarantee that sibling conflicts from then onward is no longer possible.

To make this change, we have to separate out the is_cpu_exclusive()
check in cpus_excl_conflict() into a cgroup v1 only
cpuset1_cpus_excl_conflict() helper. The cpus_allowed_validate_change()
helper is now no longer needed and can be removed.

Some existing tests in test_cpuset_prs.sh are updated and new ones are
added to reflect the new behavior. The cgroup-v2.rst doc file is also
updated the clarify what exclusive CPUs will be used when a partition
is created.

Reported-by: Sun Shaojie &lt;sunshaojie@kylinos.cn&gt;
Closes: https://lore.kernel.org/lkml/20251117015708.977585-1-sunshaojie@kylinos.cn/
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Reviewed-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants</title>
<updated>2025-12-08T18:44:54+00:00</updated>
<author>
<name>Guopeng Zhang</name>
<email>zhangguopeng@kylinos.cn</email>
</author>
<published>2025-12-03T11:56:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=50133c09d189a26f4cc6e78e382864fd599a1dc4'/>
<id>urn:sha1:50133c09d189a26f4cc6e78e382864fd599a1dc4</id>
<content type='text'>
Replace the manual sleep-and-retry logic in test_kmem_dead_cgroups()
with the new helper `cg_read_key_long_poll()`. This change improves
the robustness of the test by polling the "nr_dying_descendants"
counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.

Additionally, increase the retry timeout to 8 seconds (from 5 seconds)
based on testing results:
  - With 5-second timeout: 4/20 runs passed.
  - With 8-second timeout: 20/20 runs passed.

The 8 second timeout is based on stress testing of test_kmem_dead_cgroups()
under load: 5 seconds was occasionally not enough for reclaim of dying
descendants to complete, whereas 8 seconds consistently covered the observed
latencies. This value is intended as a generous upper bound for the
asynchronous reclaim and is not tied to any specific kernel constant, so it
can be adjusted in the future if reclaim behavior changes.

Signed-off-by: Guopeng Zhang &lt;zhangguopeng@kylinos.cn&gt;
Reviewed-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests: cgroup: make test_memcg_sock robust against delayed sock stats</title>
<updated>2025-12-08T18:44:54+00:00</updated>
<author>
<name>Guopeng Zhang</name>
<email>zhangguopeng@kylinos.cn</email>
</author>
<published>2025-12-03T11:56:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6360d444ae32871c6a048ac880ef3b871a439bad'/>
<id>urn:sha1:6360d444ae32871c6a048ac880ef3b871a439bad</id>
<content type='text'>
test_memcg_sock() currently requires that memory.stat's "sock " counter
is exactly zero immediately after the TCP server exits. On a busy system
this assumption is too strict:

  - Socket memory may be freed with a small delay (e.g. RCU callbacks).
  - memcg statistics are updated asynchronously via the rstat flushing
    worker, so the "sock " value in memory.stat can stay non-zero for a
    short period of time even after all socket memory has been uncharged.

As a result, test_memcg_sock() can intermittently fail even though socket
memory accounting is working correctly.

Make the test more robust by polling memory.stat for the "sock "
counter and allowing it some time to drop to zero instead of checking
it only once. The timeout is set to 3 seconds to cover the periodic
rstat flush interval (FLUSH_TIME = 2*HZ by default) plus some
scheduling slack. If the counter does not become zero within the
timeout, the test still fails as before.

On my test system, running test_memcontrol 50 times produced:

  - Before this patch:  6/50 runs passed.
  - After this patch:  50/50 runs passed.

Signed-off-by: Guopeng Zhang &lt;zhangguopeng@kylinos.cn&gt;
Suggested-by: Lance Yang &lt;lance.yang@linux.dev&gt;
Reviewed-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>selftests: cgroup: Add cg_read_key_long_poll() to poll a cgroup key with retries</title>
<updated>2025-12-08T18:44:54+00:00</updated>
<author>
<name>Guopeng Zhang</name>
<email>zhangguopeng@kylinos.cn</email>
</author>
<published>2025-12-03T11:56:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=311ead1be05d2348e89f873337c8375e856e1abb'/>
<id>urn:sha1:311ead1be05d2348e89f873337c8375e856e1abb</id>
<content type='text'>
Introduce a new helper function `cg_read_key_long_poll()` in
cgroup_util.h. This function polls the specified key in a cgroup file
until it matches the expected value or the retry limit is reached,
with configurable wait intervals between retries.

This helper is particularly useful for handling asynchronously updated
cgroup statistics (e.g., memory.stat), where immediate reads may
observe stale values, especially on busy systems. It allows tests and
other utilities to handle such cases more flexibly.

Signed-off-by: Guopeng Zhang &lt;zhangguopeng@kylinos.cn&gt;
Suggested-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Acked-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-12-06T22:01:20+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-12-06T22:01:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=509d3f45847627f4c5cdce004c3ec79262b5239c'/>
<id>urn:sha1:509d3f45847627f4c5cdce004c3ec79262b5239c</id>
<content type='text'>
Pull non-MM updates from Andrew Morton:

 - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
   fixes a build issue and does some cleanup in ib/sys_info.c

 - "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
   enhances the 64-bit math code on behalf of a PWM driver and beefs up
   the test module for these library functions

 - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
   makes BPF symbol names, sizes, and line numbers available to the GDB
   debugger

 - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
   adds a sysctl which can be used to cause additional info dumping when
   the hung-task and lockup detectors fire

 - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
   adds a general base64 encoder/decoder to lib/ and migrates several
   users away from their private implementations

 - "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
   makes TCP a little faster

 - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
   reworks the KEXEC Handover interfaces in preparation for Live Update
   Orchestrator (LUO), and possibly for other future clients

 - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
   increases the flexibility of KEXEC Handover. Also preparation for LUO

 - "Live Update Orchestrator" (Pasha Tatashin)
   is a major new feature targeted at cloud environments. Quoting the
   cover letter:

      This series introduces the Live Update Orchestrator, a kernel
      subsystem designed to facilitate live kernel updates using a
      kexec-based reboot. This capability is critical for cloud
      environments, allowing hypervisors to be updated with minimal
      downtime for running virtual machines. LUO achieves this by
      preserving the state of selected resources, such as memory,
      devices and their dependencies, across the kernel transition.

      As a key feature, this series includes support for preserving
      memfd file descriptors, which allows critical in-memory data, such
      as guest RAM or any other large memory region, to be maintained in
      RAM across the kexec reboot.

   Mike Rappaport merits a mention here, for his extensive review and
   testing work.

 - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
   moves the kexec and kdump sysfs entries from /sys/kernel/ to
   /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day

 - "kho: fixes for vmalloc restoration" (Mike Rapoport)
   fixes a BUG which was being hit during KHO restoration of vmalloc()
   regions

* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
  calibrate: update header inclusion
  Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
  vmcoreinfo: track and log recoverable hardware errors
  kho: fix restoring of contiguous ranges of order-0 pages
  kho: kho_restore_vmalloc: fix initialization of pages array
  MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
  init: replace simple_strtoul with kstrtoul to improve lpj_setup
  KHO: fix boot failure due to kmemleak access to non-PRESENT pages
  Documentation/ABI: new kexec and kdump sysfs interface
  Documentation/ABI: mark old kexec sysfs deprecated
  kexec: move sysfs entries to /sys/kernel/kexec
  test_kho: always print restore status
  kho: free chunks using free_page() instead of kfree()
  selftests/liveupdate: add kexec test for multiple and empty sessions
  selftests/liveupdate: add simple kexec-based selftest for LUO
  selftests/liveupdate: add userspace API selftests
  docs: add documentation for memfd preservation via LUO
  mm: memfd_luo: allow preserving memfd
  liveupdate: luo_file: add private argument to store runtime state
  mm: shmem: export some functions to internal.h
  ...
</content>
</entry>
<entry>
<title>Merge tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2025-12-03T21:04:07+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-12-03T21:04:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8449d3252c2603a51ffc7c36cb5bd94874378b7d'/>
<id>urn:sha1:8449d3252c2603a51ffc7c36cb5bd94874378b7d</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - Defer task cgroup unlink until after the dying task's final context
   switch so that controllers see the cgroup properly populated until
   the task is truly gone

 - cpuset cleanups and simplifications.

   Enforce that domain isolated CPUs stay in root or isolated partitions
   and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
   sched/deadline root domain handling during CPU hot-unplug and race
   for tasks in attaching cpusets

 - Misc fixes including memory reclaim protection documentation and
   selftest KTAP conformance

* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cpuset: Treat cpusets in attaching as populated
  sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
  cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
  docs: cgroup: No special handling of unpopulated memcgs
  docs: cgroup: Note about sibling relative reclaim protection
  docs: cgroup: Explain reclaim protection target
  selftests/cgroup: conform test to KTAP format output
  cpuset: remove need_rebuild_sched_domains
  cpuset: remove global remote_children list
  cpuset: simplify node setting on error
  cgroup: include missing header for struct irq_work
  cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
  cgroup/cpuset: Globally track isolated_cpus update
  cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
  cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
  cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
  cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
  cgroup: Defer task cgroup unlink until after the task is done switching out
  cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
  cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
  ...
</content>
</entry>
</feed>
