<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/Documentation/accounting, branch v6.1.168</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.168'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-10-17T13:22:06+00:00</updated>
<entry>
<title>delayacct: improve the average delay precision of getdelay tool to microsecond</title>
<updated>2024-10-17T13:22:06+00:00</updated>
<author>
<name>Wang Yong</name>
<email>wang.yong12@zte.com.cn</email>
</author>
<published>2023-02-13T06:08:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6fddf325fa41cd4b2cbab6174ef293e867d54a57'/>
<id>urn:sha1:6fddf325fa41cd4b2cbab6174ef293e867d54a57</id>
<content type='text'>
[ Upstream commit eca7de7cdc382eb6e0d344c07b1449ed75f5b435 ]

Improve the average delay precision of getdelay tool to microsecond.  When
using the getdelay tool, it is sometimes found that the average delay
except CPU is not 0, but display is 0, because the precison is too low.
For example, see delay average of SWAP below when using ZRAM.

print delayacct stats ON
PID	32915
CPU             count     real total  virtual total    delay total  delay average
               339202     2793871936     9233585504        7951112          0.000ms
IO              count    delay total  delay average
                   41      419296904             10ms
SWAP            count    delay total  delay average
               242589     1045792384              0ms

This wrong display is misleading, so improve the millisecond precision of
the average delay to microsecond just like CPU.  Then user would get more
accurate information of delay time.

Link: https://lkml.kernel.org/r/202302131408087983857@zte.com.cn
Signed-off-by: Wang Yong &lt;wang.yong12@zte.com.cn&gt;
Reviewed-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Stable-dep-of: 3840cbe24cf0 ("sched: psi: fix bogus pressure spikes from aggregation race")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/psi: Allow unprivileged polling of N*2s period</title>
<updated>2023-07-27T06:50:38+00:00</updated>
<author>
<name>Domenico Cerasuolo</name>
<email>cerasuolodomenico@gmail.com</email>
</author>
<published>2023-03-30T10:54:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d5dca1977685c3ec7ee7490e8f6736e35ca2ee70'/>
<id>urn:sha1:d5dca1977685c3ec7ee7490e8f6736e35ca2ee70</id>
<content type='text'>
[ Upstream commit d82caa273565b45fcf103148950549af76c314b0 ]

PSI offers 2 mechanisms to get information about a specific resource
pressure. One is reading from /proc/pressure/&lt;resource&gt;, which gives
average pressures aggregated every 2s. The other is creating a pollable
fd for a specific resource and cgroup.

The trigger creation requires CAP_SYS_RESOURCE, and gives the
possibility to pick specific time window and threshold, spawing an RT
thread to aggregate the data.

Systemd would like to provide containers the option to monitor pressure
on their own cgroup and sub-cgroups. For example, if systemd launches a
container that itself then launches services, the container should have
the ability to poll() for pressure in individual services. But neither
the container nor the services are privileged.

This patch implements a mechanism to allow unprivileged users to create
pressure triggers. The difference with privileged triggers creation is
that unprivileged ones must have a time window that's a multiple of 2s.
This is so that we can avoid unrestricted spawning of rt threads, and
use instead the same aggregation mechanism done for the averages, which
runs independently of any triggers.

Suggested-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Domenico Cerasuolo &lt;cerasuolodomenico@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Link: https://lore.kernel.org/r/20230330105418.77061-5-cerasuolodomenico@gmail.com
Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>filemap: make the accounting of thrashing more consistent</title>
<updated>2022-09-27T02:46:06+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang29@zte.com.cn</email>
</author>
<published>2022-08-05T03:38:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f347c9d2697fcbbb64e077f7113a3887a181b8c0'/>
<id>urn:sha1:f347c9d2697fcbbb64e077f7113a3887a181b8c0</id>
<content type='text'>
Once upon a time, we only support accounting thrashing of page cache. 
Then Joonsoo introduced workingset detection for anonymous pages and we
gained the ability to account thrashing of them[1].

So let delayacct account both the thrashing of page cache and anonymous
pages, this could make the codes more consistent and simpler.

[1] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")

Link: https://lkml.kernel.org/r/20220805033838.1714674-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Signed-off-by: CGEL ZTE &lt;cgel.zte@gmail.com&gt;
Acked-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>delayacct: track delays from write-protect copy</title>
<updated>2022-06-01T22:55:25+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang29@zte.com.cn</email>
</author>
<published>2022-06-01T22:55:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=662ce1dc9caf493c309200edbe38d186f1ea20d0'/>
<id>urn:sha1:662ce1dc9caf493c309200edbe38d186f1ea20d0</id>
<content type='text'>
Delay accounting does not track the delay of write-protect copy.  When
tasks trigger many write-protect copys(include COW and unsharing of
anonymous pages[1]), it may spend a amount of time waiting for them.  To
get the delay of tasks in write-protect copy, could help users to evaluate
the impact of using KSM or fork() or GUP.

Also update tools/accounting/getdelays.c:

    / # ./getdelays -dl -p 231
    print delayacct stats ON
    listen forever
    PID     231

    CPU             count     real total  virtual total    delay total  delay average
                     6247     1859000000     2154070021     1674255063          0.268ms
    IO              count    delay total  delay average
                        0              0              0ms
    SWAP            count    delay total  delay average
                        0              0              0ms
    RECLAIM         count    delay total  delay average
                        0              0              0ms
    THRASHING       count    delay total  delay average
                        0              0              0ms
    COMPACT         count    delay total  delay average
                        3          72758              0ms
    WPCOPY          count    delay total  delay average
                     3635      271567604              0ms

[1] commit 31cc5bc4af70("mm: support GUP-triggered unsharing of anonymous pages")

Link: https://lkml.kernel.org/r/20220409014342.2505532-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Jiang Xuexin &lt;jiang.xuexin@zte.com.cn&gt;
Reviewed-by: Ran Xiaokai &lt;ran.xiaokai@zte.com.cn&gt;
Reviewed-by: wangyong &lt;wang.yong12@zte.com.cn&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/psi: report zeroes for CPU full at the system level</title>
<updated>2022-04-22T10:14:08+00:00</updated>
<author>
<name>Chengming Zhou</name>
<email>zhouchengming@bytedance.com</email>
</author>
<published>2022-04-08T12:19:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=890d550d7dbac7a31ecaa78732aa22be282bb6b8'/>
<id>urn:sha1:890d550d7dbac7a31ecaa78732aa22be282bb6b8</id>
<content type='text'>
Martin find it confusing when look at the /proc/pressure/cpu output,
and found no hint about that CPU "full" line in psi Documentation.

% cat /proc/pressure/cpu
some avg10=0.92 avg60=0.91 avg300=0.73 total=933490489
full avg10=0.22 avg60=0.23 avg300=0.16 total=358783277

The PSI_CPU_FULL state is introduced by commit e7fcd7622823
("psi: Add PSI_CPU_FULL state"), which mainly for cgroup level,
but also counted at the system level as a side effect.

Naturally, the FULL state doesn't exist for the CPU resource at
the system level. These "full" numbers can come from CPU idle
schedule latency. For example, t1 is the time when task wakeup
on an idle CPU, t2 is the time when CPU pick and switch to it.
The delta of (t2 - t1) will be in CPU_FULL state.

Another case all processes can be stalled is when all cgroups
have been throttled at the same time, which unlikely to happen.

Anyway, CPU_FULL metric is meaningless and confusing at the
system level. So this patch will report zeroes for CPU full
at the system level, and update psi Documentation accordingly.

Fixes: e7fcd7622823 ("psi: Add PSI_CPU_FULL state")
Reported-by: Martin Steigerwald &lt;Martin.Steigerwald@proact.de&gt;
Suggested-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Link: https://lore.kernel.org/r/20220408121914.82855-1-zhouchengming@bytedance.com
</content>
</entry>
<entry>
<title>Merge tag 'sched_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-01-23T15:35:27+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-01-23T15:35:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=10c64a0f280636652ec63bb1ddd34b6c8e2f5584'/>
<id>urn:sha1:10c64a0f280636652ec63bb1ddd34b6c8e2f5584</id>
<content type='text'>
Pull scheduler fixes from Borislav Petkov:
 "A bunch of fixes: forced idle time accounting, utilization values
  propagation in the sched hierarchies and other minor cleanups and
  improvements"

* tag 'sched_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kernel/sched: Remove dl_boosted flag comment
  sched: Avoid double preemption in __cond_resched_*lock*()
  sched/fair: Fix all kernel-doc warnings
  sched/core: Accounting forceidle time for all tasks except idle task
  sched/pelt: Relax the sync of load_sum with load_avg
  sched/pelt: Relax the sync of runnable_sum with runnable_avg
  sched/pelt: Continue to relax the sync of util_sum with util_avg
  sched/pelt: Relax the sync of util_sum with util_avg
  psi: Fix uaf issue when psi trigger is destroyed while being polled
</content>
</entry>
<entry>
<title>Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact</title>
<updated>2022-01-20T06:52:55+00:00</updated>
<author>
<name>wangyong</name>
<email>wang.yong12@zte.com.cn</email>
</author>
<published>2022-01-20T02:10:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=ec710aa8b2385e6a2239f79120fbf9b78400865b'/>
<id>urn:sha1:ec710aa8b2385e6a2239f79120fbf9b78400865b</id>
<content type='text'>
Add thrashing page cache and direct compact related descriptions and
update the usage of getdelays userspace utility.

The following patches modifications have been updated:
https://lore.kernel.org/all/20190312102002.31737-4-jinpuwang@gmail.com/
https://lore.kernel.org/all/1638619795-71451-1-git-send-email-
wang.yong12@zte.com.cn/

Link: https://lkml.kernel.org/r/1639583021-92977-1-git-send-email-wang.yong12@zte.com.cn
Signed-off-by: wangyong &lt;wang.yong12@zte.com.cn&gt;
Reviewed-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Reported-by: Zeal Robot &lt;zealci@zte.com.cn&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>psi: Fix uaf issue when psi trigger is destroyed while being polled</title>
<updated>2022-01-18T11:09:57+00:00</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2022-01-11T23:23:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a06247c6804f1a7c86a2e5398a4c1f1db1471848'/>
<id>urn:sha1:a06247c6804f1a7c86a2e5398a4c1f1db1471848</id>
<content type='text'>
With write operation on psi files replacing old trigger with a new one,
the lifetime of its waitqueue is totally arbitrary. Overwriting an
existing trigger causes its waitqueue to be freed and pending poll()
will stumble on trigger-&gt;event_wait which was destroyed.
Fix this by disallowing to redefine an existing psi trigger. If a write
operation is used on a file descriptor with an already existing psi
trigger, the operation will fail with EBUSY error.
Also bypass a check for psi_disabled in the psi_trigger_destroy as the
flag can be flipped after the trigger is created, leading to a memory
leak.

Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Reported-by: syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Analyzed-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com
</content>
</entry>
<entry>
<title>delayacct: Add sysctl to enable at runtime</title>
<updated>2021-05-12T09:43:25+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-05-10T12:01:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0cd7c741f01de13dc1eecf22557593b3514639bb'/>
<id>urn:sha1:0cd7c741f01de13dc1eecf22557593b3514639bb</id>
<content type='text'>
Just like sched_schedstats, allow runtime enabling (and disabling) of
delayacct. This is useful if one forgot to add the delayacct boot time
option.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/YJkhebGJAywaZowX@hirez.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>delayacct: Default disabled</title>
<updated>2021-05-12T09:43:25+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-05-04T20:43:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e4042ad492357fa995921376462b04a025dd53b6'/>
<id>urn:sha1:e4042ad492357fa995921376462b04a025dd53b6</id>
<content type='text'>
Assuming this stuff isn't actually used much; disable it by default
and avoid allocating and tracking the task_delay_info structure.

taskstats is changed to still report the regular sched and sched_info
and only skip the missing task_delay_info fields instead of not
reporting anything.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lkml.kernel.org/r/20210505111525.308018373@infradead.org
</content>
</entry>
</feed>
