<feed xmlns='http://www.w3.org/2005/Atom'>
<title>starfive-tech/linux.git/kernel/cgroup, branch visionfive_v1_5.13</title>
<subtitle>StarFive Tech Linux Kernel for VisionFive (JH7110) boards (mirror)</subtitle>
<id>https://git.radix-linux.su/starfive-tech/linux.git/atom?h=visionfive_v1_5.13</id>
<link rel='self' href='https://git.radix-linux.su/starfive-tech/linux.git/atom?h=visionfive_v1_5.13'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/'/>
<updated>2021-08-18T07:06:49+00:00</updated>
<entry>
<title>cgroup: rstat: fix A-A deadlock on 32bit around u64_stats_sync</title>
<updated>2021-08-18T07:06:49+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2021-07-27T23:12:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=bf77f479cd4cfd7f0b3d80a13b05e56223becc97'/>
<id>urn:sha1:bf77f479cd4cfd7f0b3d80a13b05e56223becc97</id>
<content type='text'>
commit c3df5fb57fe8756d67fd56ed29da65cdfde839f9 upstream.

0fa294fb1985 ("cgroup: Replace cgroup_rstat_mutex with a spinlock") added
cgroup_rstat_flush_irqsafe() allowing flushing to happen from the irq
context. However, rstat paths use u64_stats_sync to synchronize access to
64bit stat counters on 32bit machines. u64_stats_sync is implemented using
seq_lock and trying to read from an irq context can lead to A-A deadlock if
the irq happens to interrupt the stat update.

Fix it by using the irqsafe variants - u64_stats_update_begin_irqsave() and
u64_stats_update_end_irqrestore() - in the update paths. Note that none of
this matters on 64bit machines. All these are just for 32bit SMP setups.
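
As a rough illustration only (it is not part of the patch), the following
Python sketch models the A-A deadlock with a toy sequence counter. The
SeqCount class, irq_reader() and the max_spins limit are assumptions made
for this example; a real reader has no spin limit.

```python
class SeqCount:
    """Toy model of a sequence counter: odd means a writer is mid-update."""

    def __init__(self):
        self.sequence = 0

    def write_begin(self):
        self.sequence += 1  # now odd: update in progress

    def write_end(self):
        self.sequence += 1  # even again: update published

    def read_begin(self, max_spins=1000):
        """Spin until the sequence is even; give up after max_spins.

        A real reader has no spin limit, which is what turns this into a
        deadlock when the reader interrupted the writer on the same CPU.
        """
        for _ in range(max_spins):
            if self.sequence % 2 == 0:
                return self.sequence
        return None


def irq_reader(stats):
    """Hypothetical flush running in irq context on the same CPU."""
    return stats.read_begin()


stats = SeqCount()

# Unsafe update path: the irq fires between write_begin() and write_end().
stats.write_begin()
stuck = irq_reader(stats)  # None: the reader never sees an even sequence
stats.write_end()

# irqsave-style update path: the irq is held off until the update completes.
stats.write_begin()
stats.write_end()
ok = irq_reader(stats)  # the reader proceeds immediately

print(stuck, ok)
```

In the model, the first read gives up because the interrupted writer can
never finish, while the second succeeds because the reader only runs after
the update section has ended.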

Note that although the interface was introduced way back, its first and
currently only use was recently added by 2d146aa3aa84 ("mm: memcontrol:
switch to rstat"). Stable tagging targets this commit.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Rik van Riel &lt;riel@surriel.com&gt;
Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup1: fix leaked context root causing sporadic NULL deref in LTP</title>
<updated>2021-07-31T06:13:45+00:00</updated>
<author>
<name>Paul Gortmaker</name>
<email>paul.gortmaker@windriver.com</email>
</author>
<published>2021-06-16T12:51:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=141cf6c82b4fd37169dd52ae4dbf6c0c162c0a6b'/>
<id>urn:sha1:141cf6c82b4fd37169dd52ae4dbf6c0c162c0a6b</id>
<content type='text'>
commit 1e7107c5ef44431bc1ebbd4c353f1d7c22e5f2ec upstream.

Richard reported sporadic (roughly one in 10 or so) null dereferences and
other strange behaviour for a set of automated LTP tests.  Things like:

   BUG: kernel NULL pointer dereference, address: 0000000000000008
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   PGD 0 P4D 0
   Oops: 0000 [#1] PREEMPT SMP PTI
   CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
   RIP: 0010:kernfs_sop_show_path+0x1b/0x60

...or these others:

   RIP: 0010:do_mkdirat+0x6a/0xf0
   RIP: 0010:d_alloc_parallel+0x98/0x510
   RIP: 0010:do_readlinkat+0x86/0x120

There were other less common instances of some kind of a general scribble
but the common theme was mount and cgroup and a dubious dentry triggering
the NULL dereference.  I was only able to reproduce it under qemu by
replicating Richard's setup as closely as possible - I never did get it
to happen on bare metal, even while keeping everything else the same.

In commit 71d883c37e8d ("cgroup_do_mount(): massage calling conventions")
we see this as a part of the overall change:

   --------------
           struct cgroup_subsys *ss;
   -       struct dentry *dentry;

   [...]

   -       dentry = cgroup_do_mount(&amp;cgroup_fs_type, fc-&gt;sb_flags, root,
   -                                CGROUP_SUPER_MAGIC, ns);

   [...]

   -       if (percpu_ref_is_dying(&amp;root-&gt;cgrp.self.refcnt)) {
   -               struct super_block *sb = dentry-&gt;d_sb;
   -               dput(dentry);
   +       ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns);
   +       if (!ret &amp;&amp; percpu_ref_is_dying(&amp;root-&gt;cgrp.self.refcnt)) {
   +               struct super_block *sb = fc-&gt;root-&gt;d_sb;
   +               dput(fc-&gt;root);
                   deactivate_locked_super(sb);
                   msleep(10);
                   return restart_syscall();
           }
   --------------

In changing from the local "*dentry" variable to using fc-&gt;root, we now
export/leave that dentry pointer in the file context after doing the dput()
in the unlikely "is_dying" case.   With LTP doing a crazy amount of back to
back mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely
becomes slightly likely and then bad things happen.

A fix would be to not leave the stale reference in fc-&gt;root as follows:

   --------------
                   dput(fc-&gt;root);
   +               fc-&gt;root = NULL;
                  deactivate_locked_super(sb);
   --------------

...but then we are just open-coding a duplicate of fc_drop_locked() so we
simply use that instead.

Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Zefan Li &lt;lizefan.x@bytedance.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: stable@vger.kernel.org      # v5.1+
Reported-by: Richard Purdie &lt;richard.purdie@linuxfoundation.org&gt;
Fixes: 71d883c37e8d ("cgroup_do_mount(): massage calling conventions")
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: verify that source is a string</title>
<updated>2021-07-20T14:00:09+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2021-07-14T13:47:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=a41573667b39152176f6b08d10b4deb171e541c4'/>
<id>urn:sha1:a41573667b39152176f6b08d10b4deb171e541c4</id>
<content type='text'>
commit 3b0462726e7ef281c35a7a4ae33e93ee2bc9975b upstream.

The following sequence can be used to trigger a UAF:

    int fscontext_fd = fsopen("cgroup", 0);
    int fd_null = open("/dev/null", O_RDONLY);
    fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", NULL, fd_null);
    close_range(3, ~0U, 0);

The cgroup v1 specific fs parser expects a string for the "source"
parameter.  However, it is perfectly legitimate to e.g.  specify a file
descriptor for the "source" parameter.  The fs parser doesn't know what
a filesystem allows there.  So it's a bug to assume that "source" is
always of type fs_value_is_string when it can reasonably also be
fs_value_is_file.

This assumption in the cgroup code causes a UAF because struct
fs_parameter uses a union for the actual value.  Access to that union is
guarded by the param-&gt;type member.  The cgroup parameter parser didn't
check param-&gt;type but unconditionally moved param-&gt;string into
fc-&gt;source.  A close on the fscontext_fd then calls put_fs_context(),
which frees fc-&gt;source and thereby frees the file stashed in
param-&gt;file, causing a UAF during the close of fd_null.

Fix this by verifying that param-&gt;type is actually a string and report
an error if not.
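
As a loose analogy (not the kernel code), the Python sketch below shows
the shape of the check: the value is consumed only after its type tag has
been confirmed. Param, parse_source() and the "string"/"file" tags are
hypothetical stand-ins for struct fs_parameter and its type field.

```python
from collections import namedtuple

# Hypothetical stand-in for struct fs_parameter: "value" plays the role of
# the union and "type" records which member is valid.
Param = namedtuple("Param", ["key", "type", "value"])


def parse_source(context, param):
    """Accept "source" only when it carries a string, mirroring the fix."""
    if param.key != "source":
        raise KeyError(param.key)
    if param.type != "string":
        raise ValueError("source must be a string, got %s" % param.type)
    context["source"] = param.value


context = {}
parse_source(context, Param("source", "string", "cgroup"))

try:
    # A file-typed value must be rejected instead of being stored as a string.
    parse_source(context, Param("source", "file", object()))
    rejected = False
except ValueError:
    rejected = True

print(context["source"], rejected)
```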

In follow up patches I'll add a new generic helper that can be used here
and by other filesystems instead of this error-prone copy-pasta fix.
But fixing it in here first makes backporting it to stable a lot
easier.

Fixes: 8d2451f4994f ("cgroup1: switch to option-by-option parsing")
Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: &lt;stable@kernel.org&gt;
Cc: syzkaller-bugs &lt;syzkaller-bugs@googlegroups.com&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup1: don't allow '\n' in renaming</title>
<updated>2021-06-10T13:58:50+00:00</updated>
<author>
<name>Alexander Kuznetsov</name>
<email>wwfq@yandex-team.ru</email>
</author>
<published>2021-06-09T07:17:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=b7e24eb1caa5f8da20d405d262dba67943aedc42'/>
<id>urn:sha1:b7e24eb1caa5f8da20d405d262dba67943aedc42</id>
<content type='text'>
cgroup_mkdir() rejects names that contain a newline:
$ mkdir $'/sys/fs/cgroup/cpu/test\ntest2'
mkdir: cannot create directory
'/sys/fs/cgroup/cpu/test\ntest2': Invalid argument

cgroup1_rename(), however, lacks this check, which makes it possible
to render /proc/&lt;pid&gt;/cgroup unparsable:
$ mkdir /sys/fs/cgroup/cpu/test
$ mv /sys/fs/cgroup/cpu/test $'/sys/fs/cgroup/cpu/test\ntest2'
$ echo $$ &gt; $'/sys/fs/cgroup/cpu/test\ntest2'
$ cat /proc/self/cgroup
11:pids:/
10:freezer:/
9:hugetlb:/
8:cpuset:/
7:blkio:/user.slice
6:memory:/user.slice
5:net_cls,net_prio:/
4:perf_event:/
3:devices:/user.slice
2:cpu,cpuacct:/test
test2
1:name=systemd:/
0::/
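
As a rough illustration (not part of the patch), the Python sketch below
shows why the stray newline breaks line-oriented consumers of this file.
The parse_cgroup_line() helper and the shortened sample are hypothetical;
the three-field "hierarchy-ID:controller-list:path" layout follows the
listing above.

```python
def parse_cgroup_line(line):
    """Split one 'hierarchy-ID:controller-list:path' record into fields."""
    parts = line.split(":", 2)
    if len(parts) != 3:
        raise ValueError("malformed cgroup record: %r" % line)
    hierarchy_id, controllers, path = parts
    return int(hierarchy_id), controllers, path


# Sample modelled on the listing above: the renamed cgroup splits one
# record across two lines.
sample = (
    "3:devices:/user.slice\n"
    "2:cpu,cpuacct:/test\n"
    "test2\n"
    "1:name=systemd:/\n"
    "0::/\n"
)

parsed, rejected = [], []
for line in sample.splitlines():
    try:
        parsed.append(parse_cgroup_line(line))
    except ValueError:
        rejected.append(line)

print(parsed)
print(rejected)
```

The parser sees the cpu,cpuacct path truncated to "/test" and a separate
"test2" line that matches no record format.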

Signed-off-by: Alexander Kuznetsov &lt;wwfq@yandex-team.ru&gt;
Reported-by: Andrey Krasichkov &lt;buglloc@yandex-team.ru&gt;
Acked-by: Dmitry Yakunin &lt;zeil@yandex-team.ru&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: fix spelling mistakes</title>
<updated>2021-05-24T16:45:26+00:00</updated>
<author>
<name>Zhen Lei</name>
<email>thunder.leizhen@huawei.com</email>
</author>
<published>2021-05-24T08:29:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=08b2b6fdf6b26032f025084ce2893924a0cdb4a2'/>
<id>urn:sha1:08b2b6fdf6b26032f025084ce2893924a0cdb4a2</id>
<content type='text'>
Fix some spelling mistakes in comments:
hierarhcy ==&gt; hierarchy
automtically ==&gt; automatically
overriden ==&gt; overridden
In absense of .. or ==&gt; In absence of .. and
assocaited ==&gt; associated
taget ==&gt; target
initate ==&gt; initiate
succeded ==&gt; succeeded
curremt ==&gt; current
udpated ==&gt; updated

Signed-off-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup: disable controllers at parse time</title>
<updated>2021-05-20T16:27:53+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeelb@google.com</email>
</author>
<published>2021-05-12T20:19:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=45e1ba40837ac2f6f4d4716bddb8d44bd7e4a251'/>
<id>urn:sha1:45e1ba40837ac2f6f4d4716bddb8d44bd7e4a251</id>
<content type='text'>
This patch effectively reverts commit a3e72739b7a7 ("cgroup: fix
too early usage of static_branch_disable()"). Commit 6041186a3258
("init: initialize jump labels before command line option parsing")
moved jump_label_init() before parse_args(), which made commit
a3e72739b7a7 unnecessary. Moreover, disabling the controllers later
has consequences, because other subsystems check whether a controller
is enabled when making decisions. One such incident was reported [1],
involving the memory controller and its impact on the memory reclaim
code.

[1] https://lore.kernel.org/linux-mm/921e53f3-4b13-aab8-4a9e-e83ff15371e4@nec.com

Signed-off-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Reported-by: NOMURA JUNICHI(野村　淳一) &lt;junichi.nomura@nec.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Jun'ichi Nomura &lt;junichi.nomura@nec.com&gt;
</content>
</entry>
<entry>
<title>cgroup: rstat: punt root-level optimization to individual controllers</title>
<updated>2021-04-30T18:20:37+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2021-04-30T05:56:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=dc26532aed0ab25c0801a34640d1f3b9b9098a48'/>
<id>urn:sha1:dc26532aed0ab25c0801a34640d1f3b9b9098a48</id>
<content type='text'>
Current users of the rstat code can source root-level statistics from
the native counters of their respective subsystem, allowing them to
forego aggregation at the root level.  This optimization is currently
implemented inside the generic rstat code, which doesn't track the root
cgroup and doesn't invoke the subsystem flush callbacks on it.

However, the memory controller cannot do this optimization, because
cgroup1 breaks out memory specifically for the local level, including at
the root level.  In preparation for the memory controller switching to
rstat, move the optimization from rstat core to the controllers.

Afterwards, rstat will always track the root cgroup for changes and
invoke the subsystem callbacks on it; and it's up to the subsystem to
special-case and skip aggregation of the root cgroup if it can source
this information through other, cheaper means.

This is the case for the io controller and the cgroup base stats.  In
their respective flush callbacks, check whether the parent is the root
cgroup, and if so, skip the unnecessary upward propagation.

The extra cost of tracking the root cgroup is negligible: on stat
changes, we actually remove a branch that checks for the root.  The
queueing for a flush touches only per-cpu data, and only the first stat
change since a flush requires a (per-cpu) lock.

Link: https://lkml.kernel.org/r/20210209163304.77088-6-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: rstat: support cgroup1</title>
<updated>2021-04-30T18:20:37+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2021-04-30T05:56:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=a7df69b81aac5bdeb5c5aef9addd680ce22feebf'/>
<id>urn:sha1:a7df69b81aac5bdeb5c5aef9addd680ce22feebf</id>
<content type='text'>
Rstat currently only supports the default hierarchy in cgroup2.  In
order to replace memcg's private stats infrastructure - used in both
cgroup1 and cgroup2 - with rstat, the latter needs to support cgroup1.

The initialization and destruction callbacks for regular cgroups are
already in place.  Remove the cgroup_on_dfl() guards to handle cgroup1.

The initialization of the root cgroup is currently hardcoded to only
handle cgrp_dfl_root.cgrp.  Move those callbacks to cgroup_setup_root()
and cgroup_destroy_root() to handle the default root as well as the
various cgroup1 roots we may set up during mounting.

The linking of css to cgroups happens in code shared between cgroup1 and
cgroup2 as well.  Simply remove the cgroup_on_dfl() guard.

Linkage of the root css to the root cgroup is a bit trickier: per
default, the root css of a subsystem controller belongs to the default
hierarchy (i.e.  the cgroup2 root).  When a controller is mounted in its
cgroup1 version, the root css is stolen and moved to the cgroup1 root;
on unmount, the css moves back to the default hierarchy.  Annotate
rebind_subsystems() to move the root css linkage along between roots.

Link: https://lkml.kernel.org/r/20210209163304.77088-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Roman Gushchin &lt;guro@fb.com&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>cgroup: use tsk-&gt;in_iowait instead of delayacct_is_task_waiting_on_io()</title>
<updated>2021-04-16T20:49:37+00:00</updated>
<author>
<name>Chunguang Xu</name>
<email>brookxu@tencent.com</email>
</author>
<published>2021-04-13T01:39:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=ffeee417d97f9171bce9f43c22c9f477e4c84f54'/>
<id>urn:sha1:ffeee417d97f9171bce9f43c22c9f477e4c84f54</id>
<content type='text'>
If delayacct is disabled, delayacct_is_task_waiting_on_io() always
returns false, which makes the reported statistic wrong. Use
tsk-&gt;in_iowait instead.

Signed-off-by: Chunguang Xu &lt;brookxu@tencent.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>cgroup/cpuset: fix typos in comments</title>
<updated>2021-04-12T21:20:53+00:00</updated>
<author>
<name>Lu Jialin</name>
<email>lujialin4@huawei.com</email>
</author>
<published>2021-04-08T08:03:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/starfive-tech/linux.git/commit/?id=d95af61df072a7d70b311a11c0c24cf7d8ccebd9'/>
<id>urn:sha1:d95af61df072a7d70b311a11c0c24cf7d8ccebd9</id>
<content type='text'>
Change hierachy to hierarchy and unrechable to unreachable,
no functionality changed.

Signed-off-by: Lu Jialin &lt;lujialin4@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
