<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/kernfs.h, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-01-30T23:54:25+00:00</updated>
<entry>
<title>kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id()</title>
<updated>2024-01-30T23:54:25+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-01-09T21:48:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4207b556e62f0a8915afc5da4c5d5ad915a253a5'/>
<id>urn:sha1:4207b556e62f0a8915afc5da4c5d5ad915a253a5</id>
<content type='text'>
The BPF helper bpf_cgroup_from_id() calls kernfs_find_and_get_node_by_id()
which acquires kernfs_idr_lock, which is an non-raw non-IRQ-safe lock. This
can lead to deadlocks as bpf_cgroup_from_id() can be called from any BPF
programs including e.g. the ones that attach to functions which are holding
the scheduler rq lock.

Consider the following BPF program:

  SEC("fentry/__set_cpus_allowed_ptr_locked")
  int BPF_PROG(__set_cpus_allowed_ptr_locked, struct task_struct *p,
	       struct affinity_context *affn_ctx, struct rq *rq, struct rq_flags *rf)
  {
	  struct cgroup *cgrp = bpf_cgroup_from_id(p-&gt;cgroups-&gt;dfl_cgrp-&gt;kn-&gt;id);

	  if (cgrp) {
		  bpf_printk("%d[%s] in %s", p-&gt;pid, p-&gt;comm, cgrp-&gt;kn-&gt;name);
		  bpf_cgroup_release(cgrp);
	  }
	  return 0;
  }

__set_cpus_allowed_ptr_locked() is called with rq lock held and the above
BPF program calls bpf_cgroup_from_id() within leading to the following
lockdep warning:

  =====================================================
  WARNING: HARDIRQ-safe -&gt; HARDIRQ-unsafe lock order detected
  6.7.0-rc3-work-00053-g07124366a1d7-dirty #147 Not tainted
  -----------------------------------------------------
  repro/1620 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
  ffffffff833b3688 (kernfs_idr_lock){+.+.}-{2:2}, at: kernfs_find_and_get_node_by_id+0x1e/0x70

		and this task is already holding:
  ffff888237ced698 (&amp;rq-&gt;__lock){-.-.}-{2:2}, at: task_rq_lock+0x4e/0xf0
  which would create a new lock dependency:
   (&amp;rq-&gt;__lock){-.-.}-{2:2} -&gt; (kernfs_idr_lock){+.+.}-{2:2}
  ...
   Possible interrupt unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(kernfs_idr_lock);
				 local_irq_disable();
				 lock(&amp;rq-&gt;__lock);
				 lock(kernfs_idr_lock);
    &lt;Interrupt&gt;
      lock(&amp;rq-&gt;__lock);

		 *** DEADLOCK ***
  ...
  Call Trace:
   dump_stack_lvl+0x55/0x70
   dump_stack+0x10/0x20
   __lock_acquire+0x781/0x2a40
   lock_acquire+0xbf/0x1f0
   _raw_spin_lock+0x2f/0x40
   kernfs_find_and_get_node_by_id+0x1e/0x70
   cgroup_get_from_id+0x21/0x240
   bpf_cgroup_from_id+0xe/0x20
   bpf_prog_98652316e9337a5a___set_cpus_allowed_ptr_locked+0x96/0x11a
   bpf_trampoline_6442545632+0x4f/0x1000
   __set_cpus_allowed_ptr_locked+0x5/0x5a0
   sched_setaffinity+0x1b3/0x290
   __x64_sys_sched_setaffinity+0x4f/0x60
   do_syscall_64+0x40/0xe0
   entry_SYSCALL_64_after_hwframe+0x46/0x4e

Let's fix it by protecting kernfs_node and kernfs_root with RCU and making
kernfs_find_and_get_node_by_id() acquire rcu_read_lock() instead of
kernfs_idr_lock.

This adds an rcu_head to kernfs_node making it larger by 16 bytes on 64bit.
Combined with the preceding rearrange patch, the net increase is 8 bytes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Andrea Righi &lt;andrea.righi@canonical.com&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Link: https://lore.kernel.org/r/20240109214828.252092-4-tj@kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernfs: Rearrange kernfs_node fields to reduce its size on 64bit</title>
<updated>2024-01-30T23:53:59+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-01-10T18:28:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1c9f2c7606afe149800986182638f636646dd824'/>
<id>urn:sha1:1c9f2c7606afe149800986182638f636646dd824</id>
<content type='text'>
Moving .flags and .mode right below .hash makes kernfs_node smaller by 8
bytes on 64bit. To avoid creating a hole from 8 bytes alignment on 32bit
archs, .priv is moved below so that there are two 32bit pointers after the
64bit .id field.

v2: Updated to avoid size increase on 32bit noticed by Geert.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Link: https://lore.kernel.org/r/ZZ7hwA18nfmFjYpj@slm.duckdns.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernfs: sysfs: support custom llseek method for sysfs entries</title>
<updated>2023-10-05T11:42:11+00:00</updated>
<author>
<name>Valentine Sinitsyn</name>
<email>valesini@yandex-team.ru</email>
</author>
<published>2023-09-25T08:40:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0fedefd4c4e33dd24f726b13b5d7c143e2b483be'/>
<id>urn:sha1:0fedefd4c4e33dd24f726b13b5d7c143e2b483be</id>
<content type='text'>
As of now, seeking in sysfs files is handled by generic_file_llseek().
There are situations where one may want to customize seeking logic:

- Many sysfs entries are fixed files while generic_file_llseek() accepts
  past-the-end positions. Not only being useless by itself, this
  also means a bug in userspace code will trigger not at lseek(), but at
  some later point making debugging harder.
- generic_file_llseek() relies on f_mapping-&gt;host to get the file size
  which might not be correct for all sysfs entries.
  See commit 636b21b50152 ("PCI: Revoke mappings like devmem") as an example.

Implement llseek method to override this behavior at sysfs attribute
level. The method is optional, and if it is absent,
generic_file_llseek() is called to preserve backwards compatibility.

Signed-off-by: Valentine Sinitsyn &lt;valesini@yandex-team.ru&gt;
Link: https://lore.kernel.org/r/20230925084013.309399-1-valesini@yandex-team.ru
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernfs: add stub helper for kernfs_generic_poll()</title>
<updated>2023-08-05T06:31:42+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2023-07-24T12:18:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=79038a99445f69c5d28494dd4f8c6f0509f65b2e'/>
<id>urn:sha1:79038a99445f69c5d28494dd4f8c6f0509f65b2e</id>
<content type='text'>
In some randconfig builds, kernfs ends up being disabled, so there is no prototype
for kernfs_generic_poll()

In file included from kernel/sched/build_utility.c:97:
kernel/sched/psi.c:1479:3: error: implicit declaration of function 'kernfs_generic_poll' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                kernfs_generic_poll(t-&gt;of, wait);
                ^

Add a stub helper for it, as we have it for other kernfs functions.

Fixes: aff037078ecae ("sched/psi: use kernfs polling functions for PSI trigger polling")
Fixes: 147e1a97c4a0b ("fs: kernfs: add poll file operation")
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Reviewed-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Link: https://lore.kernel.org/r/20230724121823.1357562-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernfs: Implement kernfs_show()</title>
<updated>2022-09-01T16:08:44+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2022-08-28T05:04:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=783bd07d095b722108725af11113795ee046ed0e'/>
<id>urn:sha1:783bd07d095b722108725af11113795ee046ed0e</id>
<content type='text'>
Currently, kernfs nodes can be created hidden and activated later by calling
kernfs_activate() to allow creation of multiple nodes to succeed or fail as
a unit. This is an one-way one-time-only transition. This patch introduces
kernfs_show() which can toggle visibility dynamically.

As the currently proposed use - toggling the cgroup pressure files - only
requires operating on leaf nodes, for the sake of simplicity, restrict it as
such for now.

Hiding uses the same mechanism as deactivation and likewise guarantees that
there are no in-flight operations on completion. KERNFS_ACTIVATED and
KERNFS_HIDDEN are used to manage the interactions between activations and
show/hide operations. A node is visible iff both activated &amp; !hidden.

Cc: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Tested-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Reviewed-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lore.kernel.org/r/20220828050440.734579-9-tj@kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernfs: Add KERNFS_REMOVING flags</title>
<updated>2022-09-01T16:08:44+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2022-08-28T05:04:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c25491747b21536bd56dccb82a109754bbc8d52c'/>
<id>urn:sha1:c25491747b21536bd56dccb82a109754bbc8d52c</id>
<content type='text'>
KERNFS_ACTIVATED tracks whether a given node has ever been activated. As a
node was only deactivated on removal, this was used for

 1. Drain optimization (removed by the previous patch).
 2. To hide !activated nodes
 3. To avoid double activations
 4. Reject adding children to a node being removed
 5. Skip activaing a node which is being removed.

We want to decouple deactivation from removal so that nodes can be
deactivated and hidden dynamically, which makes KERNFS_ACTIVATED useless for
all of the above purposes.

#1 is already gone. #2 and #3 can instead test whether the node is currently
active. A new flag KERNFS_REMOVING is added to explicitly mark nodes which
are being removed for #4 and #5.

While this leaves KERNFS_ACTIVATED with no users, leave it be as it will be
used in a following patch.

Cc: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Tested-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Reviewed-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lore.kernel.org/r/20220828050440.734579-7-tj@kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "kernfs: Change kernfs_notify_list to llist."</title>
<updated>2022-07-06T12:20:22+00:00</updated>
<author>
<name>Imran Khan</name>
<email>imran.f.khan@oracle.com</email>
</author>
<published>2022-07-05T20:10:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2fd26970cf66bd52dc42843c46968040caa8c9a1'/>
<id>urn:sha1:2fd26970cf66bd52dc42843c46968040caa8c9a1</id>
<content type='text'>
This reverts commit b8f35fa1188b84035c59d4842826c4e93a1b1c9f.

This is causing regression due to same kernfs_node getting
added multiple times in kernfs_notify_list so revert it until
safe way of using llist in this context is found.

Reported-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Reported-by: Michael Walle &lt;michael@walle.cc&gt;
Reported-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Imran Khan &lt;imran.f.khan@oracle.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: https://lore.kernel.org/r/20220705201026.2487665-1-imran.f.khan@oracle.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.</title>
<updated>2022-06-27T14:46:15+00:00</updated>
<author>
<name>Imran Khan</name>
<email>imran.f.khan@oracle.com</email>
</author>
<published>2022-06-15T02:10:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1d25b84e444ad66313c473407979ea9cd33deb3f'/>
<id>urn:sha1:1d25b84e444ad66313c473407979ea9cd33deb3f</id>
<content type='text'>
In current kernfs design a single mutex, kernfs_open_file_mutex, protects
the list of kernfs_open_file instances corresponding to a sysfs attribute.
So even if different tasks are opening or closing different sysfs files
they can contend on osq_lock of this mutex. The contention is more apparent
in large scale systems with few hundred CPUs where most of the CPUs have
running tasks that are opening, accessing or closing sysfs files at any
point of time.

Using hashed mutexes in place of a single global mutex, can significantly
reduce contention around global mutex and hence can provide better
scalability. Moreover as these hashed mutexes are not part of kernfs_node
objects we will not see any singnificant change in memory utilization of
kernfs based file systems like sysfs, cgroupfs etc.

Modify interface introduced in previous patch to make use of hashed
mutexes. Use kernfs_node address as hashing key.

Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Imran Khan &lt;imran.f.khan@oracle.com&gt;
Link: https://lore.kernel.org/r/20220615021059.862643-5-imran.f.khan@oracle.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernfs: Change kernfs_notify_list to llist.</title>
<updated>2022-06-27T14:46:15+00:00</updated>
<author>
<name>Imran Khan</name>
<email>imran.f.khan@oracle.com</email>
</author>
<published>2022-06-15T02:10:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b8f35fa1188b84035c59d4842826c4e93a1b1c9f'/>
<id>urn:sha1:b8f35fa1188b84035c59d4842826c4e93a1b1c9f</id>
<content type='text'>
At present kernfs_notify_list is implemented as a singly linked
list of kernfs_node(s), where last element points to itself and
value of -&gt;attr.next tells if node is present on the list or not.
Both addition and deletion to list happen under kernfs_notify_lock.

Change kernfs_notify_list to llist so that addition to list can heppen
locklessly.

Suggested by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;

Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Imran Khan &lt;imran.f.khan@oracle.com&gt;
Link: https://lore.kernel.org/r/20220615021059.862643-3-imran.f.khan@oracle.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kernfs: make -&gt;attr.open RCU protected.</title>
<updated>2022-06-27T14:46:14+00:00</updated>
<author>
<name>Imran Khan</name>
<email>imran.f.khan@oracle.com</email>
</author>
<published>2022-06-15T02:10:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=086c00c71fc8d47db6983f419a45f9ee167de03f'/>
<id>urn:sha1:086c00c71fc8d47db6983f419a45f9ee167de03f</id>
<content type='text'>
After removal of kernfs_open_node-&gt;refcnt in the previous patch,
kernfs_open_node_lock can be removed as well by making -&gt;attr.open
RCU protected. kernfs_put_open_node can delegate freeing to -&gt;attr.open
to RCU and other readers of -&gt;attr.open can do so under rcu_read_(un)lock.

Suggested by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;

Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Imran Khan &lt;imran.f.khan@oracle.com&gt;
Link: https://lore.kernel.org/r/20220615021059.862643-2-imran.f.khan@oracle.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
