<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/proc/internal.h, branch linux-6.0.y</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=linux-6.0.y'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2022-01-22T06:33:37+00:00</updated>
<entry>
<title>fs: proc: store PDE()-&gt;data into inode-&gt;i_private</title>
<updated>2022-01-22T06:33:37+00:00</updated>
<author>
<name>Muchun Song</name>
<email>songmuchun@bytedance.com</email>
</author>
<published>2022-01-22T06:14:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6dfbbae14a7b961f41d80a106e1ab60e86d061c5'/>
<id>urn:sha1:6dfbbae14a7b961f41d80a106e1ab60e86d061c5</id>
<content type='text'>
PDE_DATA(inode) is introduced to get user private data and hide the
layout of struct proc_dir_entry.  The inode-&gt;i_private is used to do the
same thing as well.  Save a copy of user private data to inode-&gt;
i_private when proc inode is allocated.  This means the user also can
get their private data by inode-&gt;i_private.

Introduce pde_data() to wrap inode-&gt;i_private so that we can remove
PDE_DATA() from fs/proc/generic.c and make PTE_DATE() as a wrapper of
pde_data().  It will be easier if we decide to remove PDE_DATE() in the
future.

Link: https://lkml.kernel.org/r/20211124081956.87711-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Alexey Gladkov &lt;gladkov.alexey@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs: make helpers idmap mount aware</title>
<updated>2021-01-24T13:27:20+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>christian.brauner@ubuntu.com</email>
</author>
<published>2021-01-21T13:19:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=549c7297717c32ee53f156cd949e055e601f67bb'/>
<id>urn:sha1:549c7297717c32ee53f156cd949e055e601f67bb</id>
<content type='text'>
Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.

As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesnt't leave us with additional inode methods.

Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>fs/proc: make pde_get() return nothing</title>
<updated>2020-12-16T06:46:15+00:00</updated>
<author>
<name>Hui Su</name>
<email>sh_def@163.com</email>
</author>
<published>2020-12-16T04:42:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a9389683fafcd4b6f7dcef62f9f05d436a12cfb5'/>
<id>urn:sha1:a9389683fafcd4b6f7dcef62f9f05d436a12cfb5</id>
<content type='text'>
We don't need pde_get()'s return value, so make pde_get() return nothing

Link: https://lkml.kernel.org/r/20201211061944.GA2387571@rlk
Signed-off-by: Hui Su &lt;sh_def@163.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc: fix lookup in /proc/net subdirectories after setns(2)</title>
<updated>2020-12-16T06:46:15+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2020-12-16T04:42:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c6c75deda81344c3a95d1d1f606d5cee109e5d54'/>
<id>urn:sha1:c6c75deda81344c3a95d1d1f606d5cee109e5d54</id>
<content type='text'>
Commit 1fde6f21d90f ("proc: fix /proc/net/* after setns(2)") only forced
revalidation of regular files under /proc/net/

However, /proc/net/ is unusual in the sense of /proc/net/foo handlers
take netns pointer from parent directory which is old netns.

Steps to reproduce:

	(void)open("/proc/net/sctp/snmp", O_RDONLY);
	unshare(CLONE_NEWNET);

	int fd = open("/proc/net/sctp/snmp", O_RDONLY);
	read(fd, &amp;c, 1);

Read will read wrong data from original netns.

Patch forces lookup on every directory under /proc/net .

Link: https://lkml.kernel.org/r/20201205160916.GA109739@localhost.localdomain
Fixes: 1da4d377f943 ("proc: revalidate misc dentries")
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" &lt;tommi.t.rantala@nokia.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc: faster open/read/close with "permanent" files</title>
<updated>2020-04-07T17:43:42+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2020-04-07T03:09:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d919b33dafb3e222d23671b2bb06d119aede625f'/>
<id>urn:sha1:d919b33dafb3e222d23671b2bb06d119aede625f</id>
<content type='text'>
Now that "struct proc_ops" exist we can start putting there stuff which
could not fly with VFS "struct file_operations"...

Most of fs/proc/inode.c file is dedicated to make open/read/.../close
reliable in the event of disappearing /proc entries which usually happens
if module is getting removed.  Files like /proc/cpuinfo which never
disappear simply do not need such protection.

Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such
"permanent" files.

Enable "permanent" flag for

	/proc/cpuinfo
	/proc/kmsg
	/proc/modules
	/proc/slabinfo
	/proc/stat
	/proc/sysvipc/*
	/proc/swaps

More will come once I figure out foolproof way to prevent out module
authors from marking their stuff "permanent" for performance reasons
when it is not.

This should help with scalability: benchmark is "read /proc/cpuinfo R times
by N threads scattered over the system".

	N	R	t, s (before)	t, s (after)
	-----------------------------------------------------
	64	4096	1.582458	1.530502	-3.2%
	256	4096	6.371926	6.125168	-3.9%
	1024	4096	25.64888	24.47528	-4.6%

Benchmark source:

#include &lt;chrono&gt;
#include &lt;iostream&gt;
#include &lt;thread&gt;
#include &lt;vector&gt;

#include &lt;sys/types.h&gt;
#include &lt;sys/stat.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;unistd.h&gt;

const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN);
int N;
const char *filename;
int R;

int xxx = 0;

int glue(int n)
{
	cpu_set_t m;
	CPU_ZERO(&amp;m);
	CPU_SET(n, &amp;m);
	return sched_setaffinity(0, sizeof(cpu_set_t), &amp;m);
}

void f(int n)
{
	glue(n % NR_CPUS);

	while (*(volatile int *)&amp;xxx == 0) {
	}

	for (int i = 0; i &lt; R; i++) {
		int fd = open(filename, O_RDONLY);
		char buf[4096];
		ssize_t rv = read(fd, buf, sizeof(buf));
		asm volatile ("" :: "g" (rv));
		close(fd);
	}
}

int main(int argc, char *argv[])
{
	if (argc &lt; 4) {
		std::cerr &lt;&lt; "usage: " &lt;&lt; argv[0] &lt;&lt; ' ' &lt;&lt; "N /proc/filename R
";
		return 1;
	}

	N = atoi(argv[1]);
	filename = argv[2];
	R = atoi(argv[3]);

	for (int i = 0; i &lt; NR_CPUS; i++) {
		if (glue(i) == 0)
			break;
	}

	std::vector&lt;std::thread&gt; T;
	T.reserve(N);
	for (int i = 0; i &lt; N; i++) {
		T.emplace_back(f, i);
	}

	auto t0 = std::chrono::system_clock::now();
	{
		*(volatile int *)&amp;xxx = 1;
		for (auto&amp; t: T) {
			t.join();
		}
	}
	auto t1 = std::chrono::system_clock::now();
	std::chrono::duration&lt;double&gt; dt = t1 - t0;
	std::cout &lt;&lt; dt.count() &lt;&lt; '
';

	return 0;
}

P.S.:
Explicit randomization marker is added because adding non-function pointer
will silently disable structure layout randomization.

[akpm@linux-foundation.org: coding style fixes]
Reported-by: kbuild test robot &lt;lkp@intel.com&gt;
Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proc: Use a list of inodes to flush from proc</title>
<updated>2020-02-24T16:14:44+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-02-20T00:22:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7bc3e6e55acf065500a24621f3b313e7e5998acf'/>
<id>urn:sha1:7bc3e6e55acf065500a24621f3b313e7e5998acf</id>
<content type='text'>
Rework the flushing of proc to use a list of directory inodes that
need to be flushed.

The list is kept on struct pid not on struct task_struct, as there is
a fixed connection between proc inodes and pids but at least for the
case of de_thread the pid of a task_struct changes.

This removes the dependency on proc_mnt which allows for different
mounts of proc having different mount options even in the same pid
namespace and this allows for the removal of proc_mnt which will
trivially the first mount of proc to honor it's mount options.

This flushing remains an optimization.  The functions
pid_delete_dentry and pid_revalidate ensure that ordinary dcache
management will not attempt to use dentries past the point their
respective task has died.  When unused the shrinker will
eventually be able to remove these dentries.

There is a case in de_thread where proc_flush_pid can be
called early for a given pid.  Which winds up being
safe (if suboptimal) as this is just an optiimization.

Only pid directories are put on the list as the other
per pid files are children of those directories and
d_invalidate on the directory will get them as well.

So that the pid can be used during flushing it's reference count is
taken in release_task and dropped in proc_flush_pid.  Further the call
of proc_flush_pid is moved after the tasklist_lock is released in
release_task so that it is certain that the pid has already been
unhashed when flushing it taking place.  This removes a small race
where a dentry could recreated.

As struct pid is supposed to be small and I need a per pid lock
I reuse the only lock that currently exists in struct pid the
the wait_pidfd.lock.

The net result is that this adds all of this functionality
with just a little extra list management overhead and
a single extra pointer in struct pid.

v2: Initialize pid-&gt;inodes.  I somehow failed to get that
    initialization into the initial version of the patch.  A boot
    failure was reported by "kernel test robot &lt;lkp@intel.com&gt;", and
    failure to initialize that pid-&gt;inodes matches all of the reported
    symptoms.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: Use d_invalidate in proc_prune_siblings_dcache</title>
<updated>2020-02-24T15:50:04+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-02-21T14:43:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f90f3cafe8d56d593fc509a4185da1d5800efea4'/>
<id>urn:sha1:f90f3cafe8d56d593fc509a4185da1d5800efea4</id>
<content type='text'>
The function d_prune_aliases has the problem that it will only prune
aliases thare are completely unused.  It will not remove aliases for
the dcache or even think of removing mounts from the dcache.  For that
behavior d_invalidate is needed.

To use d_invalidate replace d_prune_aliases with d_find_alias followed
by d_invalidate and dput.

For completeness the directory and the non-directory cases are
separated because in theory (although not in currently in practice for
proc) directories can only ever have a single dentry while
non-directories can have hardlinks and thus multiple dentries.
As part of this separation use d_find_any_alias for directories
to spare d_find_alias the extra work of doing that.

Plus the differences between d_find_any_alias and d_find_alias makes
it clear why the directory and non-directory code and not share code.

To make it clear these routines now invalidate dentries rename
proc_prune_siblings_dache to proc_invalidate_siblings_dcache, and rename
proc_sys_prune_dcache proc_sys_invalidate_dcache.

V2: Split the directory and non-directory cases.  To make this
    code robust to future changes in proc.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache</title>
<updated>2020-02-20T16:06:02+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-02-20T14:34:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=26dbc60f385ff9cff475ea2a3bad02e80fd6fa43'/>
<id>urn:sha1:26dbc60f385ff9cff475ea2a3bad02e80fd6fa43</id>
<content type='text'>
This prepares the way for allowing the pid part of proc to use this
dcache pruning code as well.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: Rename in proc_inode rename sysctl_inodes sibling_inodes</title>
<updated>2020-02-20T14:14:32+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-02-19T23:17:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0afa5ca82212247456f9de1468b595a111fee633'/>
<id>urn:sha1:0afa5ca82212247456f9de1468b595a111fee633</id>
<content type='text'>
I about to need and use the same functionality for pid based
inodes and there is no point in adding a second field when
this field is already here and serving the same purporse.

Just give the field a generic name so it is clear that
it is no longer sysctl specific.

Also for good measure initialize sibling_inodes when
proc_inode is initialized.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>proc: decouple proc from VFS with "struct proc_ops"</title>
<updated>2020-02-04T03:05:26+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2020-02-04T01:37:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d56c0d45f0e27f814e87a1676b6bdccccbc252e9'/>
<id>urn:sha1:d56c0d45f0e27f814e87a1676b6bdccccbc252e9</id>
<content type='text'>
Currently core /proc code uses "struct file_operations" for custom hooks,
however, VFS doesn't directly call them.  Every time VFS expands
file_operations hook set, /proc code bloats for no reason.

Introduce "struct proc_ops" which contains only those hooks which /proc
allows to call into (open, release, read, write, ioctl, mmap, poll).  It
doesn't contain module pointer as well.

Save ~184 bytes per usage:

	add/remove: 26/26 grow/shrink: 1/4 up/down: 1922/-6674 (-4752)
	Function                                     old     new   delta
	sysvipc_proc_ops                               -      72     +72
				...
	config_gz_proc_ops                             -      72     +72
	proc_get_inode                               289     339     +50
	proc_reg_get_unmapped_area                   110     107      -3
	close_pdeo                                   227     224      -3
	proc_reg_open                                289     284      -5
	proc_create_data                              60      53      -7
	rt_cpu_seq_fops                              256       -    -256
				...
	default_affinity_proc_fops                   256       -    -256
	Total: Before=5430095, After=5425343, chg -0.09%

Link: http://lkml.kernel.org/r/20191225172228.GA13378@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
