<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/fdtable.h, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-09-30T01:52:29+00:00</updated>
<entry>
<title>close_range(): fix the logics in descriptor table trimming</title>
<updated>2024-09-30T01:52:29+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2024-08-16T19:17:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=678379e1d4f7443b170939525d3312cfc37bf86b'/>
<id>urn:sha1:678379e1d4f7443b170939525d3312cfc37bf86b</id>
<content type='text'>
Cloning a descriptor table picks the size that would cover all currently
opened files.  That's fine for clone() and unshare(), but for close_range()
there's an additional twist - we clone before we close, and it would be
a shame to have
	close_range(3, ~0U, CLOSE_RANGE_UNSHARE)
leave us with a huge descriptor table when we are not going to keep
anything past stderr, just because some large file descriptor used to
be open before our call has taken it out.

Unfortunately, it had been dealt with in an inherently racy way -
sane_fdtable_size() gets a "don't copy anything past that" argument
(passed via unshare_fd() and dup_fd()), close_range() decides how much
should be trimmed and passes that to unshare_fd().

The problem is, a range that used to extend to the end of descriptor
table back when close_range() had looked at it might very well have stuff
grown after it by the time dup_fd() has allocated a new files_struct
and started to figure out the capacity of fdtable to be attached to that.

That leads to interesting pathological cases; at the very least it's a
QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes
a snapshot of descriptor table one might have observed at some point.
Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination
of unshare(CLONE_FILES) with plain close_range(), ending up with a
weird state that would never occur with unshare(2) is confusing, to put
it mildly.

It's not hard to get rid of - all it takes is passing both ends of the
range down to sane_fdtable_size().  There we are under -&gt;files_lock,
so the race is trivially avoided.

So we do the following:
	* switch close_files() from calling unshare_fd() to calling
dup_fd().
	* undo the calling convention change done to unshare_fd() in
60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE"
	* introduce struct fd_range, pass a pointer to that to dup_fd()
and sane_fdtable_size() instead of "trim everything past that point"
they are currently getting.  NULL means "we are not going to be punching
any holes"; NR_OPEN_MAX is gone.
	* make sane_fdtable_size() use find_last_bit() instead of
open-coding it; it's easier to follow that way.
	* while we are at it, have dup_fd() report errors by returning
ERR_PTR(), no need to use a separate int *errorp argument.

Fixes: 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE"
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>fd_is_open(): move to fs/file.c</title>
<updated>2024-04-15T20:03:24+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2024-01-05T02:45:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c4aab26253cd1f302279b8d6b5b66ccf1b120520'/>
<id>urn:sha1:c4aab26253cd1f302279b8d6b5b66ccf1b120520</id>
<content type='text'>
no users outside that...

Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>close_on_exec(): pass files_struct instead of fdtable</title>
<updated>2024-04-15T20:03:24+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2024-01-05T02:35:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f60d374d2cc88034385265d193a38e3f4a4b430c'/>
<id>urn:sha1:f60d374d2cc88034385265d193a38e3f4a4b430c</id>
<content type='text'>
both callers are happier that way...

Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>file: s/close_fd_get_file()/file_close_fd()/g</title>
<updated>2023-12-12T13:24:13+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-11-30T12:49:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a88c955fcfb49727d0ed86b47410f6555a8e69e4'/>
<id>urn:sha1:a88c955fcfb49727d0ed86b47410f6555a8e69e4</id>
<content type='text'>
That really shouldn't have "get" in there as that implies we're bumping
the reference count which we don't do at all. We used to but not anmore.
Now we're just closing the fd and pick that file from the fdtable
without bumping the reference count. Update the wrong documentation
while at it.

Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-1-e73ca6f4ea83@kernel.org
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Improve __fget_files_rcu() code generation (and thus __fget_light())</title>
<updated>2023-12-12T13:24:13+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-11-26T20:24:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=253ca8678d30bcf94410b54476fc1e0f1627a137'/>
<id>urn:sha1:253ca8678d30bcf94410b54476fc1e0f1627a137</id>
<content type='text'>
Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") caused a
performance regression as reported by the kernel test robot.

The __fget_light() function is one of those critical ones for some
loads, and the code generation was unnecessarily impacted.  Let's just
write that function to better.

Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Mateusz Guzik &lt;mjguzik@gmail.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.sang@intel.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/CAHk-=wiCJtLbFWNURB34b9a_R_unaH3CiMRXfkR0-iihB_z68A@mail.gmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>file: convert to SLAB_TYPESAFE_BY_RCU</title>
<updated>2023-10-19T09:02:48+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-09-29T06:45:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0ede61d8589cc2d93aa78230d74ac58b5b8d0244'/>
<id>urn:sha1:0ede61d8589cc2d93aa78230d74ac58b5b8d0244</id>
<content type='text'>
In recent discussions around some performance improvements in the file
handling area we discussed switching the file cache to rely on
SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based
freeing for files completely. This is a pretty sensitive change overall
but it might actually be worth doing.

The main downside is the subtlety. The other one is that we should
really wait for Jann's patch to land that enables KASAN to handle
SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this
exists.

With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times
which requires a few changes. So it isn't sufficient anymore to just
acquire a reference to the file in question under rcu using
atomic_long_inc_not_zero() since the file might have already been
recycled and someone else might have bumped the reference.

In other words, callers might see reference count bumps from newer
users. For this reason it is necessary to verify that the pointer is the
same before and after the reference count increment. This pattern can be
seen in get_file_rcu() and __files_get_rcu().

In addition, it isn't possible to access or check fields in struct file
without first aqcuiring a reference on it. Not doing that was always
very dodgy and it was only usable for non-pointer data in struct file.
With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a
reference under rcu or they must hold the files_lock of the fdtable.
Failing to do either one of this is a bug.

Thanks to Jann for pointing out that we need to ensure memory ordering
between reallocations and pointer check by ensuring that all subsequent
loads have a dependency on the second load in get_file_rcu() and
providing a fixup that was folded into this patch.

Cc: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Unify the primitives for file descriptor closing</title>
<updated>2022-05-14T22:49:01+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2022-05-12T21:08:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6319194ec57b0452dcda4589d24c4e7db299c5bf'/>
<id>urn:sha1:6319194ec57b0452dcda4589d24c4e7db299c5bf</id>
<content type='text'>
Currently we have 3 primitives for removing an opened file from descriptor
table - pick_file(), __close_fd_get_file() and close_fd_get_file().  Their
calling conventions are rather odd and there's a code duplication for no
good reason.  They can be unified -

1) have __range_close() cap max_fd in the very beginning; that way
we don't need separate way for pick_file() to report being past the end
of descriptor table.

2) make {__,}close_fd_get_file() return file (or NULL) directly, rather
than returning it via struct file ** argument.  Don't bother with
(bogus) return value - nobody wants that -ENOENT.

3) make pick_file() return NULL on unopened descriptor - the only caller
that used to care about the distinction between descriptor past the end
of descriptor table and finding NULL in descriptor table doesn't give
a damn after (1).

4) lift -&gt;files_lock out of pick_file()

That actually simplifies the callers, as well as the primitives themselves.
Code duplication is also gone...

Reviewed-by: Christian Brauner (Microsoft) &lt;brauner@kernel.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>file: Remove get_files_struct</title>
<updated>2020-12-10T18:42:59+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-11-20T23:14:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fa67bf885e5211c7dce9514ef2877212c0a5e09e'/>
<id>urn:sha1:fa67bf885e5211c7dce9514ef2877212c0a5e09e</id>
<content type='text'>
When discussing[1] exec and posix file locks it was realized that none
of the callers of get_files_struct fundamentally needed to call
get_files_struct, and that by switching them to helper functions
instead it will both simplify their code and remove unnecessary
increments of files_struct.count.  Those unnecessary increments can
result in exec unnecessarily unsharing files_struct which breaking
posix locks, and it can result in fget_light having to fallback to
fget reducing system performance.

Now that get_files_struct has no more users and can not cause the
problems for posix file locking and fget_light remove get_files_struct
so that it does not gain any new users.

[1] https://lkml.kernel.org/r/20180915160423.GA31461@redhat.com
Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
v1: https://lkml.kernel.org/r/20200817220425.9389-13-ebiederm@xmission.com
Link: https://lkml.kernel.org/r/20201120231441.29911-24-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>file: Rename __close_fd_get_file close_fd_get_file</title>
<updated>2020-12-10T18:42:59+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-11-20T23:14:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9fe83c43e71cdb8e5b9520bcb98706a2b3c680c8'/>
<id>urn:sha1:9fe83c43e71cdb8e5b9520bcb98706a2b3c680c8</id>
<content type='text'>
The function close_fd_get_file is explicitly a variant of
__close_fd[1].  Now that __close_fd has been renamed close_fd, rename
close_fd_get_file to be consistent with close_fd.

When __alloc_fd, __close_fd and __fd_install were introduced the
double underscore indicated that the function took a struct
files_struct parameter.  The function __close_fd_get_file never has so
the naming has always been inconsistent.  This just cleans things up
so there are not any lingering mentions or references __close_fd left
in the code.

[1] 80cd795630d6 ("binder: fix use-after-free due to ksys_close() during fdget()")
Link: https://lkml.kernel.org/r/20201120231441.29911-23-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>file: Rename __close_fd to close_fd and remove the files parameter</title>
<updated>2020-12-10T18:42:59+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-11-20T23:14:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8760c909f54a82aaa6e76da19afe798a0c77c3c3'/>
<id>urn:sha1:8760c909f54a82aaa6e76da19afe798a0c77c3c3</id>
<content type='text'>
The function __close_fd was added to support binder[1].  Now that
binder has been fixed to no longer need __close_fd[2] all calls
to __close_fd pass current-&gt;files.

Therefore transform the files parameter into a local variable
initialized to current-&gt;files, and rename __close_fd to close_fd to
reflect this change, and keep it in sync with the similar changes to
__alloc_fd, and __fd_install.

This removes the need for callers to care about the extra care that
needs to be take if anything except current-&gt;files is passed, by
limiting the callers to only operation on current-&gt;files.

[1] 483ce1d4b8c3 ("take descriptor-related part of close() to file.c")
[2] 44d8047f1d87 ("binder: use standard functions to allocate fds")
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
v1: https://lkml.kernel.org/r/20200817220425.9389-17-ebiederm@xmission.com
Link: https://lkml.kernel.org/r/20201120231441.29911-21-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
</feed>
