<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/binfmt_misc.c, branch v6.18.21</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.18.21'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-11-05T13:00:16+00:00</updated>
<entry>
<title>binfmt_misc: restore write access before closing files opened by open_exec()</title>
<updated>2025-11-05T13:00:16+00:00</updated>
<author>
<name>Zilin Guan</name>
<email>zilin@seu.edu.cn</email>
</author>
<published>2025-11-05T02:29:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=90f601b497d76f40fa66795c3ecf625b6aced9fd'/>
<id>urn:sha1:90f601b497d76f40fa66795c3ecf625b6aced9fd</id>
<content type='text'>
bm_register_write() opens an executable file using open_exec(), which
internally calls do_open_execat() and denies write access on the file to
avoid modification while it is being executed.

However, when an error occurs, bm_register_write() closes the file using
filp_close() directly. This does not restore the write permission, which
may cause subsequent write operations on the same file to fail.

Fix this by calling exe_file_allow_write_access() before filp_close() to
restore the write permission properly.

Fixes: e7850f4d844e ("binfmt_misc: fix possible deadlock in bm_register_write")
Signed-off-by: Zilin Guan &lt;zilin@seu.edu.cn&gt;
Link: https://patch.msgid.link/20251105022923.1813587-1-zilin@seu.edu.cn
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>binfmt_misc: switch to locked_recursive_removal()</title>
<updated>2025-07-03T02:36:51+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2025-03-09T00:54:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=eacb58fdcaefac63d32c3c095152e97dc9425ea8'/>
<id>urn:sha1:eacb58fdcaefac63d32c3c095152e97dc9425ea8</id>
<content type='text'>
... fixing a mount leak, strictly speaking.

Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>VFS: rename lookup_one_len family to lookup_noperm and remove permission check</title>
<updated>2025-04-08T09:24:36+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neil@brown.name</email>
</author>
<published>2025-03-19T03:01:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fa6fe07d1536361a227d655e69ca270faf28fdbe'/>
<id>urn:sha1:fa6fe07d1536361a227d655e69ca270faf28fdbe</id>
<content type='text'>
The lookup_one_len family of functions is (now) only used internally by
a filesystem on itself either
- in a context where permission checking is irrelevant such as by a
  virtual filesystem populating itself, or xfs accessing its ORPHANAGE
  or dquota accessing the quota file; or
- in a context where a permission check (MAY_EXEC on the parent) has just
  been performed such as a network filesystem finding in "silly-rename"
  file in the same directory.  This is also the context after the
  _parentat() functions where currently lookup_one_qstr_excl() is used.

So the permission check is pointless.

The name "one_len" is unhelpful in understanding the purpose of these
functions and should be changed.  Most of the callers pass the len as
"strlen()" so using a qstr and QSTR() can simplify the code.

This patch renames these functions (include lookup_positive_unlocked()
which is part of the family despite the name) to have a name based on
"lookup_noperm".  They are changed to receive a 'struct qstr' instead
of separate name and len.  In a few cases the use of QSTR() results in a
new call to strlen().

try_lookup_noperm() takes a pointer to a qstr instead of the whole
qstr.  This is consistent with d_hash_and_lookup() (which is nearly
identical) and useful for lookup_noperm_unlocked().

The new lookup_noperm_common() doesn't take a qstr yet.  That will be
tidied up in a subsequent patch.

Signed-off-by: NeilBrown &lt;neil@brown.name&gt;
Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'execve-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux</title>
<updated>2025-01-20T21:27:58+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-01-20T21:27:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fadc3ed9ce1cd9ecc5c8be8875f7ec11ab3a7ebe'/>
<id>urn:sha1:fadc3ed9ce1cd9ecc5c8be8875f7ec11ab3a7ebe</id>
<content type='text'>
Pull execve updates from Kees Cook:

 - fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case (Tycho
   Andersen, Kees Cook)

 - binfmt_misc: Fix comment typos (Christophe JAILLET)

 - move empty argv[0] warning closer to actual logic (Nir Lichtman)

 - remove legacy custom binfmt modules autoloading (Nir Lichtman)

 - Make sure set_task_comm() always NUL-terminates

 - binfmt_flat: Fix integer overflow bug on 32 bit systems (Dan
   Carpenter)

 - coredump: Do not lock when copying "comm"

 - MAINTAINERS: add auxvec.h and set myself as maintainer

* tag 'execve-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_flat: Fix integer overflow bug on 32 bit systems
  selftests/exec: add a test for execveat()'s comm
  exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case
  exec: Make sure task-&gt;comm is always NUL-terminated
  exec: remove legacy custom binfmt modules autoloading
  exec: move warning of null argv to be next to the relevant code
  fs: binfmt: Fix a typo
  MAINTAINERS: exec: Mark Kees as maintainer
  MAINTAINERS: exec: Add auxvec.h UAPI
  coredump: Do not lock during 'comm' reporting
</content>
</entry>
<entry>
<title>fs: binfmt: Fix a typo</title>
<updated>2024-11-30T03:35:58+00:00</updated>
<author>
<name>Christophe JAILLET</name>
<email>christophe.jaillet@wanadoo.fr</email>
</author>
<published>2024-11-02T09:25:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b6709dcd87ac85d56e7cd574a7b21f3a8727d942'/>
<id>urn:sha1:b6709dcd87ac85d56e7cd574a7b21f3a8727d942</id>
<content type='text'>
A 't' is missing in "binfm_misc".
Add it.

Signed-off-by: Christophe JAILLET &lt;christophe.jaillet@wanadoo.fr&gt;
Acked-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Link: https://lore.kernel.org/r/34b8c52b67934b293a67558a9a486aea7ba08951.1730539498.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "fs: don't block i_writecount during exec"</title>
<updated>2024-11-27T11:51:30+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-11-27T11:45:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3b832035387ff508fdcf0fba66701afc78f79e3d'/>
<id>urn:sha1:3b832035387ff508fdcf0fba66701afc78f79e3d</id>
<content type='text'>
This reverts commit 2a010c41285345da60cece35575b4e0af7e7bf44.

Rui Ueyama &lt;rui314@gmail.com&gt; writes:

&gt; I'm the creator and the maintainer of the mold linker
&gt; (https://github.com/rui314/mold). Recently, we discovered that mold
&gt; started causing process crashes in certain situations due to a change
&gt; in the Linux kernel. Here are the details:
&gt;
&gt; - In general, overwriting an existing file is much faster than
&gt; creating an empty file and writing to it on Linux, so mold attempts to
&gt; reuse an existing executable file if it exists.
&gt;
&gt; - If a program is running, opening the executable file for writing
&gt; previously failed with ETXTBSY. If that happens, mold falls back to
&gt; creating a new file.
&gt;
&gt; - However, the Linux kernel recently changed the behavior so that
&gt; writing to an executable file is now always permitted
&gt; (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a010c412853).
&gt;
&gt; That caused mold to write to an executable file even if there's a
&gt; process running that file. Since changes to mmap'ed files are
&gt; immediately visible to other processes, any processes running that
&gt; file would almost certainly crash in a very mysterious way.
&gt; Identifying the cause of these random crashes took us a few days.
&gt;
&gt; Rejecting writes to an executable file that is currently running is a
&gt; well-known behavior, and Linux had operated that way for a very long
&gt; time. So, I don’t believe relying on this behavior was our mistake;
&gt; rather, I see this as a regression in the Linux kernel.

Quoting myself from commit 2a010c412853 ("fs: don't block i_writecount during exec")

&gt; Yes, someone in userspace could potentially be relying on this. It's not
&gt; completely out of the realm of possibility but let's find out if that's
&gt; actually the case and not guess.

It seems we found out that someone is relying on this obscure behavior.
So revert the change.

Link: https://github.com/rui314/mold/issues/1361
Link: https://lore.kernel.org/r/4a2bc207-76be-4715-8e12-7fc45a76a125@leemhuis.info
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'vfs-6.11.module.description' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2024-07-15T18:14:59+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-07-15T18:14:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7d156879ffd6c48428c2f46d5c2b4b80d9c9ee79'/>
<id>urn:sha1:7d156879ffd6c48428c2f46d5c2b4b80d9c9ee79</id>
<content type='text'>
Pull vfs module description updates from Christian Brauner:
 "This contains patches to add module descriptions to all modules under
  fs/ currently lacking them"

* tag 'vfs-6.11.module.description' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  openpromfs: add missing MODULE_DESCRIPTION() macro
  fs: nls: add missing MODULE_DESCRIPTION() macros
  fs: autofs: add MODULE_DESCRIPTION()
  fs: fat: add missing MODULE_DESCRIPTION() macros
  fs: binfmt: add missing MODULE_DESCRIPTION() macros
  fs: cramfs: add MODULE_DESCRIPTION()
  fs: hfs: add MODULE_DESCRIPTION()
  fs: hpfs: add MODULE_DESCRIPTION()
  qnx4: add MODULE_DESCRIPTION()
  qnx6: add MODULE_DESCRIPTION()
  fs: sysv: add MODULE_DESCRIPTION()
  fs: efs: add MODULE_DESCRIPTION()
  fs: minix: add MODULE_DESCRIPTION()
</content>
</entry>
<entry>
<title>fs: don't block i_writecount during exec</title>
<updated>2024-06-03T13:52:10+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-05-31T13:01:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2a010c41285345da60cece35575b4e0af7e7bf44'/>
<id>urn:sha1:2a010c41285345da60cece35575b4e0af7e7bf44</id>
<content type='text'>
Back in 2021 we already discussed removing deny_write_access() for
executables. Back then I was hesistant because I thought that this might
cause issues in userspace. But even back then I had started taking some
notes on what could potentially depend on this and I didn't come up with
a lot so I've changed my mind and I would like to try this.

Here are some of the notes that I took:

(1) The deny_write_access() mechanism is causing really pointless issues
    such as [1]. If a thread in a thread-group opens a file writable,
    then writes some stuff, then closing the file descriptor and then
    calling execve() they can fail the execve() with ETXTBUSY because
    another thread in the thread-group could have concurrently called
    fork(). Multi-threaded libraries such as go suffer from this.

(2) There are userspace attacks that rely on overwriting the binary of a
    running process. These attacks are _mitigated_ but _not at all
    prevented_ from ocurring by the deny_write_access() mechanism.

    I'll go over some details. The clearest example of such attacks was
    the attack against runC in CVE-2019-5736 (cf. [3]).

    An attack could compromise the runC host binary from inside a
    _privileged_ runC container. The malicious binary could then be used
    to take over the host.

    (It is crucial to note that this attack is _not_ possible with
     unprivileged containers. IOW, the setup here is already insecure.)

    The attack can be made when attaching to a running container or when
    starting a container running a specially crafted image. For example,
    when runC attaches to a container the attacker can trick it into
    executing itself.

    This could be done by replacing the target binary inside the
    container with a custom binary pointing back at the runC binary
    itself. As an example, if the target binary was /bin/bash, this
    could be replaced with an executable script specifying the
    interpreter path #!/proc/self/exe.

    As such when /bin/bash is executed inside the container, instead the
    target of /proc/self/exe will be executed. That magic link will
    point to the runc binary on the host. The attacker can then proceed
    to write to the target of /proc/self/exe to try and overwrite the
    runC binary on the host.

    However, this will not succeed because of deny_write_access(). Now,
    one might think that this would prevent the attack but it doesn't.

    To overcome this, the attacker has multiple ways:
    * Open a file descriptor to /proc/self/exe using the O_PATH flag and
      then proceed to reopen the binary as O_WRONLY through
      /proc/self/fd/&lt;nr&gt; and try to write to it in a busy loop from a
      separate process. Ultimately it will succeed when the runC binary
      exits. After this the runC binary is compromised and can be used
      to attack other containers or the host itself.
    * Use a malicious shared library annotating a function in there with
      the constructor attribute making the malicious function run as an
      initializor. The malicious library will then open /proc/self/exe
      for creating a new entry under /proc/self/fd/&lt;nr&gt;. It'll then call
      exec to a) force runC to exit and b) hand the file descriptor off
      to a program that then reopens /proc/self/fd/&lt;nr&gt; for writing
      (which is now possible because runC has exited) and overwriting
      that binary.

    To sum up: the deny_write_access() mechanism doesn't prevent such
    attacks in insecure setups. It just makes them minimally harder.
    That's all.

    The only way back then to prevent this is to create a temporary copy
    of the calling binary itself when it starts or attaches to
    containers. So what I did back then for LXC (and Aleksa for runC)
    was to create an anonymous, in-memory file using the memfd_create()
    system call and to copy itself into the temporary in-memory file,
    which is then sealed to prevent further modifications. This sealed,
    in-memory file copy is then executed instead of the original on-disk
    binary.

    Any compromising write operations from a privileged container to the
    host binary will then write to the temporary in-memory binary and
    not to the host binary on-disk, preserving the integrity of the host
    binary. Also as the temporary, in-memory binary is sealed, writes to
    this will also fail.

    The point is that deny_write_access() is uselss to prevent these
    attacks.

(3) Denying write access to an inode because it's currently used in an
    exec path could easily be done on an LSM level. It might need an
    additional hook but that should be about it.

(4) The MAP_DENYWRITE flag for mmap() has been deprecated a long time
    ago so while we do protect the main executable the bigger portion of
    the things you'd think need protecting such as the shared libraries
    aren't. IOW, we let anyone happily overwrite shared libraries.

(5) We removed all remaining uses of VM_DENYWRITE in [2]. That means:
    (5.1) We removed the legacy uselib() protection for preventing
          overwriting of shared libraries. Nobody cared in 3 years.
    (5.2) We allow write access to the elf interpreter after exec
          completed treating it on a par with shared libraries.

Yes, someone in userspace could potentially be relying on this. It's not
completely out of the realm of possibility but let's find out if that's
actually the case and not guess.

Link: https://github.com/golang/go/issues/22315 [1]
Link: 49624efa65ac ("Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux") [2]
Link: https://unit42.paloaltonetworks.com/breaking-docker-via-runc-explaining-cve-2019-5736 [3]
Link: https://lwn.net/Articles/866493
Link: https://github.com/golang/go/issues/22220
Link: https://github.com/golang/go/blob/5bf8c0cf09ee5c7e5a37ab90afcce154ab716a97/src/cmd/go/internal/work/buildid.go#L724
Link: https://github.com/golang/go/blob/5bf8c0cf09ee5c7e5a37ab90afcce154ab716a97/src/cmd/go/internal/work/exec.go#L1493
Link: https://github.com/golang/go/blob/5bf8c0cf09ee5c7e5a37ab90afcce154ab716a97/src/cmd/go/internal/script/cmds.go#L457
Link: https://github.com/golang/go/blob/5bf8c0cf09ee5c7e5a37ab90afcce154ab716a97/src/cmd/go/internal/test/test.go#L1557
Link: https://github.com/golang/go/blob/5bf8c0cf09ee5c7e5a37ab90afcce154ab716a97/src/os/exec/lp_linux_test.go#L61
Link: https://github.com/buildkite/agent/pull/2736
Link: https://github.com/rust-lang/rust/issues/114554
Link: https://bugs.openjdk.org/browse/JDK-8068370
Link: https://github.com/dotnet/runtime/issues/58964
Link: https://lore.kernel.org/r/20240531-vfs-i_writecount-v1-1-a17bea7ee36b@kernel.org
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: binfmt: add missing MODULE_DESCRIPTION() macros</title>
<updated>2024-05-28T10:06:51+00:00</updated>
<author>
<name>Jeff Johnson</name>
<email>quic_jjohnson@quicinc.com</email>
</author>
<published>2024-05-27T18:57:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2c2a3f622e400a58cb53eac633b8102da0600102'/>
<id>urn:sha1:2c2a3f622e400a58cb53eac633b8102da0600102</id>
<content type='text'>
Fix the 'make W=1' warnings:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/binfmt_misc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/binfmt_script.o

Signed-off-by: Jeff Johnson &lt;quic_jjohnson@quicinc.com&gt;
Link: https://lore.kernel.org/r/20240527-md-fs-binfmt-v1-1-f9dc1745cb67@quicinc.com
Acked-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'execve-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux</title>
<updated>2023-10-31T05:28:19+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-10-31T05:28:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d82c0a37d431ada0d1dae9a2665fcfe17b0f9e14'/>
<id>urn:sha1:d82c0a37d431ada0d1dae9a2665fcfe17b0f9e14</id>
<content type='text'>
Pull execve updates from Kees Cook:

 - Support non-BSS ELF segments with zero filesz

   Eric Biederman and I refactored ELF segment loading to handle the
   case where a segment has a smaller filesz than memsz. Traditionally
   linkers only did this for .bss and it was always the last segment. As
   a result, the kernel only handled this case when it was the last
   segment. We've had two recent cases where linkers were trying to use
   these kinds of segments for other reasons, and the were in the middle
   of the segment list. There was no good reason for the kernel not to
   support this, and the refactor actually ends up making things more
   readable too.

 - Enable namespaced binfmt_misc

   Christian Brauner has made it possible to use binfmt_misc with mount
   namespaces. This means some traditionally root-only interfaces (for
   adding/removing formats) are now more exposed (but believed to be
   safe).

 - Remove struct tag 'dynamic' from ELF UAPI

   Alejandro Colomar noticed that the ELF UAPI has been polluting the
   struct namespace with an unused and overly generic tag named
   "dynamic" for no discernible reason for many many years. After
   double-checking various distro source repositories, it has been
   removed.

 - Clean up binfmt_elf_fdpic debug output (Greg Ungerer)

* tag 'execve-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt_misc: enable sandboxed mounts
  binfmt_misc: cleanup on filesystem umount
  binfmt_elf_fdpic: clean up debug warnings
  mm: Remove unused vm_brk()
  binfmt_elf: Only report padzero() errors when PROT_WRITE
  binfmt_elf: Use elf_load() for library
  binfmt_elf: Use elf_load() for interpreter
  binfmt_elf: elf_bss no longer used by load_elf_binary()
  binfmt_elf: Support segments with 0 filesz and misaligned starts
  elf, uapi: Remove struct tag 'dynamic'
</content>
</entry>
</feed>
