<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/Kconfig, branch v6.12.80</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.12.80'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2025-04-25T08:47:50+00:00</updated>
<entry>
<title>nfs: add missing selections of CONFIG_CRC32</title>
<updated>2025-04-25T08:47:50+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2025-04-01T22:02:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=33c0f51f1a8d48d0858b1b82d6c80627f4a3551f'/>
<id>urn:sha1:33c0f51f1a8d48d0858b1b82d6c80627f4a3551f</id>
<content type='text'>
[ Upstream commit cd35b6cb46649750b7dbd0df0e2d767415d8917b ]

nfs.ko, nfsd.ko, and lockd.ko all use crc32_le(), which is available
only when CONFIG_CRC32 is enabled.  But the only NFS kconfig option that
selected CONFIG_CRC32 was CONFIG_NFS_DEBUG, which is client-specific and
did not actually guard the use of crc32_le() even on the client.

The code worked around this bug by only actually calling crc32_le() when
CONFIG_CRC32 is built-in, instead hard-coding '0' in other cases.  This
avoided randconfig build errors, and in real kernels the fallback code
was unlikely to be reached since CONFIG_CRC32 is 'default y'.  But, this
really needs to just be done properly, especially now that I'm planning
to update CONFIG_CRC32 to not be 'default y'.

Therefore, make CONFIG_NFS_FS, CONFIG_NFSD, and CONFIG_LOCKD select
CONFIG_CRC32.  Then remove the fallback code that becomes unnecessary,
as well as the selection of CONFIG_CRC32 from CONFIG_NFS_DEBUG.

Fixes: 1264a2f053a3 ("NFS: refactor code for calculating the crc32 hash of a filehandle")
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Acked-by: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
Signed-off-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT</title>
<updated>2024-10-03T20:19:51+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@kernel.org</email>
</author>
<published>2024-10-03T19:34:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=009b15b5748549fec4f40f1142ce644a27ddc265'/>
<id>urn:sha1:009b15b5748549fec4f40f1142ce644a27ddc265</id>
<content type='text'>
The 'default n' that was in NFS_COMMON_LOCALIO_SUPPORT caused these
extra defaults to be missed:
        default y if NFSD=y || NFS_FS=y
	default m if NFSD=m &amp;&amp; NFS_FS=m

Remove the 'default n' for NFS_COMMON_LOCALIO_SUPPORT so that the
correct tristate is selected based on how NFSD and NFS_FS are
configured.  This fixes the reported case where NFS_FS=y but
NFS_COMMON_LOCALIO_SUPPORT=m, it is now correctly set to =y.

In addition, add extra 'depends on NFS_LOCALIO' to
NFS_COMMON_LOCALIO_SUPPORT so that if NFS_LOCALIO isn't set then
NFS_COMMON_LOCALIO_SUPPORT will not be either.

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202410031944.hMCFY9BO-lkp@intel.com/
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs</title>
<updated>2024-09-24T22:44:18+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-09-24T22:44:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=684a64bf32b6e488004e0ad7f0d7e922798f65b6'/>
<id>urn:sha1:684a64bf32b6e488004e0ad7f0d7e922798f65b6</id>
<content type='text'>
Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
   - Add support for the LOCALIO protocol extention

  Bugfixes:
   - Fix memory leak in error path of nfs4_do_reclaim()
   - Simplify and guarantee lock owner uniqueness
   - Fix -Wformat-truncation warning
   - Fix folio refcounts by using folio_attach_private()
   - Fix failing the mount system call when the server is down
   - Fix detection of "Proxying of Times" server support

  Cleanups:
   - Annotate struct nfs_cache_array with __counted_by()
   - Remove unnecessary NULL checks before kfree()
   - Convert RPC_TASK_* constants to an enum
   - Remove obsolete or misleading comments and declerations"

* tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits)
  nfs: Fix `make htmldocs` warnings in the localio documentation
  nfs: add "NFS Client and Server Interlock" section to localio.rst
  nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
  nfs: add Documentation/filesystems/nfs/localio.rst
  nfs: implement client support for NFS_LOCALIO_PROGRAM
  nfs/localio: use dedicated workqueues for filesystem read and write
  pnfs/flexfiles: enable localio support
  nfs: enable localio for non-pNFS IO
  nfs: add LOCALIO support
  nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit
  nfsd: implement server support for NFS_LOCALIO_PROGRAM
  nfsd: add LOCALIO support
  nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
  nfs_common: add NFS LOCALIO auxiliary protocol enablement
  SUNRPC: replace program list with program array
  SUNRPC: add svcauth_map_clnt_to_svc_cred_local
  SUNRPC: remove call_allocate() BUG_ONs
  nfsd: add nfsd_serv_try_get and nfsd_serv_put
  nfsd: add nfsd_file_acquire_local()
  nfsd: factor out __fh_verify to allow NULL rqstp to be passed
  ...
</content>
</entry>
<entry>
<title>nfs_common: add NFS LOCALIO auxiliary protocol enablement</title>
<updated>2024-09-23T19:03:30+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@kernel.org</email>
</author>
<published>2024-09-05T19:09:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2a33a85be45178198245e1f656e6224c899895e4'/>
<id>urn:sha1:2a33a85be45178198245e1f656e6224c899895e4</id>
<content type='text'>
fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS
client to generate a nonce (single-use UUID) and associated nfs_uuid_t
struct, register it with nfs_common for subsequent lookup and
verification by the NFS server and if matched the NFS server populates
members in the nfs_uuid_t struct.

nfs_common's nfs_uuids list is the basis for localio enablement, as
such it has members that point to nfsd memory for direct use by the
client (e.g. 'net' is the server's network namespace, through it the
client can access nn-&gt;nfsd_serv).

This commit also provides the base nfs_uuid_t interfaces to allow
proper net namespace refcounting for the LOCALIO use case.

CONFIG_NFS_LOCALIO controls the nfs_common, NFS server and NFS client
enablement for LOCALIO. If both NFS_FS=m and NFSD=m then
NFS_COMMON_LOCALIO_SUPPORT=m and nfs_localio.ko is built (and provides
nfs_common's LOCALIO support).

  # lsmod | grep nfs_localio
  nfs_localio            12288  2 nfsd,nfs
  sunrpc                745472  35 nfs_localio,nfsd,auth_rpcgss,lockd,nfsv3,nfs

Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Co-developed-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
</content>
</entry>
<entry>
<title>mm/hugetlb: enforce that PMD PT sharing has split PMD PT locks</title>
<updated>2024-09-02T03:25:51+00:00</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2024-07-26T15:07:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=188cac58a8bcdf82c7f63275b68f7a46871e45d6'/>
<id>urn:sha1:188cac58a8bcdf82c7f63275b68f7a46871e45d6</id>
<content type='text'>
Sharing page tables between processes but falling back to per-MM page
table locks cannot possibly work.

So, let's make sure that we do have split PMD locks by adding a new
Kconfig option and letting that depend on CONFIG_SPLIT_PMD_PTLOCKS.

Link: https://lkml.kernel.org/r/20240726150728.3159964-3-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Mike Rapoport (Microsoft) &lt;rppt@kernel.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: "Naveen N. Rao" &lt;naveen.n.rao@linux.ibm.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2024-03-15T00:43:30+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-03-15T00:43:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=902861e34c401696ed9ad17a54c8790e7e8e3069'/>
<id>urn:sha1:902861e34c401696ed9ad17a54c8790e7e8e3069</id>
<content type='text'>
Pull MM updates from Andrew Morton:

 - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
   from hotplugged memory rather than only from main memory. Series
   "implement "memmap on memory" feature on s390".

 - More folio conversions from Matthew Wilcox in the series

	"Convert memcontrol charge moving to use folios"
	"mm: convert mm counter to take a folio"

 - Chengming Zhou has optimized zswap's rbtree locking, providing
   significant reductions in system time and modest but measurable
   reductions in overall runtimes. The series is "mm/zswap: optimize the
   scalability of zswap rb-tree".

 - Chengming Zhou has also provided the series "mm/zswap: optimize zswap
   lru list" which provides measurable runtime benefits in some
   swap-intensive situations.

 - And Chengming Zhou further optimizes zswap in the series "mm/zswap:
   optimize for dynamic zswap_pools". Measured improvements are modest.

 - zswap cleanups and simplifications from Yosry Ahmed in the series
   "mm: zswap: simplify zswap_swapoff()".

 - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
   contributed several DAX cleanups as well as adding a sysfs tunable to
   control the memmap_on_memory setting when the dax device is
   hotplugged as system memory.

 - Johannes Weiner has added the large series "mm: zswap: cleanups",
   which does that.

 - More DAMON work from SeongJae Park in the series

	"mm/damon: make DAMON debugfs interface deprecation unignorable"
	"selftests/damon: add more tests for core functionalities and corner cases"
	"Docs/mm/damon: misc readability improvements"
	"mm/damon: let DAMOS feeds and tame/auto-tune itself"

 - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
   extension" Rakie Kim has developed a new mempolicy interleaving
   policy wherein we allocate memory across nodes in a weighted fashion
   rather than uniformly. This is beneficial in heterogeneous memory
   environments appearing with CXL.

 - Christophe Leroy has contributed some cleanup and consolidation work
   against the ARM pagetable dumping code in the series "mm: ptdump:
   Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".

 - Luis Chamberlain has added some additional xarray selftesting in the
   series "test_xarray: advanced API multi-index tests".

 - Muhammad Usama Anjum has reworked the selftest code to make its
   human-readable output conform to the TAP ("Test Anything Protocol")
   format. Amongst other things, this opens up the use of third-party
   tools to parse and process out selftesting results.

 - Ryan Roberts has added fork()-time PTE batching of THP ptes in the
   series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
   targeted at arm64, this significantly speeds up fork() when the
   process has a large number of pte-mapped folios.

 - David Hildenbrand also gets in on the THP pte batching game in his
   series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
   implements batching during munmap() and other pte teardown
   situations. The microbenchmark improvements are nice.

 - And in the series "Transparent Contiguous PTEs for User Mappings"
   Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
   mappings"). Kernel build times on arm64 improved nicely. Ryan's
   series "Address some contpte nits" provides some followup work.

 - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
   fixed an obscure hugetlb race which was causing unnecessary page
   faults. He has also added a reproducer under the selftest code.

 - In the series "selftests/mm: Output cleanups for the compaction
   test", Mark Brown did what the title claims.

 - Kinsey Ho has added the series "mm/mglru: code cleanup and
   refactoring".

 - Even more zswap material from Nhat Pham. The series "fix and extend
   zswap kselftests" does as claimed.

 - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
   regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
   in our handling of DAX on archiecctures which have virtually aliasing
   data caches. The arm architecture is the main beneficiary.

 - Lokesh Gidra's series "per-vma locks in userfaultfd" provides
   dramatic improvements in worst-case mmap_lock hold times during
   certain userfaultfd operations.

 - Some page_owner enhancements and maintenance work from Oscar Salvador
   in his series

	"page_owner: print stacks and their outstanding allocations"
	"page_owner: Fixup and cleanup"

 - Uladzislau Rezki has contributed some vmalloc scalability
   improvements in his series "Mitigate a vmap lock contention". It
   realizes a 12x improvement for a certain microbenchmark.

 - Some kexec/crash cleanup work from Baoquan He in the series "Split
   crash out from kexec and clean up related config items".

 - Some zsmalloc maintenance work from Chengming Zhou in the series

	"mm/zsmalloc: fix and optimize objects/page migration"
	"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"

 - Zi Yan has taught the MM to perform compaction on folios larger than
   order=0. This a step along the path to implementaton of the merging
   of large anonymous folios. The series is named "Enable &gt;0 order folio
   memory compaction".

 - Christoph Hellwig has done quite a lot of cleanup work in the
   pagecache writeback code in his series "convert write_cache_pages()
   to an iterator".

 - Some modest hugetlb cleanups and speedups in Vishal Moola's series
   "Handle hugetlb faults under the VMA lock".

 - Zi Yan has changed the page splitting code so we can split huge pages
   into sizes other than order-0 to better utilize large folios. The
   series is named "Split a folio to any lower order folios".

 - David Hildenbrand has contributed the series "mm: remove
   total_mapcount()", a cleanup.

 - Matthew Wilcox has sought to improve the performance of bulk memory
   freeing in his series "Rearrange batched folio freeing".

 - Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
   provides large improvements in bootup times on large machines which
   are configured to use large numbers of hugetlb pages.

 - Matthew Wilcox's series "PageFlags cleanups" does that.

 - Qi Zheng's series "minor fixes and supplement for ptdesc" does that
   also. S390 is affected.

 - Cleanups to our pagemap utility functions from Peter Xu in his series
   "mm/treewide: Replace pXd_large() with pXd_leaf()".

 - Nico Pache has fixed a few things with our hugepage selftests in his
   series "selftests/mm: Improve Hugepage Test Handling in MM
   Selftests".

 - Also, of course, many singleton patches to many things. Please see
   the individual changelogs for details.

* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  memtest: use {READ,WRITE}_ONCE in memory scanning
  mm: prohibit the last subpage from reusing the entire large folio
  mm: recover pud_leaf() definitions in nopmd case
  selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
  selftests/mm: skip uffd hugetlb tests with insufficient hugepages
  selftests/mm: dont fail testsuite due to a lack of hugepages
  mm/huge_memory: skip invalid debugfs new_order input for folio split
  mm/huge_memory: check new folio order when split a folio
  mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
  mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
  mm: fix list corruption in put_pages_list
  mm: remove folio from deferred split list before uncharging it
  filemap: avoid unnecessary major faults in filemap_fault()
  mm,page_owner: drop unnecessary check
  mm,page_owner: check for null stack_record before bumping its refcount
  mm: swap: fix race between free_swap_and_cache() and swapoff()
  mm/treewide: align up pXd_leaf() retval across archs
  mm/treewide: drop pXd_large()
  ...
</content>
</entry>
<entry>
<title>pidfs: remove config option</title>
<updated>2024-03-13T19:53:53+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-03-12T09:39:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d9539db8638cfe053fcd1f441746f0e2c8c2d32'/>
<id>urn:sha1:9d9539db8638cfe053fcd1f441746f0e2c8c2d32</id>
<content type='text'>
As Linus suggested this enables pidfs unconditionally. A key property to
retain is the ability to compare pidfds by inode number (cf. [1]).
That's extremely helpful just as comparing namespace file descriptors by
inode number is. They are used in a variety of scenarios where they need
to be compared, e.g., when receiving a pidfd via SO_PEERPIDFD from a
socket to trivially authenticate a the sender and various other
use-cases.

For 64bit systems this is pretty trivial to do. For 32bit it's slightly
more annoying as we discussed but we simply add a dumb ida based
allocator that gets used on 32bit. This gives the same guarantees about
inode numbers on 64bit without any overflow risk. Practically, we'll
never run into overflow issues because we're constrained by the number
of processes that can exist on 32bit and by the number of open files
that can exist on a 32bit system. On 64bit none of this matters and
things are very simple.

If 32bit also needs the uniqueness guarantee they can simply parse the
contents of /proc/&lt;pid&gt;/fd/&lt;nr&gt;. The uniqueness guarantees have a
variety of use-cases. One of the most obvious ones is that they will
make pidfiles (or "pidfdfiles", I guess) reliable as the unique
identifier can be placed into there that won't be reycled. Also a
frequent request.

Note, I took the chance and simplified path_from_stashed() even further.
Instead of passing the inode number explicitly to path_from_stashed() we
let the filesystem handle that internally. So path_from_stashed() ends
up even simpler than it is now. This is also a good solution allowing
the cleanup code to be clean and consistent between 32bit and 64bit. The
cleanup path in prepare_anon_dentry() is also switched around so we put
the inode before the dentry allocation. This means we only have to call
the cleanup handler for the filesystem's inode data once and can rely
-&gt;evict_inode() otherwise.

Aside from having to have a bit of extra code for 32bit it actually ends
up a nice cleanup for path_from_stashed() imho.

Tested on both 32 and 64bit including error injection.

Link: https://github.com/systemd/systemd/pull/31713 [1]
Link: https://lore.kernel.org/r/20240312-dingo-sehnlich-b3ecc35c6de7@brauner
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2024-03-11T17:21:06+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-03-11T17:21:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b5683a37c881e2e08065f1670086e281430ee19f'/>
<id>urn:sha1:b5683a37c881e2e08065f1670086e281430ee19f</id>
<content type='text'>
Pull pdfd updates from Christian Brauner:

 - Until now pidfds could only be created for thread-group leaders but
   not for threads. There was no technical reason for this. We simply
   had no users that needed support for this. Now we do have users that
   need support for this.

   This introduces a new PIDFD_THREAD flag for pidfd_open(). If that
   flag is set pidfd_open() creates a pidfd that refers to a specific
   thread.

   In addition, we now allow clone() and clone3() to be called with
   CLONE_PIDFD | CLONE_THREAD which wasn't possible before.

   A pidfd that refers to an individual thread differs from a pidfd that
   refers to a thread-group leader:

    (1) Pidfds are pollable. A task may poll a pidfd and get notified
        when the task has exited.

        For thread-group leader pidfds the polling task is woken if the
        thread-group is empty. In other words, if the thread-group
        leader task exits when there are still threads alive in its
        thread-group the polling task will not be woken when the
        thread-group leader exits but rather when the last thread in the
        thread-group exits.

        For thread-specific pidfds the polling task is woken if the
        thread exits.

    (2) Passing a thread-group leader pidfd to pidfd_send_signal() will
        generate thread-group directed signals like kill(2) does.

        Passing a thread-specific pidfd to pidfd_send_signal() will
        generate thread-specific signals like tgkill(2) does.

        The default scope of the signal is thus determined by the type
        of the pidfd.

        Since use-cases exist where the default scope of the provided
        pidfd needs to be overriden the following flags are added to
        pidfd_send_signal():

         - PIDFD_SIGNAL_THREAD
           Send a thread-specific signal.

         - PIDFD_SIGNAL_THREAD_GROUP
           Send a thread-group directed signal.

         - PIDFD_SIGNAL_PROCESS_GROUP
           Send a process-group directed signal.

        The scope change will only work if the struct pid is actually
        used for this scope.

        For example, in order to send a thread-group directed signal the
        provided pidfd must be used as a thread-group leader and
        similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be
        used as a process group leader.

 - Move pidfds from the anonymous inode infrastructure to a tiny pseudo
   filesystem. This will unblock further work that we weren't able to do
   simply because of the very justified limitations of anonymous inodes.
   Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds
   to become useful for the first time. They can now be compared by
   inode number which are unique for the system lifetime.

   Instead of stashing struct pid in file-&gt;private_data we can now stash
   it in inode-&gt;i_private. This makes it possible to introduce concepts
   that operate on a process once all file descriptors have been closed.
   A concrete example is kill-on-last-close. Another side-effect is that
   file-&gt;private_data is now freed up for per-file options for pidfds.

   Now, each struct pid will refer to a different inode but the same
   struct pid will refer to the same inode if it's opened multiple
   times. In contrast to now where each struct pid refers to the same
   inode.

   The tiny pseudo filesystem is not visible anywhere in userspace
   exactly like e.g., pipefs and sockfs. There's no lookup, there's no
   complex inode operations, nothing. Dentries and inodes are always
   deleted when the last pidfd is closed.

   We allocate a new inode and dentry for each struct pid and we reuse
   that inode and dentry for all pidfds that refer to the same struct
   pid. The code is entirely optional and fairly small. If it's not
   selected we fallback to anonymous inodes. Heavily inspired by nsfs.

   The dentry and inode allocation mechanism is moved into generic
   infrastructure that is now shared between nsfs and pidfs. The
   path_from_stashed() helper must be provided with a stashing location,
   an inode number, a mount, and the private data that is supposed to be
   used and it will provide a path that can be passed to dentry_open().

   The helper will try retrieve an existing dentry from the provided
   stashing location. If a valid dentry is found it is reused. If not a
   new one is allocated and we try to stash it in the provided location.
   If this fails we retry until we either find an existing dentry or the
   newly allocated dentry could be stashed. Subsequent openers of the
   same namespace or task are then able to reuse it.

 - Currently it is only possible to get notified when a task has exited,
   i.e., become a zombie and userspace gets notified with EPOLLIN. We
   now also support waiting until the task has been reaped, notifying
   userspace with EPOLLHUP.

 - Ensure that ESRCH is reported for getfd if a task is exiting instead
   of the confusing EBADF.

 - Various smaller cleanups to pidfd functions.

* tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
  libfs: improve path_from_stashed()
  libfs: add stashed_dentry_prune()
  libfs: improve path_from_stashed() helper
  pidfs: convert to path_from_stashed() helper
  nsfs: convert to path_from_stashed() helper
  libfs: add path_from_stashed()
  pidfd: add pidfs
  pidfd: move struct pidfd_fops
  pidfd: allow to override signal scope in pidfd_send_signal()
  pidfd: change pidfd_send_signal() to respect PIDFD_THREAD
  signal: fill in si_code in prepare_kill_siginfo()
  selftests: add ESRCH tests for pidfd_getfd()
  pidfd: getfd should always report ESRCH if a task is exiting
  pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together
  pidfd: exit: kill the no longer used thread_group_exited()
  pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN))
  pid: kill the obsolete PIDTYPE_PID code in transfer_pid()
  pidfd: kill the no longer needed do_notify_pidfd() in de_thread()
  pidfd_poll: report POLLHUP when pid_task() == NULL
  pidfd: implement PIDFD_THREAD flag for pidfd_open()
  ...
</content>
</entry>
<entry>
<title>hugetlb: have CONFIG_HUGETLBFS select CONFIG_PADATA</title>
<updated>2024-03-06T21:04:17+00:00</updated>
<author>
<name>Gang Li</name>
<email>gang.li@linux.dev</email>
</author>
<published>2024-02-22T14:04:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=26d1dc6bb2305cd64dc0cd18e5773cbbcec41bb1'/>
<id>urn:sha1:26d1dc6bb2305cd64dc0cd18e5773cbbcec41bb1</id>
<content type='text'>
Allow hugetlb use padata_do_multithreaded for parallel initialization. 
Select CONFIG_PADATA in this case.

Link: https://lkml.kernel.org/r/20240222140422.393911-7-gang.li@linux.dev
Signed-off-by: Gang Li &lt;ligang.bdlg@bytedance.com&gt;
Tested-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: Muchun Song &lt;muchun.song@linux.dev&gt;
Tested-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Acked-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pidfd: add pidfs</title>
<updated>2024-03-01T11:23:37+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-02-12T15:32:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cb12fd8e0dabb9a1c8aef55a6a41e2c255fcdf4b'/>
<id>urn:sha1:cb12fd8e0dabb9a1c8aef55a6a41e2c255fcdf4b</id>
<content type='text'>
This moves pidfds from the anonymous inode infrastructure to a tiny
pseudo filesystem. This has been on my todo for quite a while as it will
unblock further work that we weren't able to do simply because of the
very justified limitations of anonymous inodes. Moving pidfds to a tiny
pseudo filesystem allows:

* statx() on pidfds becomes useful for the first time.
* pidfds can be compared simply via statx() and then comparing inode
  numbers.
* pidfds have unique inode numbers for the system lifetime.
* struct pid is now stashed in inode-&gt;i_private instead of
  file-&gt;private_data. This means it is now possible to introduce
  concepts that operate on a process once all file descriptors have been
  closed. A concrete example is kill-on-last-close.
* file-&gt;private_data is freed up for per-file options for pidfds.
* Each struct pid will refer to a different inode but the same struct
  pid will refer to the same inode if it's opened multiple times. In
  contrast to now where each struct pid refers to the same inode. Even
  if we were to move to anon_inode_create_getfile() which creates new
  inodes we'd still be associating the same struct pid with multiple
  different inodes.

The tiny pseudo filesystem is not visible anywhere in userspace exactly
like e.g., pipefs and sockfs. There's no lookup, there's no complex
inode operations, nothing. Dentries and inodes are always deleted when
the last pidfd is closed.

We allocate a new inode for each struct pid and we reuse that inode for
all pidfds. We use iget_locked() to find that inode again based on the
inode number which isn't recycled. We allocate a new dentry for each
pidfd that uses the same inode. That is similar to anonymous inodes
which reuse the same inode for thousands of dentries. For pidfds we're
talking way less than that. There usually won't be a lot of concurrent
openers of the same struct pid. They can probably often be counted on
two hands. I know that systemd does use separate pidfd for the same
struct pid for various complex process tracking issues. So I think with
that things actually become way simpler. Especially because we don't
have to care about lookup. Dentries and inodes continue to be always
deleted.

The code is entirely optional and fairly small. If it's not selected we
fallback to anonymous inodes. Heavily inspired by nsfs which uses a
similar stashing mechanism just for namespaces.

Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
</feed>
