<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/fs.h, branch v6.6.36</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.36</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.36'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-03-26T22:19:46+00:00</updated>
<entry>
<title>quota: Properly annotate i_dquot arrays with __rcu</title>
<updated>2024-03-26T22:19:46+00:00</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2024-02-06T14:08:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=42954c374534f37dd25a4096b52d28e46dc1f8ba'/>
<id>urn:sha1:42954c374534f37dd25a4096b52d28e46dc1f8ba</id>
<content type='text'>
[ Upstream commit ccb49011bb2ebfd66164dbf68c5bff48917bb5ef ]

Dquots pointed to from i_dquot arrays in inodes are protected by
dquot_srcu. Annotate them as such and change .get_dquots callback to
return properly annotated pointer to make sparse happy.

Fixes: b9ba6f94b238 ("quota: remove dqptr_sem")
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>ovl: Always reject mounting over case-insensitive directories</title>
<updated>2024-03-26T22:19:18+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-02-21T17:14:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c6f95031cf21701db5d2b449bf77b73e98edabde'/>
<id>urn:sha1:c6f95031cf21701db5d2b449bf77b73e98edabde</id>
<content type='text'>
[ Upstream commit 2824083db76cb9d4b7910607b367e93b02912865 ]

overlayfs relies on the filesystem setting DCACHE_OP_HASH or
DCACHE_OP_COMPARE to reject mounting over case-insensitive directories.

Since commit bb9cd9106b22 ("fscrypt: Have filesystems handle their
d_ops"), we set -&gt;d_op through a hook in -&gt;d_lookup, which
means the root dentry won't have them, causing the mount to accidentally
succeed.

In v6.7-rc7, the following sequence will succeed to mount, but any
dentry other than the root dentry will be a "weird" dentry to ovl and
fail with EREMOTE.

  mkfs.ext4 -O casefold lower.img
  mount -O loop lower.img lower
  mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work ovl /mnt

Mounting on a subdirectory fails, as expected, because DCACHE_OP_HASH
and DCACHE_OP_COMPARE are properly set by -&gt;lookup.

Fix by explicitly rejecting superblocks that allow case-insensitive
dentries. Yes, this will be solved when we move d_op configuration back
to -&gt;s_d_op. Yet, we better have an explicit fix to avoid messing up
again.

While there, re-sort the entries to have more descriptive error messages
first.

Fixes: bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops")
Acked-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@google.com&gt;
Link: https://lore.kernel.org/r/20240221171412.10710-2-krisman@suse.de
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio</title>
<updated>2024-03-01T12:34:59+00:00</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2024-02-15T20:47:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e7e23fc5d5fe422827c9a43ecb579448f73876c7'/>
<id>urn:sha1:e7e23fc5d5fe422827c9a43ecb579448f73876c7</id>
<content type='text'>
commit b820de741ae48ccf50dd95e297889c286ff4f760 upstream.

If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the
following kernel warning appears:

WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8
Call trace:
 kiocb_set_cancel_fn+0x9c/0xa8
 ffs_epfile_read_iter+0x144/0x1d0
 io_read+0x19c/0x498
 io_issue_sqe+0x118/0x27c
 io_submit_sqes+0x25c/0x5fc
 __arm64_sys_io_uring_enter+0x104/0xab0
 invoke_syscall+0x58/0x11c
 el0_svc_common+0xb4/0xf4
 do_el0_svc+0x2c/0xb0
 el0_svc+0x2c/0xa4
 el0t_64_sync_handler+0x68/0xb4
 el0t_64_sync+0x1a4/0x1a8

Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is
submitted by libaio.

Suggested-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Avi Kivity &lt;avi@scylladb.com&gt;
Cc: Sandeep Dhavale &lt;dhavale@google.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20240215204739.2677806-2-bvanassche@acm.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>fs: new accessor methods for atime and mtime</title>
<updated>2024-01-05T14:19:40+00:00</updated>
<author>
<name>Jeff Layton</name>
<email>jlayton@kernel.org</email>
</author>
<published>2023-10-04T18:52:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5b5599a7eee5e6101c6ae738682331bc6299a54d'/>
<id>urn:sha1:5b5599a7eee5e6101c6ae738682331bc6299a54d</id>
<content type='text'>
[ Upstream commit 077c212f0344ae4198b2b51af128a94b614ccdf4 ]

Recently, we converted the ctime accesses in the kernel to use new
accessor functions. Linus recently pointed out though that if we add
accessors for the atime and mtime, then that would allow us to
seamlessly change how these timestamps are stored in the inode.

Add new accessor functions for the atime and mtime that mirror the
accessors for the ctime.

Signed-off-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Link: https://lore.kernel.org/r/20231004185239.80830-1-jlayton@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Stable-dep-of: 01fe654f78fd ("fs: cifs: Fix atime update check")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>audit,io_uring: io_uring openat triggers audit reference count underflow</title>
<updated>2023-10-13T16:34:46+00:00</updated>
<author>
<name>Dan Clash</name>
<email>daclash@linux.microsoft.com</email>
</author>
<published>2023-10-12T21:55:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=03adc61edad49e1bbecfb53f7ea5d78f398fe368'/>
<id>urn:sha1:03adc61edad49e1bbecfb53f7ea5d78f398fe368</id>
<content type='text'>
An io_uring openat operation can update an audit reference count
from multiple threads resulting in the call trace below.

A call to io_uring_submit() with a single openat op with a flag of
IOSQE_ASYNC results in the following reference count updates.

These first part of the system call performs two increments that do not race.

do_syscall_64()
  __do_sys_io_uring_enter()
    io_submit_sqes()
      io_openat_prep()
        __io_openat_prep()
          getname()
            getname_flags()       /* update 1 (increment) */
              __audit_getname()   /* update 2 (increment) */

The openat op is queued to an io_uring worker thread which starts the
opportunity for a race.  The system call exit performs one decrement.

do_syscall_64()
  syscall_exit_to_user_mode()
    syscall_exit_to_user_mode_prepare()
      __audit_syscall_exit()
        audit_reset_context()
           putname()              /* update 3 (decrement) */

The io_uring worker thread performs one increment and two decrements.
These updates can race with the system call decrement.

io_wqe_worker()
  io_worker_handle_work()
    io_wq_submit_work()
      io_issue_sqe()
        io_openat()
          io_openat2()
            do_filp_open()
              path_openat()
                __audit_inode()   /* update 4 (increment) */
            putname()             /* update 5 (decrement) */
        __audit_uring_exit()
          audit_reset_context()
            putname()             /* update 6 (decrement) */

The fix is to change the refcnt member of struct audit_names
from int to atomic_t.

kernel BUG at fs/namei.c:262!
Call Trace:
...
 ? putname+0x68/0x70
 audit_reset_context.part.0.constprop.0+0xe1/0x300
 __audit_uring_exit+0xda/0x1c0
 io_issue_sqe+0x1f3/0x450
 ? lock_timer_base+0x3b/0xd0
 io_wq_submit_work+0x8d/0x2b0
 ? __try_to_del_timer_sync+0x67/0xa0
 io_worker_handle_work+0x17c/0x2b0
 io_wqe_worker+0x10a/0x350

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@MW2PR2101MB1033.namprd21.prod.outlook.com/
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Dan Clash &lt;daclash@linux.microsoft.com&gt;
Link: https://lore.kernel.org/r/20231012215518.GA4048@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Reviewed-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "fs: add infrastructure for multigrain timestamps"</title>
<updated>2023-09-20T16:05:31+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-09-20T14:40:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=647aa768281f38cb1002edb3a1f673c3d66a8d81'/>
<id>urn:sha1:647aa768281f38cb1002edb3a1f673c3d66a8d81</id>
<content type='text'>
This reverts commit ffb6cf19e06334062744b7e3493f71e500964f8e.

Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.

Message-ID: &lt;20230920-keine-eile-c9755b5825db@brauner&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Acked-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: export sget_dev()</title>
<updated>2023-08-31T10:47:15+00:00</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2023-08-29T15:23:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=69881be3d9a00cca770886af40913cfc5274b2d0'/>
<id>urn:sha1:69881be3d9a00cca770886af40913cfc5274b2d0</id>
<content type='text'>
They will be used for mtd devices as well.

Acked-by: Richard Weinberger &lt;richard@nod.at&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Message-Id: &lt;20230829-vfs-super-mtd-v1-1-fecb572e5df3@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2023-08-29T21:25:26+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-08-29T21:25:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b96a3e9142fdf346b05b20e867b4f0dfca119e96'/>
<id>urn:sha1:b96a3e9142fdf346b05b20e867b4f0dfca119e96</id>
<content type='text'>
Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has doe some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page-&gt;private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for -&gt;huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow -&gt;huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio-&gt;swap instead of page-&gt;private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page-&gt;private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
</content>
</entry>
<entry>
<title>Merge tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux</title>
<updated>2023-08-28T18:59:52+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-08-28T18:59:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6016fc9162245c5b109305841f76cca59c20a273'/>
<id>urn:sha1:6016fc9162245c5b109305841f76cca59c20a273</id>
<content type='text'>
Pull iomap updates from Darrick Wong:
 "We've got some big changes for this release -- I'm very happy to be
  landing willy's work to enable large folios for the page cache for
  general read and write IOs when the fs can make contiguous space
  allocations, and Ritesh's work to track sub-folio dirty state to
  eliminate the write amplification problems inherent in using large
  folios.

  As a bonus, io_uring can now process write completions in the caller's
  context instead of bouncing through a workqueue, which should reduce
  io latency dramatically. IOWs, XFS should see a nice performance bump
  for both IO paths.

  Summary:

   - Make large writes to the page cache fill sparse parts of the cache
     with large folios, then use large memcpy calls for the large folio.

   - Track the per-block dirty state of each large folio so that a
     buffered write to a single byte on a large folio does not result in
     a (potentially) multi-megabyte writeback IO.

   - Allow some directio completions to be performed in the initiating
     task's context instead of punting through a workqueue. This will
     reduce latency for some io_uring requests"

* tag 'iomap-6.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits)
  iomap: support IOCB_DIO_CALLER_COMP
  io_uring/rw: add write support for IOCB_DIO_CALLER_COMP
  fs: add IOCB flags related to passing back dio completions
  iomap: add IOMAP_DIO_INLINE_COMP
  iomap: only set iocb-&gt;private for polled bio
  iomap: treat a write through cache the same as FUA
  iomap: use an unsigned type for IOMAP_DIO_* defines
  iomap: cleanup up iomap_dio_bio_end_io()
  iomap: Add per-block dirty state tracking to improve performance
  iomap: Allocate ifs in -&gt;write_begin() early
  iomap: Refactor iomap_write_delalloc_punch() function out
  iomap: Use iomap_punch_t typedef
  iomap: Fix possible overflow condition in iomap_write_delalloc_scan
  iomap: Add some uptodate state handling helpers for ifs state bitmap
  iomap: Drop ifs argument from iomap_set_range_uptodate()
  iomap: Rename iomap_page to iomap_folio_state and others
  iomap: Copy larger chunks from userspace
  iomap: Create large folios in the buffered write path
  filemap: Allow __filemap_get_folio to allocate large folios
  filemap: Add fgf_t typedef
  ...
</content>
</entry>
<entry>
<title>Merge tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2023-08-28T18:04:18+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-08-28T18:04:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=511fb5bafed197ff76d9adf5448de67f1d0558ae'/>
<id>urn:sha1:511fb5bafed197ff76d9adf5448de67f1d0558ae</id>
<content type='text'>
Pull superblock updates from Christian Brauner:
 "This contains the super rework that was ready for this cycle. The
  first part changes the order of how we open block devices and allocate
  superblocks, contains various cleanups, simplifications, and a new
  mechanism to wait on superblock state changes.

  This unblocks work to ultimately limit the number of writers to a
  block device. Jan has already scheduled follow-up work that will be
  ready for v6.7 and allows us to restrict the number of writers to a
  given block device. That series builds on this work right here.

  The second part contains filesystem freezing updates.

  Overview:

  The generic superblock changes are rougly organized as follows
  (ignoring additional minor cleanups):

   (1) Removal of the bd_super member from struct block_device.

       This was a very odd back pointer to struct super_block with
       unclear rules. For all relevant places we have other means to get
       the same information so just get rid of this.

   (2) Simplify rules for superblock cleanup.

       Roughly, everything that is allocated during fs_context
       initialization and that's stored in fs_context-&gt;s_fs_info needs
       to be cleaned up by the fs_context-&gt;free() implementation before
       the superblock allocation function has been called successfully.

       After sget_fc() returned fs_context-&gt;s_fs_info has been
       transferred to sb-&gt;s_fs_info at which point sb-&gt;kill_sb() if
       fully responsible for cleanup. Adhering to these rules means that
       cleanup of sb-&gt;s_fs_info in fill_super() is to be avoided as it's
       brittle and inconsistent.

       Cleanup shouldn't be duplicated between sb-&gt;put_super() as
       sb-&gt;put_super() is only called if sb-&gt;s_root has been set aka
       when the filesystem has been successfully born (SB_BORN). That
       complexity should be avoided.

       This also means that block devices are to be closed in
       sb-&gt;kill_sb() instead of sb-&gt;put_super(). More details in the
       lower section.

   (3) Make it possible to lookup or create a superblock before opening
       block devices

       There's a subtle dependency on (2) as some filesystems did rely
       on fill_super() to be called in order to correctly clean up
       sb-&gt;s_fs_info. All these filesystems have been fixed.

   (4) Switch most filesystem to follow the same logic as the generic
       mount code now does as outlined in (3).

   (5) Use the superblock as the holder of the block device. We can now
       easily go back from block device to owning superblock.

   (6) Export and extend the generic fs_holder_ops and use them as
       holder ops everywhere and remove the filesystem specific holder
       ops.

   (7) Call from the block layer up into the filesystem layer when the
       block device is removed, allowing to shut down the filesystem
       without risk of deadlocks.

   (8) Get rid of get_super().

       We can now easily go back from the block device to owning
       superblock and can call up from the block layer into the
       filesystem layer when the device is removed. So no need to wade
       through all registered superblock to find the owning superblock
       anymore"

Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/

* tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits)
  super: use higher-level helper for {freeze,thaw}
  super: wait until we passed kill super
  super: wait for nascent superblocks
  super: make locking naming consistent
  super: use locking helpers
  fs: simplify invalidate_inodes
  fs: remove get_super
  block: call into the file system for ioctl BLKFLSBUF
  block: call into the file system for bdev_mark_dead
  block: consolidate __invalidate_device and fsync_bdev
  block: drop the "busy inodes on changed media" log message
  dasd: also call __invalidate_device when setting the device offline
  amiflop: don't call fsync_bdev in FDFMTBEG
  floppy: call disk_force_media_change when changing the format
  block: simplify the disk_force_media_change interface
  nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
  xfs use fs_holder_ops for the log and RT devices
  xfs: drop s_umount over opening the log and RT devices
  ext4: use fs_holder_ops for the log device
  ext4: drop s_umount over opening the log device
  ...
</content>
</entry>
</feed>
