summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-09-03xfs: enable block size larger than page size supportPankaj Raghav6-11/+45
Page cache now has the ability to have a minimum order when allocating a folio which is a prerequisite to add support for block size > page size. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20240827-xfs-fix-wformat-bs-gt-ps-v1-1-aec6717609e0@kernel.org # fix folded Link: https://lore.kernel.org/r/20240822135018.1931258-11-kernel@pankajraghav.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02xfs: make the calculation generic in xfs_sb_validate_fsb_count()Pankaj Raghav1-1/+6
Instead of assuming that PAGE_SHIFT is always higher than the blocklog, make the calculation generic so that page cache count can be calculated correctly for LBS. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240822135018.1931258-10-kernel@pankajraghav.com Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02xfs: expose block size in statPankaj Raghav1-1/+1
For block size larger than page size, the unit of efficient IO is the block size, not the page size. Leaving stat() to report PAGE_SIZE as the block size causes test programs like fsx to issue illegal ranges for operations that require block size alignment (e.g. fallocate() insert range). Hence update the preferred IO size to reflect the block size in this case. This change is based on a patch originally from Dave Chinner.[1] [1] https://lwn.net/ml/linux-fsdevel/20181107063127.3902-16-david@fromorbit.com/ Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20240822135018.1931258-9-kernel@pankajraghav.com Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02xfs: use kvmalloc for xattr buffersDave Chinner1-9/+6
Pankaj Raghav reported that when filesystem block size is larger than page size, the xattr code can use kmalloc() for high order allocations. This triggers a useless warning in the allocator as it is a __GFP_NOFAIL allocation here: static inline struct page *rmqueue(struct zone *preferred_zone, struct zone *zone, unsigned int order, gfp_t gfp_flags, unsigned int alloc_flags, int migratetype) { struct page *page; /* * We most definitely don't want callers attempting to * allocate greater than order-1 page units with __GFP_NOFAIL. */ >>>> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); ... Fix this by changing all these call sites to use kvmalloc(), which will strip the NOFAIL from the kmalloc attempt and if that fails will do a __GFP_NOFAIL vmalloc(). This is not an issue that productions systems will see as filesystems with block size > page size cannot be mounted by the kernel; Pankaj is developing this functionality right now. Reported-by: Pankaj Raghav <kernel@pankajraghav.com> Fixes: f078d4ea8276 ("xfs: convert kmem_alloc() to kmalloc()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20240822135018.1931258-8-kernel@pankajraghav.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02iomap: fix iomap_dio_zero() for fs bs > system page sizePankaj Raghav2-8/+41
iomap_dio_zero() will pad a fs block with zeroes if the direct IO size < fs block size. iomap_dio_zero() has an implicit assumption that fs block size < page_size. This is true for most filesystems at the moment. If the block size > page size, this will send the contents of the page next to zero page(as len > PAGE_SIZE) to the underlying block device, causing FS corruption. iomap is a generic infrastructure and it should not make any assumptions about the fs block size and the page size of the system. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240822135018.1931258-7-kernel@pankajraghav.com Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02filemap: cap PTE range to be created to allowed zero fill in folio_map_range()Pankaj Raghav1-1/+5
Usually the page cache does not extend beyond the size of the inode, therefore, no PTEs are created for folios that extend beyond the size. But with LBS support, we might extend page cache beyond the size of the inode as we need to guarantee folios of minimum order. While doing a read, do_fault_around() can create PTEs for pages that lie beyond the EOF leading to incorrect error return when accessing a page beyond the mapped file. Cap the PTE range to be created for the page cache up to the end of file(EOF) in filemap_map_pages() so that return error codes are consistent with POSIX[1] for LBS configurations. generic/749 has been created to trigger this edge case. This also fixes generic/749 for tmpfs with huge=always on systems with 4k base page size. [1](from mmap(2)) SIGBUS Attempted access to a page of the buffer that lies beyond the end of the mapped file. For an explanation of the treatment of the bytes in the page that corresponds to the end of a mapped file that is not a multiple of the page size, see NOTES. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240822135018.1931258-6-kernel@pankajraghav.com Tested-by: David Howells <dhowells@redhat.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-02mm: split a folio in minimum folio order chunksLuis Chamberlain2-8/+85
split_folio() and split_folio_to_list() assume order 0, to support minorder for non-anonymous folios, we must expand these to check the folio mapping order and use that. Set new_order to be at least minimum folio order if it is set in split_huge_page_to_list() so that we can maintain minimum folio order requirement in the page cache. Update the debugfs write files used for testing to ensure the order is respected as well. We simply enforce the min order when a file mapping is used. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240902124931.506061-2-kernel@pankajraghav.com # folded fix Link: https://lore.kernel.org/r/20240822135018.1931258-5-kernel@pankajraghav.com Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-23readahead: allocate folios with mapping_min_order in readaheadPankaj Raghav1-18/+61
page_cache_ra_unbounded() was allocating single pages (0 order folios) if there was no folio found in an index. Allocate mapping_min_order folios as we need to guarantee the minimum order if it is set. page_cache_ra_order() tries to allocate folio to the page cache with a higher order if the index aligns with that order. Modify it so that the order does not go below the mapping_min_order requirement of the page cache. This function will do the right thing even if the new_order passed is less than the mapping_min_order. When adding new folios to the page cache we must also ensure the index used is aligned to the mapping_min_order as the page cache requires the index to be aligned to the order of the folio. readahead_expand() is called from readahead aops to extend the range of the readahead so this function can assume ractl->_index to be aligned with min_order. Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Co-developed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240822135018.1931258-4-kernel@pankajraghav.com Tested-by: David Howells <dhowells@redhat.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-23filemap: allocate mapping_min_order folios in the page cachePankaj Raghav2-8/+36
filemap_create_folio() and do_read_cache_folio() were always allocating folio of order 0. __filemap_get_folio was trying to allocate higher order folios when fgp_flags had higher order hint set but it will default to order 0 folio if higher order memory allocation fails. Supporting mapping_min_order implies that we guarantee each folio in the page cache has at least an order of mapping_min_order. When adding new folios to the page cache we must also ensure the index used is aligned to the mapping_min_order as the page cache requires the index to be aligned to the order of the folio. Co-developed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240822135018.1931258-3-kernel@pankajraghav.com Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-23fs: Allow fine-grained control of folio sizesMatthew Wilcox (Oracle)3-20/+80
We need filesystems to be able to communicate acceptable folio sizes to the pagecache for a variety of uses (e.g. large block sizes). Support a range of folio sizes between order-0 and order-31. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Co-developed-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240822135018.1931258-2-kernel@pankajraghav.com Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-29Linux 6.11-rc1Linus Torvalds1-2/+2
2024-07-29Merge tag 'kbuild-fixes-v6.11' of ↵Linus Torvalds4-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix RPM package build error caused by an incorrect locale setup - Mark modules.weakdep as ghost in RPM package - Fix the odd combination of -S and -c in stack protector scripts, which is an error with the latest Clang * tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: Fix '-S -c' in x86 stack protector scripts kbuild: rpm-pkg: ghost modules.weakdep file kbuild: rpm-pkg: Fix C locale setup
2024-07-28minmax: simplify and clarify min_t()/max_t() implementationLinus Torvalds1-8/+11
This simplifies the min_t() and max_t() macros by no longer making them work in the context of a C constant expression. That means that you can no longer use them for static initializers or for array sizes in type definitions, but there were only a couple of such uses, and all of them were converted (famous last words) to use MIN_T/MAX_T instead. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28minmax: add a few more MIN_T/MAX_T usersLinus Torvalds7-10/+10
Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant expressions in VM code") added the simpler MIN_T/MAX_T macros in order to avoid some excessive expansion from the rather complicated regular min/max macros. The complexity of those macros stems from two issues: (a) trying to use them in situations that require a C constant expression (in static initializers and for array sizes) (b) the type sanity checking and MIN_T/MAX_T avoids both of these issues. Now, in the whole (long) discussion about all this, it was pointed out that the whole type sanity checking is entirely unnecessary for min_t/max_t which get a fixed type that the comparison is done in. But that still leaves min_t/max_t unnecessarily complicated due to worries about the C constant expression case. However, it turns out that there really aren't very many cases that use min_t/max_t for this, and we can just force-convert those. This does exactly that. Which in turn will then allow for much simpler implementations of min_t()/max_t(). All the usual "macros in all upper case will evaluate the arguments multiple times" rules apply. We should do all the same things for the regular min/max() vs MIN/MAX() cases, but that has the added complexity of various drivers defining their own local versions of MIN/MAX, so that needs another level of fixes first. Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/ Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28Merge tag 'ubifs-for-linus-6.11-rc1-take2' of ↵Linus Torvalds21-214/+135
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: - Many fixes for power-cut issues by Zhihao Cheng - Another ubiblock error path fix - ubiblock section mismatch fix - Misc fixes all over the place * tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubi: Fix ubi_init() ubiblock_exit() section mismatch ubifs: add check for crypto_shash_tfm_digest ubifs: Fix inconsistent inode size when powercut happens during appendant writing ubi: block: fix null-pointer-dereference in ubiblock_create() ubifs: fix kernel-doc warnings ubifs: correct UBIFS_DFS_DIR_LEN macro definition and improve code clarity mtd: ubi: Restore missing cleanup on ubi_init() failure path ubifs: dbg_orphan_check: Fix missed key type checking ubifs: Fix unattached inode when powercut happens in creating ubifs: Fix space leak when powercut happens in linking tmpfile ubifs: Move ui->data initialization after initializing security ubifs: Fix adding orphan entry twice for the same inode ubifs: Remove insert_dead_orphan from replaying orphan process Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path" ubifs: Don't add xattr inode into orphan area ubifs: Fix unattached xattr inode if powercut happens after deleting mtd: ubi: avoid expensive do_div() on 32-bit machines mtd: ubi: make ubi_class constant ubi: eba: properly rollback inside self_check_eba
2024-07-28kbuild: Fix '-S -c' in x86 stack protector scriptsNathan Chancellor2-2/+2
After a recent change in clang to stop consuming all instances of '-S' and '-c' [1], the stack protector scripts break due to the kernel's use of -Werror=unused-command-line-argument to catch cases where flags are not being properly consumed by the compiler driver: $ echo | clang -o - -x c - -S -c -Werror=unused-command-line-argument clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument] This results in CONFIG_STACKPROTECTOR getting disabled because CONFIG_CC_HAS_SANE_STACKPROTECTOR is no longer set. '-c' and '-S' both instruct the compiler to stop at different stages of the pipeline ('-S' after compiling, '-c' after assembling), so having them present together in the same command makes little sense. In this case, the test wants to stop before assembling because it is looking at the textual assembly output of the compiler for either '%fs' or '%gs', so remove '-c' from the list of arguments to resolve the error. All versions of GCC continue to work after this change, along with versions of clang that do or do not contain the change mentioned above. Cc: stable@vger.kernel.org Fixes: 4f7fd4d7a791 ("[PATCH] Add the -fstack-protector option to the CFLAGS") Fixes: 60a5317ff0f4 ("x86: implement x86_32 stack protector") Link: https://github.com/llvm/llvm-project/commit/6461e537815f7fa68cef06842505353cf5600e9c [1] Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-28ubi: Fix ubi_init() ubiblock_exit() section mismatchRichard Weinberger1-1/+1
Since ubiblock_exit() is now called from an init function, the __exit section no longer makes sense. Cc: Ben Hutchings <bwh@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407131403.wZJpd8n2-lkp@intel.com/ Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
2024-07-28Merge tag 'v6.11-merge' of ↵Linus Torvalds5-495/+2274
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - Enable turbostat extensions to add both perf and PMT (Intel Platform Monitoring Technology) counters via the cmdline - Demonstrate PMT access with built-in support for Meteor Lake's Die C6 counter * tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: version 2024.07.26 tools/power turbostat: Include umask=%x in perf counter's config tools/power turbostat: Document PMT in turbostat.8 tools/power turbostat: Add MTL's PMT DC6 builtin counter tools/power turbostat: Add early support for PMT counters tools/power turbostat: Add selftests for added perf counters tools/power turbostat: Add selftests for SMI, APERF and MPERF counters tools/power turbostat: Move verbose counter messages to level 2 tools/power turbostat: Move debug prints from stdout to stderr tools/power turbostat: Fix typo in turbostat.8 tools/power turbostat: Add perf added counter example to turbostat.8 tools/power turbostat: Fix formatting in turbostat.8 tools/power turbostat: Extend --add option with perf counters tools/power turbostat: Group SMI counter with APERF and MPERF tools/power turbostat: Add ZERO_ARRAY for zero initializing builtin array tools/power turbostat: Replace enum rapl_source and cstate_source with counter_source tools/power turbostat: Remove anonymous union from rapl_counter_info_t tools/power/turbostat: Switch to new Intel CPU model defines
2024-07-28Merge tag 'cxl-for-6.11' of ↵Linus Torvalds19-208/+438
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL updates from Dave Jiang: "Core: - A CXL maturity map has been added to the documentation to detail the current state of CXL enabling. It provides the status of the current state of various CXL features to inform current and future contributors of where things are and which areas need contribution. - A notifier handler has been added in order for a newly created CXL memory region to trigger the abstract distance metrics calculation. This should bring parity for CXL memory to the same level vs hotplugged DRAM for NUMA abstract distance calculation. The abstract distance reflects relative performance used for memory tiering handling. - An addition for XOR math has been added to address the CXL DPA to SPA translation. CXL address translation did not support address interleave math with XOR prior to this change. Fixes: - Fix to address race condition in the CXL memory hotplug notifier - Add missing MODULE_DESCRIPTION() for CXL modules - Fix incorrect vendor debug UUID define Misc: - A warning has been added to inform users of an unsupported configuration when mixing CXL VH and RCH/RCD hierarchies - The ENXIO error code has been replaced with EBUSY for inject poison limit reached via debugfs and cxl-test support - Moving the PCI config read in cxl_dvsec_rr_decode() to avoid unnecessary PCI config reads - A refactor to a common struct for DRAM and general media CXL events" * tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/core/pci: Move reading of control register to immediately before usage cxl: Remove defunct code calculating host bridge target positions cxl/region: Verify target positions using the ordered target list cxl: Restore XOR'd position bits during address translation cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa() cxl/test: Replace ENXIO with EBUSY for inject poison limit reached cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy cxl/core: Fix incorrect vendor debug UUID define Documentation: CXL Maturity Map cxl/region: Simplify cxl_region_nid() cxl/region: Support to calculate memory tier abstract distance cxl/region: Fix a race condition in memory hotplug notifier cxl: add missing MODULE_DESCRIPTION() macros cxl/events: Use a common struct for DRAM and General Media events
2024-07-28Merge tag 'unicode-next-6.11' of ↵Linus Torvalds3-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode Pull unicode update from Gabriel Krisman Bertazi: "Two small fixes to silence the compiler and static analyzers tools from Ben Dooks and Jeff Johnson" * tag 'unicode-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: unicode: add MODULE_DESCRIPTION() macros unicode: make utf8 test count static
2024-07-28kbuild: rpm-pkg: ghost modules.weakdep fileJose Ignacio Tornos Martinez1-1/+1
In the same way as for other similar files, mark as ghost the new file generated by depmod for configured weak dependencies for modules, modules.weakdep, so that although it is not included in the package, claim the ownership on it. Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-28Merge tag '6.11-rc-smb-client-fixes-part2' of ↵Linus Torvalds5-7/+203
git://git.samba.org/sfrench/cifs-2.6 Pull more smb client updates from Steve French: - fix for potential null pointer use in init cifs - additional dynamic trace points to improve debugging of some common scenarios - two SMB1 fixes (one addressing reconnect with POSIX extensions, one a mount parsing error) * tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: add dynamic trace point for session setup key expired failures smb3: add four dynamic tracepoints for copy_file_range and reflink smb3: add dynamic tracepoint for reflink errors cifs: mount with "unix" mount option for SMB1 incorrectly handled cifs: fix reconnect with SMB1 UNIX Extensions cifs: fix potential null pointer use in destroy_workqueue in init_cifs error path
2024-07-28Merge tag 'block-6.11-20240726' of git://git.kernel.dk/linuxLinus Torvalds8-10/+39
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Fix request without payloads cleanup (Leon) - Use new protection information format (Francis) - Improved debug message for lost pci link (Bart) - Another apst quirk (Wang) - Use appropriate sysfs api for printing chars (Markus) - ublk async device deletion fix (Ming) - drbd kerneldoc fixups (Simon) - Fix deadlock between sd removal and release (Yang) * tag 'block-6.11-20240726' of git://git.kernel.dk/linux: nvme-pci: add missing condition check for existence of mapped data ublk: fix UBLK_CMD_DEL_DEV_ASYNC handling block: fix deadlock between sd_remove & sd_release drbd: Add peer_device to Kernel doc nvme-core: choose PIF from QPIF if QPIFS supports and PIF is QTYPE nvme-pci: Fix the instructions for disabling power management nvme: remove redundant bdev local variable nvme-fabrics: Use seq_putc() in __nvmf_concat_opt_tokens() nvme/pci: Add APST quirk for Lenovo N60z laptop
2024-07-28Merge tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linuxLinus Torvalds8-49/+51
Pull io_uring fixes from Jens Axboe: - Fix a syzbot issue for the msg ring cache added in this release. No ill effects from this one, but it did make KMSAN unhappy (me) - Sanitize the NAPI timeout handling, by unifying the value handling into all ktime_t rather than converting back and forth (Pavel) - Fail NAPI registration for IOPOLL rings, it's not supported (Pavel) - Fix a theoretical issue with ring polling and cancelations (Pavel) - Various little cleanups and fixes (Pavel) * tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linux: io_uring/napi: pass ktime to io_napi_adjust_timeout io_uring/napi: use ktime in busy polling io_uring/msg_ring: fix uninitialized use of target_req->flags io_uring: align iowq and task request error handling io_uring: kill REQ_F_CANCEL_SEQ io_uring: simplify io_uring_cmd return io_uring: fix io_match_task must_hold io_uring: don't allow netpolling with SETUP_IOPOLL io_uring: tighten task exit cancellations
2024-07-28Merge tag 'vfs-6.11-rc1.fixes.3' of ↵Linus Torvalds2-10/+66
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains two fixes for this merge window: VFS: - I noticed that it is possible for a privileged user to mount most filesystems with a non-initial user namespace in sb->s_user_ns. When fsopen() is called in a non-init namespace the caller's namespace is recorded in fs_context->user_ns. If the returned file descriptor is then passed to a process privileged in init_user_ns, that process can call fsconfig(fd_fs, FSCONFIG_CMD_CREATE*), creating a new superblock with sb->s_user_ns set to the namespace of the process which called fsopen(). This is problematic as only filesystems that raise FS_USERNS_MOUNT are known to be able to support a non-initial s_user_ns. Others may suffer security issues, on-disk corruption or outright crash the kernel. Prevent that by restricting such delegation to filesystems that allow FS_USERNS_MOUNT. Note, that this delegation requires a privileged process to actually create the superblock so either the privileged process is cooperaing or someone must have tricked a privileged process into operating on a fscontext file descriptor whose origin it doesn't know (a stupid idea). The bug dates back to about 5 years afaict. Misc: - Fix hostfs parsing when the mount request comes in via the legacy mount api. In the legacy mount api hostfs allows to specify the host directory mount without any key. Restore that behavior" * tag 'vfs-6.11-rc1.fixes.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: hostfs: fix the host directory parse when mounting. fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT
2024-07-27Merge tag 'rust-6.11' of https://github.com/Rust-for-Linux/linuxLinus Torvalds25-236/+1058
Pull Rust updates from Miguel Ojeda: "The highlight is the establishment of a minimum version for the Rust toolchain, including 'rustc' (and bundled tools) and 'bindgen'. The initial minimum will be the pinned version we currently have, i.e. we are just widening the allowed versions. That covers three stable Rust releases: 1.78.0, 1.79.0, 1.80.0 (getting released tomorrow), plus beta, plus nightly. This should already be enough for kernel developers in distributions that provide recent Rust compiler versions routinely, such as Arch Linux, Debian Unstable (outside the freeze period), Fedora Linux, Gentoo Linux (especially the testing channel), Nix (unstable) and openSUSE Slowroll and Tumbleweed. In addition, the kernel is now being built-tested by Rust's pre-merge CI. That is, every change that is attempting to land into the Rust compiler is tested against the kernel, and it is merged only if it passes. Similarly, the bindgen tool has agreed to build the kernel in their CI too. Thus, with the pre-merge CI in place, both projects hope to avoid unintentional changes to Rust that break the kernel. This means that, in general, apart from intentional changes on their side (that we will need to workaround conditionally on our side), the upcoming Rust compiler versions should generally work. In addition, the Rust project has proposed getting the kernel into stable Rust (at least solving the main blockers) as one of its three flagship goals for 2024H2 [1]. I would like to thank Niko, Sid, Emilio et al. for their help promoting the collaboration between Rust and the kernel. Toolchain and infrastructure: - Support several Rust toolchain versions. - Support several bindgen versions. - Remove 'cargo' requirement and simplify 'rusttest', thanks to 'alloc' having been dropped last cycle. - Provide proper error reporting for the 'rust-analyzer' target. 'kernel' crate: - Add 'uaccess' module with a safe userspace pointers abstraction. - Add 'page' module with a 'struct page' abstraction. - Support more complex generics in workqueue's 'impl_has_work!' macro. 'macros' crate: - Add 'firmware' field support to the 'module!' macro. - Improve 'module!' macro documentation. Documentation: - Provide instructions on what packages should be installed to build the kernel in some popular Linux distributions. - Introduce the new kernel.org LLVM+Rust toolchains. - Explain '#[no_std]'. And a few other small bits" Link: https://rust-lang.github.io/rust-project-goals/2024h2/index.html#flagship-goals [1] * tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux: (26 commits) docs: rust: quick-start: add section on Linux distributions rust: warn about `bindgen` versions 0.66.0 and 0.66.1 rust: start supporting several `bindgen` versions rust: work around `bindgen` 0.69.0 issue rust: avoid assuming a particular `bindgen` build rust: start supporting several compiler versions rust: simplify Clippy warning flags set rust: relax most deny-level lints to warnings rust: allow `dead_code` for never constructed bindings rust: init: simplify from `map_err` to `inspect_err` rust: macros: indent list item in `paste!`'s docs rust: add abstraction for `struct page` rust: uaccess: add typed accessors for userspace pointers uaccess: always export _copy_[from|to]_user with CONFIG_RUST rust: uaccess: add userspace pointers kbuild: rust-analyzer: improve comment documentation kbuild: rust-analyzer: better error handling docs: rust: no_std is used rust: alloc: add __GFP_HIGHMEM flag rust: alloc: fix typo in docs for GFP_NOWAIT ...
2024-07-27Merge tag 'apparmor-pr-2024-07-25' of ↵Linus Torvalds8-34/+65
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor updates from John Johansen: "Cleanups - optimization: try to avoid refing the label in apparmor_file_open - remove useless static inline function is_deleted - use kvfree_sensitive to free data->data - fix typo in kernel doc Bug fixes: - unpack transition table if dfa is not present - test: add MODULE_DESCRIPTION() - take nosymfollow flag into account - fix possible NULL pointer dereference - fix null pointer deref when receiving skb during sock creation" * tag 'apparmor-pr-2024-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: unpack transition table if dfa is not present apparmor: try to avoid refing the label in apparmor_file_open apparmor: test: add MODULE_DESCRIPTION() apparmor: take nosymfollow flag into account apparmor: fix possible NULL pointer dereference apparmor: fix typo in kernel doc apparmor: remove useless static inline function is_deleted apparmor: use kvfree_sensitive to free data->data apparmor: Fix null pointer deref when receiving skb during sock creation
2024-07-27Merge tag 'landlock-6.11-rc1-houdini-fix' of ↵Linus Torvalds3-2/+84
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock fix from Mickaël Salaün: "Jann Horn reported a sandbox bypass for Landlock. This includes the fix and new tests. This should be backported" * tag 'landlock-6.11-rc1-houdini-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: selftests/landlock: Add cred_transfer test landlock: Don't lose track of restrictions on cred_transfer
2024-07-27Merge tag 'gpio-fixes-for-v6.11-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fix from Bartosz Golaszewski: - don't use sprintf() with non-constant format string * tag 'gpio-fixes-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: virtuser: avoid non-constant format string
2024-07-27Merge tag 'devicetree-fixes-for-6.11-1' of ↵Linus Torvalds24-96/+64
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull more devicetree updates from Rob Herring: "Most of this is a treewide change to of_property_for_each_u32() which was small enough to do in one go before rc1 and avoids the need to create of_property_for_each_u32_some_new_name(). - Treewide conversion of of_property_for_each_u32() to drop internal arguments making struct property opaque - Add binding for Amlogic A4 SoC watchdog - Fix constraints for AD7192 'single-channel' property" * tag 'devicetree-fixes-for-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: iio: adc: ad7192: Fix 'single-channel' constraints of: remove internal arguments from of_property_for_each_u32() dt-bindings: watchdog: add support for Amlogic A4 SoCs
2024-07-27Merge tag 'iommu-fixes-v6.11-rc1' of ↵Linus Torvalds3-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Will Deacon: "We're still resolving a regression with the handling of unexpected page faults on SMMUv3, but we're not quite there with a fix yet. - Fix NULL dereference when freeing domain in Unisoc SPRD driver - Separate assignment statements with semicolons in AMD page-table code - Fix Tegra erratum workaround when the CPU is using 16KiB pages" * tag 'iommu-fixes-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu: arm-smmu: Fix Tegra workaround for PAGE_SIZE mappings iommu/amd: Convert comma to semicolon iommu: sprd: Avoid NULL deref in sprd_iommu_hw_en
2024-07-27Merge tag 'firewire-fixes-6.11-rc1' of ↵Linus Torvalds2-5/+3
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fixes from Takashi Sakamoto: "The recent integration of compiler collections introduced the technology to check flexible array length at runtime by providing proper annotations. In v6.10 kernel, a patch was merged into firewire subsystem to utilize it, however the annotation was inadequate. There is also the related change for the flexible array in sound subsystem, but it causes a regression where the data in the payload of isochronous packet is incorrect for some devices. These bugs are now fixed" * tag 'firewire-fixes-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: ALSA: firewire-lib: fix wrong value as length of header for CIP_NO_HEADER case Revert "firewire: Annotate struct fw_iso_packet with __counted_by()"
2024-07-27Merge tag 'spi-fix-v6.11-merge-window' of ↵Linus Torvalds3-81/+114
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "The bulk of this is a series of fixes for the microchip-core driver mostly originating from one of their customers, I also applied an additional patch adding support for controlling the word size which came along with it since it's still the merge window and clearly had a bunch of fairly thorough testing. We also have a fix for the compatible used to bind spidev to the BH2228FV" * tag 'spi-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spidev: add correct compatible for Rohm BH2228FV dt-bindings: trivial-devices: fix Rohm BH2228FV compatible string spi: microchip-core: add support for word sizes of 1 to 32 bits spi: microchip-core: ensure TX and RX FIFOs are empty at start of a transfer spi: microchip-core: fix init function not setting the master and motorola modes spi: microchip-core: only disable SPI controller when register value change requires it spi: microchip-core: defer asserting chip select until just before write to TX FIFO spi: microchip-core: fix the issues in the isr
2024-07-27Merge tag 'regulator-fix-v6.11-merge-window' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "These two commits clean up the excessively loose dependencies for the RZG2L USB VBCTRL regulator driver, ensuring it shouldn't prompt for people who can't use it" * tag 'regulator-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: Further restrict RZG2L USB VBCTRL regulator dependencies regulator: renesas-usb-vbus-regulator: Update the default
2024-07-27Merge tag 'regmap-fix-v6.11-merge-window' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "Arnd sent a workaround for a false positive warning which was showing up with GCC 14.1" * tag 'regmap-fix-v6.11-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: maple: work around gcc-14.1 false-positive warning
2024-07-27Merge tag 'clk-for-linus' of ↵Linus Torvalds4-9/+11
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Stephen Boyd: "A few clk driver fixes for the merge window to fix the build and boot on some SoCs. - Initialize struct clk_init_data in the TI da8xx-cfgchip driver so that stack contents aren't used for things like clk flags leading to unexpected behavior - Don't leak stack contents in a debug print in the new Sophgo clk driver - Disable the new T-Head clk driver on 32-bit targets to fix the build due to a division - Fix Samsung Exynos4 fin_pll wreckage from the clkdev rework done last cycle by using a struct clk_hw directly instead of a struct clk consumer" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: samsung: fix getting Exynos4 fin_pll rate from external clocks clk: T-Head: Disable on 32-bit Targets clk: sophgo: clk-sg2042-pll: Fix uninitialized variable in debug output clk: davinci: da8xx-cfgchip: Initialize clk_init_data before use
2024-07-27Merge tag 'i3c/for-6.11' of ↵Linus Torvalds13-143/+431
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c updates from Alexandre Belloni: "This cycle, there are new features for the Designware controller and fixes for the other IPs: - dw: optional apb clock and power management support, IBI handling fixes - mipi-i3c-hci: IBI handling fixes - svc: a few fixes" * tag 'i3c/for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: dt-bindings: i3c: add header for generic I3C flags i3c: master: svc: Fix error code in svc_i3c_master_do_daa_locked() i3c: master: Enhance i3c_bus_type visibility for device searching & event monitoring i3c: dw: Add power management support i3c: dw: Add some functions for reusability i3c: dw: Save timing registers and other values i3c: master: svc: Improve DAA STOP handle code logic i3c: dw: Add optional apb clock i3c: dw: Use new *_enabled clk API dt-bindings: i3c: dw: Add apb clock binding i3c: master: svc: Convert comma to semicolon i3c: mipi-i3c-hci: Round IBI data chunk size to HW supported value i3c: mipi-i3c-hci: Error out instead on BUG_ON() in IBI DMA setup i3c: mipi-i3c-hci: Set IBI Status and Data Ring base addresses i3c: mipi-i3c-hci: Switch to lower_32_bits()/upper_32_bits() helpers i3c: dw: Remove ibi_capable property i3c: dw: Fix IBI intr programming i3c: dw: Fix clearing queue thld i3c: mipi-i3c-hci: Fix number of DAT/DCT entries for HCI versions < 1.1 i3c: master: svc: resend target address when get NACK
2024-07-27Merge tag 'thermal-6.11-rc1-3' of ↵Linus Torvalds2-14/+85
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fix from Rafael Wysocki: "Prevent the thermal core from flooding the kernel log with useless messages if thermal zone temperature can never be determined (or its sensor has failed permanently) and make it finally give up and disable defective thermal zones (Rafael Wysocki)" * tag 'thermal-6.11-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: core: Back off when polling thermal zones on errors thermal: trip: Split thermal_zone_device_set_mode()
2024-07-27Merge tag 'mm-hotfixes-stable-2024-07-26-14-33' of ↵Linus Torvalds14-42/+95
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "11 hotfixes, 7 of which are cc:stable. 7 are MM, 4 are other" * tag 'mm-hotfixes-stable-2024-07-26-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: handle inconsistent state in nilfs_btnode_create_block() selftests/mm: skip test for non-LPA2 and non-LVA systems mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist() mm: memcg: add cacheline padding after lruvec in mem_cgroup_per_node alloc_tag: outline and export free_reserved_page() decompress_bunzip2: fix rare decompression failure mm/huge_memory: avoid PMD-size page cache if needed mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines mm: fix old/young bit handling in the faulting path dt-bindings: arm: update James Clark's email address MAINTAINERS: mailmap: update James Clark's email address
2024-07-27Merge tag 'timers-urgent-2024-07-26' of ↵Linus Torvalds4-213/+224
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer migration updates from Thomas Gleixner: "Fixes and minor updates for the timer migration code: - Stop testing the group->parent pointer as it is not guaranteed to be stable over a chain of operations by design. This includes a warning which would be nice to have but it produces false positives due to the racy nature of the check. - Plug a race between CPUs going in and out of idle and a CPU hotplug operation. The latter can create and connect a new hierarchy level which is missed in the concurrent updates of CPUs which go into idle. As a result the events of such a CPU might not be processed and timers go stale. Cure it by splitting the hotplug operation into a prepare and online callback. The prepare callback is guaranteed to run on an online and therefore active CPU. This CPU updates the hierarchy and being online ensures that there is always at least one migrator active which handles the modified hierarchy correctly when going idle. The online callback which runs on the incoming CPU then just marks the CPU active and brings it into operation. - Improve tracing and polish the code further so it is more obvious what's going on" * tag 'timers-urgent-2024-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/migration: Fix grammar in comment timers/migration: Spare write when nothing changed timers/migration: Rename childmask by groupmask to make naming more obvious timers/migration: Read childmask and parent pointer in a single place timers/migration: Use a single struct for hierarchy walk data timers/migration: Improve tracing timers/migration: Move hierarchy setup into cpuhotplug prepare callback timers/migration: Do not rely always on group->parent
2024-07-27Merge tag 'riscv-for-linus-6.11-mw2' of ↵Linus Torvalds43-245/+750
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for NUMA (via SRAT and SLIT), console output (via SPCR), and cache info (via PPTT) on ACPI-based systems. - The trap entry/exit code no longer breaks the return address stack predictor on many systems, which results in an improvement to trap latency. - Support for HAVE_ARCH_STACKLEAK. - The sv39 linear map has been extended to support 128GiB mappings. - The frequency of the mtime CSR is now visible via hwprobe. * tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits) RISC-V: Provide the frequency of time CSR via hwprobe riscv: Extend sv39 linear mapping max size to 128G riscv: enable HAVE_ARCH_STACKLEAK riscv: signal: Remove unlikely() from WARN_ON() condition riscv: Improve exception and system call latency RISC-V: Select ACPI PPTT drivers riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init() RISC-V: ACPI: Enable SPCR table for console output on RISC-V riscv: boot: remove duplicated targets line trace: riscv: Remove deprecated kprobe on ftrace support riscv: cpufeature: Extract common elements from extension checking riscv: Introduce vendor variants of extension helpers riscv: Add vendor extensions to /proc/cpuinfo riscv: Extend cpufeature.c to detect vendor extensions RISC-V: run savedefconfig for defconfig RISC-V: hwprobe: sort EXT_KEY()s in hwprobe_isa_ext0() alphabetically ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init ACPI: NUMA: change the ACPI_NUMA to a hidden option ACPI: NUMA: Add handler for SRAT RINTC affinity structure ...
2024-07-27Merge tag 'for-linus-6.11-rc1a-tag' of ↵Linus Torvalds6-64/+74
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two fixes for issues introduced in this merge window: - fix enhanced debugging in the Xen multicall handling - two patches fixing a boot failure when running as dom0 in PVH mode" * tag 'for-linus-6.11-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: fix memblock_reserve() usage on PVH x86/xen: move xen_reserve_extra_memory() xen: fix multicall debug data referencing
2024-07-27hostfs: fix the host directory parse when mounting.Hongbo Li1-10/+55
hostfs not keep the host directory when mounting. When the host directory is none (default), fc->source is used as the host root directory, and this is wrong. Here we use `parse_monolithic` to handle the old mount path for parsing the root directory. For new mount path, The `parse_param` is used for the host directory parse. Reported-and-tested-by: Maciej Żenczykowski <maze@google.com> Fixes: cd140ce9f611 ("hostfs: convert hostfs to use the new mount API") Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/ Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20240725065130.1821964-1-lihongbo22@huawei.com [brauner: minor fixes] Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-27fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNTSeth Forshee (DigitalOcean)1-0/+11
Christian noticed that it is possible for a privileged user to mount most filesystems with a non-initial user namespace in sb->s_user_ns. When fsopen() is called in a non-init namespace the caller's namespace is recorded in fs_context->user_ns. If the returned file descriptor is then passed to a process priviliged in init_user_ns, that process can call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock with sb->s_user_ns set to the namespace of the process which called fsopen(). This is problematic. We cannot assume that any filesystem which does not set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in mind, increasing the risk for bugs and security issues. Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is not set for the filesystem and a non-initial user namespace will be used. sget() does not need to be updated as it always uses the user namespace of the current context, or the initial user namespace if SB_SUBMOUNT is set. Fixes: cb50b348c71f ("convenience helpers: vfs_get_super() and sget_fc()") Reported-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Link: https://lore.kernel.org/r/20240724-s_user_ns-fix-v1-1-895d07c94701@kernel.org Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-27ALSA: firewire-lib: fix wrong value as length of header for CIP_NO_HEADER caseTakashi Sakamoto1-2/+1
In a commit 1d717123bb1a ("ALSA: firewire-lib: Avoid -Wflex-array-member-not-at-end warning"), DEFINE_FLEX() macro was used to handle variable length of array for header field in struct fw_iso_packet structure. The usage of macro has a side effect that the designated initializer assigns the count of array to the given field. Therefore CIP_HEADER_QUADLETS (=2) is assigned to struct fw_iso_packet.header, while the original designated initializer assigns zero to all fields. With CIP_NO_HEADER flag, the change causes invalid length of header in isochronous packet for 1394 OHCI IT context. This bug affects all of devices supported by ALSA fireface driver; RME Fireface 400, 800, UCX, UFX, and 802. This commit fixes the bug by replacing it with the alternative version of macro which corresponds no initializer. Cc: stable@vger.kernel.org Fixes: 1d717123bb1a ("ALSA: firewire-lib: Avoid -Wflex-array-member-not-at-end warning") Reported-by: Edmund Raile <edmund.raile@proton.me> Closes: https://lore.kernel.org/r/rrufondjeynlkx2lniot26ablsltnynfaq2gnqvbiso7ds32il@qk4r6xps7jh2/ Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20240725155640.128442-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-07-27Revert "firewire: Annotate struct fw_iso_packet with __counted_by()"Takashi Sakamoto1-3/+2
This reverts commit d3155742db89df3b3c96da383c400e6ff4d23c25. The header_length field is byte unit, thus it can not express the number of elements in header field. It seems that the argument for counted_by attribute can have no arithmetic expression, therefore this commit just reverts the issued commit. Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20240725161648.130404-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-07-27minmax: avoid overly complicated constant expressions in VM codeLinus Torvalds2-2/+9
The minmax infrastructure is overkill for simple constants, and can cause huge expansions because those simple constants are then used by other things. For example, 'pageblock_order' is a core VM constant, but because it was implemented using 'min_t()' and all the type-checking that involves, it actually expanded to something like 2.5kB of preprocessor noise. And when that simple constant was then used inside other expansions: #define pageblock_nr_pages (1UL << pageblock_order) #define pageblock_start_pfn(pfn) ALIGN_DOWN((pfn), pageblock_nr_pages) and we then use that inside a 'max()' macro: case ISOLATE_SUCCESS: update_cached = false; last_migrated_pfn = max(cc->zone->zone_start_pfn, pageblock_start_pfn(cc->migrate_pfn - 1)); the end result was that one statement expanding to 253kB in size. There are probably other cases of this, but this one case certainly stood out. I've added 'MIN_T()' and 'MAX_T()' macros for this kind of "core simple constant with specific type" use. These macros skip the type checking, and as such need to be very sparingly used only for obvious cases that have active issues like this. Reported-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/all/36aa2cad-1db1-4abf-8dd2-fb20484aabc3@lucifer.local/ Cc: David Laight <David.Laight@aculab.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-27minmax: avoid overly complex min()/max() macro arguments in xenLinus Torvalds1-2/+3
We have some very fancy min/max macros that have tons of sanity checking to warn about mixed signedness etc. This is all things that a sane compiler should warn about, but there are no sane compiler interfaces for this, and '-Wsign-compare' is broken [1] and not useful. So then we compensate (some would say over-compensate) by doing the checks manually with some truly horrid macro games. And no, we can't just use __builtin_types_compatible_p(), because the whole question of "does it make sense to compare these two values" is a lot more complicated than that. For example, it makes a ton of sense to compare unsigned values with simple constants like "5", even if that is indeed a signed type. So we have these very strange macros to try to make sensible type checking decisions on the arguments to 'min()' and 'max()'. But that can cause enormous code expansion if the min()/max() macros are used with complicated expressions, and particularly if you nest these things so that you get the first big expansion then expanded again. The xen setup.c file ended up ballooning to over 50MB of preprocessed noise that takes 15s to compile (obviously depending on the build host), largely due to one single line. So let's split that one single line to just be simpler. I think it ends up being more legible to humans too at the same time. Now that single file compiles in under a second. Reported-and-reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/all/c83c17bb-be75-4c67-979d-54eee38774c6@lucifer.local/ Link: https://staticthinking.wordpress.com/2023/07/25/wsign-compare-is-garbage/ [1] Cc: David Laight <David.Laight@aculab.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-27nilfs2: handle inconsistent state in nilfs_btnode_create_block()Ryusuke Konishi2-7/+22
Syzbot reported that a buffer state inconsistency was detected in nilfs_btnode_create_block(), triggering a kernel bug. It is not appropriate to treat this inconsistency as a bug; it can occur if the argument block address (the buffer index of the newly created block) is a virtual block number and has been reallocated due to corruption of the bitmap used to manage its allocation state. So, modify nilfs_btnode_create_block() and its callers to treat it as a possible filesystem error, rather than triggering a kernel bug. Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com Fixes: a60be987d45d ("nilfs2: B-tree node cache") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+89cc4f2324ed37988b60@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60 Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-07-27selftests/mm: skip test for non-LPA2 and non-LVA systemsDev Jain1-1/+15
Post my improvement of the test in e4a4ba415419 ("selftests/mm: va_high_addr_switch: dynamically initialize testcases to enable LPA2 testing"): The test begins to fail on 4k and 16k pages, on non-LPA2 systems. To reduce noise in the CI systems, let us skip the test when higher address space is not implemented. Link: https://lkml.kernel.org/r/20240718052504.356517-1-dev.jain@arm.com Fixes: e4a4ba415419 ("selftests/mm: va_high_addr_switch: dynamically initialize testcases to enable LPA2 testing") Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>