summaryrefslogtreecommitdiff
path: root/include/asm-generic
AgeCommit message (Collapse)AuthorFilesLines
12 dayskbuild: keep symbols for symbol_get() even with CONFIG_TRIM_UNUSED_KSYMSMasahiro Yamada1-0/+1
Linus observed that the symbol_request(utf8_data_table) call fails when CONFIG_UNICODE=y and CONFIG_TRIM_UNUSED_KSYMS=y. symbol_get() relies on the symbol data being present in the ksymtab for symbol lookups. However, EXPORT_SYMBOL_GPL(utf8_data_table) is dropped due to CONFIG_TRIM_UNUSED_KSYMS, as no module references it in this case. Probably, this has been broken since commit dbacb0ef670d ("kconfig option for TRIM_UNUSED_KSYMS"). This commit addresses the issue by leveraging modpost. Symbol names passed to symbol_get() are recorded in the special .no_trim_symbol section, which is then parsed by modpost to forcibly keep such symbols. The .no_trim_symbol section is discarded by the linker scripts, so there is no impact on the size of the final vmlinux or modules. This commit cannot resolve the issue for direct calls to __symbol_get() because the symbol name is not known at compile-time. Although symbol_get() may eventually be deprecated, this workaround should be good enough meanwhile. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-27Merge tag 'mm-stable-2025-01-26-14-59' of ↵Linus Torvalds3-12/+97
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ...
2025-01-27Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly individually changelogged singleton patches. The patch series in this pull are: - "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some cleanup and Rust preparation in the xarray library code - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API - "Convert ocfs2 to use folios" from Matthew Wilcox does that - "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places - "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work - "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work - "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a corrupted image - "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc - "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger - "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code" * tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits) ocfs2: use str_yes_no() and str_no_yes() helper functions include/linux/lz4.h: add some missing macros Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent Xarray: remove repeat check in xas_squash_marks() Xarray: distinguish large entries correctly in xas_split_alloc() Xarray: move forward index correctly in xas_pause() Xarray: do not return sibling entries from xas_find_marked() ipc/util.c: complete the kernel-doc function descriptions gcov: clang: use correct function param names latencytop: use correct kernel-doc format for func params minmax.h: remove some #defines that are only expanded once minmax.h: simplify the variants of clamp() minmax.h: move all the clamp() definitions after the min/max() ones minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() minmax.h: reduce the #define expansion of min(), max() and clamp() minmax.h: update some comments minmax.h: add whitespace around operators and after commas nilfs2: do not update mtime of renamed directory that is not moved nilfs2: handle errors that nilfs_prepare_chunk() may return CREDITS: fix spelling mistake ...
2025-01-26mm: introduce ctor/dtor at PGD levelKevin Brodsky1-1/+2
Following on from the introduction of P4D-level ctor/dtor, let's finish the job and introduce ctor/dtor at PGD level. The incurred improvement in page accounting is minimal - the main motivation is to create a single, generic place where construction/destruction hooks can be added for all page table pages. This patch should cover all architectures and all configurations where PGDs are one or more regular pages. This excludes any configuration where PGDs are allocated from a kmem_cache object. Link: https://lkml.kernel.org/r/20250103184415.2744423-7-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26asm-generic: pgalloc: provide generic __pgd_{alloc,free}Kevin Brodsky1-1/+26
We already have a generic implementation of alloc/free up to P4D level, as well as pgd_free(). Let's finish the work and add a generic PGD-level alloc helper as well. Unlike at lower levels, almost all architectures need some specific magic at PGD level (typically initialising PGD entries), so introducing a generic pgd_alloc() isn't worth it. Instead we introduce two new helpers, __pgd_alloc() and __pgd_free(), and make use of them in the arch-specific pgd_alloc() and pgd_free() wherever possible. To accommodate as many arch as possible, __pgd_alloc() takes a page allocation order. Because pagetable_alloc() allocates zeroed pages, explicit zeroing in pgd_alloc() becomes redundant and we can get rid of it. Some trivial implementations of pgd_free() also become unnecessary once __pgd_alloc() is used; remove them. Another small improvement is consistent accounting of PGD pages by using GFP_PGTABLE_{USER,KERNEL} as appropriate. Not all PGD allocations can be handled by the generic helpers. In particular, multiple architectures allocate PGDs from a kmem_cache, and those PGDs may not be page-sized. Link: https://lkml.kernel.org/r/20250103184415.2744423-6-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26mm: pgtable: introduce generic pagetable_dtor_free()Qi Zheng2-10/+5
The pte_free(), pmd_free(), __pud_free() and __p4d_free() in asm-generic/pgalloc.h and the generic __tlb_remove_table() are basically the same, so let's introduce pagetable_dtor_free() to deduplicate them. In addition, the pagetable_dtor_free() in s390 does the same thing, so let's s390 also calls generic pagetable_dtor_free(). Link: https://lkml.kernel.org/r/1663a0565aca881d1338ceb7d1db4aa9c333abd6.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26mm: pgtable: completely move pagetable_dtor() to generic tlb_remove_table()Qi Zheng1-2/+8
For the generic tlb_remove_table(), it is implemented in the following two forms: 1) CONFIG_MMU_GATHER_TABLE_FREE is enabled tlb_remove_table --> generic __tlb_remove_table() 2) CONFIG_MMU_GATHER_TABLE_FREE is disabled tlb_remove_table --> tlb_remove_page For case 1), the pagetable_dtor() has already been moved to generic __tlb_remove_table(). For case 2), now only arm will call tlb_remove_table()/tlb_remove_ptdesc() when CONFIG_MMU_GATHER_TABLE_FREE is disabled. Let's move pagetable_dtor() completely to generic tlb_remove_table(), so that the architectures can follow more easily. Link: https://lkml.kernel.org/r/0c733ac867b287ec08190676496d1decebf49da2.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Suggested-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26mm: pgtable: introduce generic __tlb_remove_table()Qi Zheng1-2/+13
Several architectures (arm, arm64, riscv and x86) define exactly the same __tlb_remove_table(), just introduce generic __tlb_remove_table() to eliminate these duplications. The s390 __tlb_remove_table() is nearly the same, so also make s390 __tlb_remove_table() version generic. Link: https://lkml.kernel.org/r/ea372633d94f4d3f9f56a7ec5994bf050bf77e39.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Acked-by: Arnd Bergmann <arnd@arndb.de> [asm-generic] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26mm: pgtable: introduce pagetable_dtor()Qi Zheng1-4/+4
The pagetable_p*_dtor() are exactly the same except for the handling of ptlock. If we make ptlock_free() handle the case where ptdesc->ptl is NULL and remove VM_BUG_ON_PAGE() from pmd_ptlock_free(), we can unify pagetable_p*_dtor() into one function. Let's introduce pagetable_dtor() to do this. Later, pagetable_dtor() will be moved to tlb_remove_ptdesc(), so that ptlock and page table pages can be freed together (regardless of whether RCU is used). This prevents the use-after-free problem where the ptlock is freed immediately but the page table pages is freed later via RCU. Link: https://lkml.kernel.org/r/47f44fff9dc68d9d9e9a0d6c036df275f820598a.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> [s390] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26mm: pgtable: add statistics for P4D level page tableQi Zheng1-0/+2
Like other levels of page tables, add statistics for P4D level page table. Link: https://lkml.kernel.org/r/d55fe3c286305aae84457da9e1066df99b3de125.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-26asm-generic: pgalloc: provide generic p4d_{alloc_one,free}Kevin Brodsky1-0/+45
Four architectures currently implement 5-level pgtables: arm64, riscv, x86 and s390. The first three have essentially the same implementation for p4d_alloc_one() and p4d_free(), so we've got an opportunity to reduce duplication like at the lower levels. Provide a generic version of p4d_alloc_one() and p4d_free(), and make use of it on those architectures. Their implementation is the same as at PUD level, except that p4d_free() performs a runtime check by calling mm_p4d_folded(). 5-level pgtables depend on a runtime-detected hardware feature on all supported architectures, so we might as well include this check in the generic implementation. No runtime check is required in p4d_alloc_one() as the top-level p4d_alloc() already does the required check. Link: https://lkml.kernel.org/r/26d69c74a29183ecc335b9b407040d8e4cd70c6a.1736317725.git.zhengqi.arch@bytedance.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> [asm-generic] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25Merge tag 'hyperv-next-signed-20250123' of ↵Linus Torvalds2-878/+3
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Introduce a new set of Hyper-V headers in include/hyperv and replace the old hyperv-tlfs.h with the new headers (Nuno Das Neves) - Fixes for the Hyper-V VTL mode (Roman Kisel) - Fixes for cpu mask usage in Hyper-V code (Michael Kelley) - Document the guest VM hibernation behaviour (Michael Kelley) - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain) * tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Documentation: hyperv: Add overview of guest VM hibernation hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() hyperv: Do not overlap the hvcall IO areas in get_vtl() hyperv: Enable the hypercall output page for the VTL mode hv_balloon: Fallback to generic_online_page() for non-HV hot added mem Drivers: hv: vmbus: Log on missing offers if any Drivers: hv: vmbus: Wait for boot-time offers during boot and resume uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup iommu/hyper-v: Don't assume cpu_possible_mask is dense Drivers: hv: Don't assume cpu_possible_mask is dense x86/hyperv: Don't assume cpu_possible_mask is dense hyperv: Remove the now unused hyperv-tlfs.h files hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h hyperv: Add new Hyper-V headers in include/hyperv hyperv: Clean up unnecessary #includes hyperv: Move hv_connection_id to hyperv-tlfs.h
2025-01-14mm/early_ioremap: add null pointer checks to prevent NULL-pointer dereferenceGuo Weikang1-1/+1
The early_ioremap interface can fail and return NULL in certain cases. To prevent NULL-pointer dereference crashes, fixed issues in the acpi_extlog and copy_early_mem interfaces, improving robustness when handling early memory. Link: https://lkml.kernel.org/r/20241212101004.1544070-1-guoweikang.kernel@gmail.com Signed-off-by: Guo Weikang <guoweikang.kernel@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Julian Stecklina <julian.stecklina@cyberus-technology.de> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Len Brown <lenb@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13include: update references to include/asm-<arch>Geert Uytterhoeven1-1/+1
"include/asm-<arch>" was replaced by "arch/<arch>/include/asm" a long time ago. Link: https://lkml.kernel.org/r/541258219b0441fa1da890e2f8458a7ac18c2ef9.1733404444.git.geert+renesas@glider.be Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Andy Whitcroft <apw@canonical.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-10hyperv: Remove the now unused hyperv-tlfs.h filesNuno Das Neves1-883/+0
Remove all hyperv-tlfs.h files. These are no longer included anywhere. hyperv/hvhdk.h serves the same role, but with an easier path for adding new definitions. Remove the relevant lines in MAINTAINERS. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/1732577084-2122-6-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1732577084-2122-6-git-send-email-nunodasneves@linux.microsoft.com>
2025-01-10hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.hNuno Das Neves1-4/+3
Switch to using hvhdk.h everywhere in the kernel. This header includes all the new Hyper-V headers in include/hyperv, which form a superset of the definitions found in hyperv-tlfs.h. This makes it easier to add new Hyper-V interfaces without being restricted to those in the TLFS doc (reflected in hyperv-tlfs.h). To be more consistent with the original Hyper-V code, the names of some definitions are changed slightly. Update those where needed. Update comments in mshyperv.h files to point to include/hyperv for adding new definitions. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/1732577084-2122-5-git-send-email-nunodasneves@linux.microsoft.com Link: https://lore.kernel.org/r/20250108222138.1623703-3-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-01-08hyperv: Move hv_connection_id to hyperv-tlfs.hNuno Das Neves1-0/+9
This definition is in the wrong file; it is part of the TLFS doc. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Acked-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1732577084-2122-2-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1732577084-2122-2-git-send-email-nunodasneves@linux.microsoft.com>
2024-12-26fprobe: Add fprobe_header encoding featureMasami Hiramatsu (Google)1-0/+46
Fprobe store its data structure address and size on the fgraph return stack by __fprobe_header. But most 64bit architecture can combine those to one unsigned long value because 4 MSB in the kernel address are the same. With this encoding, fprobe can consume less space on ret_stack. This introduces asm/fprobe.h to define arch dependent encode/decode macros. Note that since fprobe depends on CONFIG_HAVE_FUNCTION_GRAPH_FREGS, currently only arm64, loongarch, riscv, s390 and x86 are supported. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/173519005783.391279.5307910947400277525.stgit@devnote2 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-12-01Merge tag 'timers_urgent_for_v6.13_rc1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Borislav Petkov: - Fix a case where posix timers with a thread-group-wide target would miss signals if some of the group's threads are exiting - Fix a hang caused by ndelay() calling the wrong delay function __udelay() - Fix a wrong offset calculation in adjtimex(2) when using ADJ_MICRO (microsecond resolution) and a negative offset * tag 'timers_urgent_for_v6.13_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-timers: Target group sigqueue to current task only if not exiting delay: Fix ndelay() spuriously treated as udelay() ntp: Remove invalid cast in time offset math
2024-12-01Merge tag 'kbuild-v6.13' of ↵Linus Torvalds1-13/+40
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add generic support for built-in boot DTB files - Enable TAB cycling for dialog buttons in nconfig - Fix issues in streamline_config.pl - Refactor Kconfig - Add support for Clang's AutoFDO (Automatic Feedback-Directed Optimization) - Add support for Clang's Propeller, a profile-guided optimization. - Change the working directory to the external module directory for M= builds - Support building external modules in a separate output directory - Enable objtool for *.mod.o and additional kernel objects - Use lz4 instead of deprecated lz4c - Work around a performance issue with "git describe" - Refactor modpost * tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (85 commits) kbuild: rename .tmp_vmlinux.kallsyms0.syms to .tmp_vmlinux0.syms gitignore: Don't ignore 'tags' directory kbuild: add dependency from vmlinux to resolve_btfids modpost: replace tdb_hash() with hash_str() kbuild: deb-pkg: add python3:native to build dependency genksyms: reduce indentation in export_symbol() modpost: improve error messages in device_id_check() modpost: rename alias symbol for MODULE_DEVICE_TABLE() modpost: rename variables in handle_moddevtable() modpost: move strstarts() to modpost.h modpost: convert do_usb_table() to a generic handler modpost: convert do_of_table() to a generic handler modpost: convert do_pnp_device_entry() to a generic handler modpost: convert do_pnp_card_entries() to a generic handler modpost: call module_alias_printf() from all do_*_entry() functions modpost: pass (struct module *) to do_*_entry() functions modpost: remove DEF_FIELD_ADDR_VAR() macro modpost: deduplicate MODULE_ALIAS() for all drivers modpost: introduce module_alias_printf() helper modpost: remove unnecessary check in do_acpi_entry() ...
2024-11-29delay: Fix ndelay() spuriously treated as udelay()Frederic Weisbecker1-2/+2
A recent rework on delay functions wrongly ended up calling __udelay() instead of __ndelay() for nanosecond delays, increasing those by 1000. As a result hangs have been observed on boot Restore the right function calls. Fixes: 19e2d91d8cb1 ("delay: Rework udelay and ndelay") Reported-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Link: https://lore.kernel.org/all/20241121152931.51884-1-frederic@kernel.org
2024-11-27Merge tag 'riscv-for-linus-6.13-mw1' of ↵Linus Torvalds4-96/+110
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-v updates from Palmer Dabbelt: - Support for pointer masking in userspace - Support for probing vector misaligned access performance - Support for qspinlock on systems with Zacas and Zabha * tag 'riscv-for-linus-6.13-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (38 commits) RISC-V: Remove unnecessary include from compat.h riscv: Fix default misaligned access trap riscv: Add qspinlock support dt-bindings: riscv: Add Ziccrse ISA extension description riscv: Add ISA extension parsing for Ziccrse asm-generic: ticket-lock: Add separate ticket-lock.h asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlock riscv: Implement xchg8/16() using Zabha riscv: Implement arch_cmpxchg128() using Zacas riscv: Improve zacas fully-ordered cmpxchg() riscv: Implement cmpxchg8/16() using Zabha dt-bindings: riscv: Add Zabha ISA extension description riscv: Implement cmpxchg32/64() using Zacas riscv: Do not fail to build on byte/halfword operations with Zawrs riscv: Move cpufeature.h macros into their own header KVM: riscv: selftests: Add Smnpm and Ssnpm to get-reg-list test RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests riscv: hwprobe: Export the Supm ISA extension riscv: selftests: Add a pointer masking test riscv: Allow ptrace control of the tagged address ABI ...
2024-11-27Rename .data.once to .data..once to fix resetting WARN*_ONCEMasahiro Yamada1-1/+1
Commit b1fca27d384e ("kernel debug: support resetting WARN*_ONCE") added support for clearing the state of once warnings. However, it is not functional when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION or CONFIG_LTO_CLANG is enabled, because .data.once matches the .data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro. Commit cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured") was introduced to suppress the issue for the default CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n case, providing a minimal fix for stable backporting. We were aware this did not address the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. The plan was to apply correct fixes and then revert cb87481ee89d. [1] Seven years have passed since then, yet the #ifdef workaround remains in place. Meanwhile, commit b1fca27d384e introduced the .data.once section, and commit dc5723b02e52 ("kbuild: add support for Clang LTO") extended the #ifdef. Using a ".." separator in the section name fixes the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_LTO_CLANG. [1]: https://lore.kernel.org/linux-kbuild/CAK7LNASck6BfdLnESxXUeECYL26yUDm0cwRZuM4gmaWUkxjL5g@mail.gmail.com/ Fixes: b1fca27d384e ("kernel debug: support resetting WARN*_ONCE") Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27Rename .data.unlikely to .data..unlikelyMasahiro Yamada1-1/+1
Commit 7ccaba5314ca ("consolidate WARN_...ONCE() static variables") was intended to collect all .data.unlikely sections into one chunk. However, this has not worked when CONFIG_LD_DEAD_CODE_DATA_ELIMINATION or CONFIG_LTO_CLANG is enabled, because .data.unlikely matches the .data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro. Commit cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured") was introduced to suppress the issue for the default CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n case, providing a minimal fix for stable backporting. We were aware this did not address the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y. The plan was to apply correct fixes and then revert cb87481ee89d. [1] Seven years have passed since then, yet the #ifdef workaround remains in place. Using a ".." separator in the section name fixes the issue for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_LTO_CLANG. [1]: https://lore.kernel.org/linux-kbuild/CAK7LNASck6BfdLnESxXUeECYL26yUDm0cwRZuM4gmaWUkxjL5g@mail.gmail.com/ Fixes: cb87481ee89d ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27kbuild: Add Propeller configuration for kernel buildRong Xu1-3/+3
Add the build support for using Clang's Propeller optimizer. Like AutoFDO, Propeller uses hardware sampling to gather information about the frequency of execution of different code paths within a binary. This information is then used to guide the compiler's optimization decisions, resulting in a more efficient binary. The support requires a Clang compiler LLVM 19 or later, and the create_llvm_prof tool (https://github.com/google/autofdo/releases/tag/v0.30.1). This commit is limited to x86 platforms that support PMU features like LBR on Intel machines and AMD Zen3 BRS. Here is an example workflow for building an AutoFDO+Propeller optimized kernel: 1) Build the kernel on the host machine, with AutoFDO and Propeller build config CONFIG_AUTOFDO_CLANG=y CONFIG_PROPELLER_CLANG=y then $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> “<autofdo_profile>” is the profile collected when doing a non-Propeller AutoFDO build. This step builds a kernel that has the same optimization level as AutoFDO, plus a metadata section that records basic block information. This kernel image runs as fast as an AutoFDO optimized kernel. 2) Install the kernel on test/production machines. 3) Run the load tests. The '-c' option in perf specifies the sample event period. We suggest using a suitable prime number, like 500009, for this purpose. For Intel platforms: $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \ -o <perf_file> -- <loadtest> For AMD platforms: The supported system are: Zen3 with BRS, or Zen4 with amd_lbr_v2 # To see if Zen3 support LBR: $ cat proc/cpuinfo | grep " brs" # To see if Zen4 support LBR: $ cat proc/cpuinfo | grep amd_lbr_v2 # If the result is yes, then collect the profile using: $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \ -N -b -c <count> -o <perf_file> -- <loadtest> 4) (Optional) Download the raw perf file to the host machine. 5) Generate Propeller profile: $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \ --format=propeller --propeller_output_module_name \ --out=<propeller_profile_prefix>_cc_profile.txt \ --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt “create_llvm_prof” is the profile conversion tool, and a prebuilt binary for linux can be found on https://github.com/google/autofdo/releases/tag/v0.30.1 (can also build from source). "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string". This command generates a pair of Propeller profiles: "<propeller_profile_prefix>_cc_profile.txt" and "<propeller_profile_prefix>_ld_profile.txt". 6) Rebuild the kernel using the AutoFDO and Propeller profile files. CONFIG_AUTOFDO_CLANG=y CONFIG_PROPELLER_CLANG=y and $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \ CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix> Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Stephane Eranian <eranian@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27AutoFDO: Enable machine function split optimization for AutoFDORong Xu1-1/+6
Enable the machine function split optimization for AutoFDO in Clang. Machine function split (MFS) is a pass in the Clang compiler that splits a function into hot and cold parts. The linker groups all cold blocks across functions together. This decreases hot code fragmentation and improves iCache and iTLB utilization. MFS requires a profile so this is enabled only for the AutoFDO builds. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27AutoFDO: Enable -ffunction-sections for the AutoFDO buildRong Xu1-2/+9
Enable -ffunction-sections by default for the AutoFDO build. With -ffunction-sections, the compiler places each function in its own section named .text.function_name instead of placing all functions in the .text section. In the AutoFDO build, this allows the linker to utilize profile information to reorganize functions for improved utilization of iCache and iTLB. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27vmlinux.lds.h: Add markers for text_unlikely and text_hot sectionsRong Xu1-2/+12
Add markers like __hot_text_start, __hot_text_end, __unlikely_text_start, and __unlikely_text_end which will be included in System.map. These markers indicate how the compiler groups functions, providing valuable information to developers about the layout and optimization of the code. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-27vmlinux.lds.h: Adjust symbol ordering in text output sectionRong Xu1-7/+12
When the -ffunction-sections compiler option is enabled, each function is placed in a separate section named .text.function_name rather than putting all functions in a single .text section. However, using -function-sections can cause problems with the linker script. The comments included in include/asm-generic/vmlinux.lds.h note these issues.: “TEXT_MAIN here will match .text.fixup and .text.unlikely if dead code elimination is enabled, so these sections should be converted to use ".." first.” It is unclear whether there is a straightforward method for converting a suffix to "..". This patch modifies the order of subsections within the text output section. Specifically, it changes current order: .text.hot, .text, .text_unlikely, .text.unknown, .text.asan to the new order: .text.asan, .text.unknown, .text_unlikely, .text.hot, .text Here is the rationale behind the new layout: The majority of the code resides in three sections: .text.hot, .text, and .text.unlikely, with .text.unknown containing a negligible amount. .text.asan is only generated in ASAN builds. The primary goal is to group code segments based on their execution frequency (hotness). First, we want to place .text.hot adjacent to .text. Since we cannot put .text.hot after .text (Due to constraints with -ffunction-sections, placing .text.hot after .text is problematic), we need to put .text.hot before .text. Then it comes to .text.unlikely, we cannot put it after .text (same -ffunction-sections issue) . Therefore, we position .text.unlikely before .text.hot. .text.unknown and .tex.asan follow the same logic. This revised ordering effectively reverses the original arrangement (for .text.unlikely, .text.unknown, and .tex.asan), maintaining a similar level of affinity between sections. It also places .text.hot section at the beginning of a page to better utilize the TLB entry. Note that the limitation arises because the linker script employs glob patterns instead of regular expressions for string matching. While there is a method to maintain the current order using complex patterns, this significantly complicates the pattern and increases the likelihood of errors. This patch also changes vmlinux.lds.S for the sparc64 architecture to accommodate specific symbol placement requirements. Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-23Merge tag 'mm-stable-2024-11-18-19-27' of ↵Linus Torvalds3-7/+32
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ...
2024-11-21Merge tag 'asm-generic-3.13' of ↵Linus Torvalds4-121/+118
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "These are a number of unrelated cleanups, generally simplifying the architecture specific header files: - A series from Al Viro simplifies asm/vga.h, after it turns out that most of it can be generalized. - A series from Julian Vetter adds a common version of memcpy_{to,from}io() and memset_io() and changes most architectures to use that instead of their own implementation - A series from Niklas Schnelle concludes his work to make PC style inb()/outb() optional - Nicolas Pitre contributes improvements for the generic do_div() helper - Christoph Hellwig adds a generic version of page_to_phys() and phys_to_page(), replacing the slightly different architecture specific definitions. - Uwe Kleine-Koenig has a minor cleanup for ioctl definitions" * tag 'asm-generic-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (24 commits) empty include/asm-generic/vga.h sparc: get rid of asm/vga.h asm/vga.h: don't bother with scr_mem{cpy,move}v() unless we need to vt_buffer.h: get rid of dead code in default scr_...() instances tty: serial: export serial_8250_warn_need_ioport lib/iomem_copy: fix kerneldoc format style hexagon: simplify asm/io.h for !HAS_IOPORT loongarch: Use new fallback IO memcpy/memset csky: Use new fallback IO memcpy/memset arm64: Use new fallback IO memcpy/memset New implementation for IO memcpy and IO memset watchdog: Add HAS_IOPORT dependency for SBC8360 and SBC7240 __arch_xprod64(): make __always_inline when optimizing for performance ARM: div64: improve __arch_xprod_64() asm-generic/div64: optimize/simplify __div64_const32() lib/math/test_div64: add some edge cases relevant to __div64_const32() asm-generic: add an optional pfn_valid check to page_to_phys asm-generic: provide generic page_to_phys and phys_to_page implementations asm-generic/io.h: Remove I/O port accessors for HAS_IOPORT=n tty: serial: handle HAS_IOPORT dependencies ...
2024-11-20Merge tag 'timers-core-2024-11-18' of ↵Linus Torvalds1-27/+69
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather large update for timekeeping and timers: - The final step to get rid of auto-rearming posix-timers posix-timers are currently auto-rearmed by the kernel when the signal of the timer is ignored so that the timer signal can be delivered once the corresponding signal is unignored. This requires to throttle the timer to prevent a DoS by small intervals and keeps the system pointlessly out of low power states for no value. This is a long standing non-trivial problem due to the lock order of posix-timer lock and the sighand lock along with life time issues as the timer and the sigqueue have different life time rules. Cure this by: - Embedding the sigqueue into the timer struct to have the same life time rules. Aside of that this also avoids the lookup of the timer in the signal delivery and rearm path as it's just a always valid container_of() now. - Queuing ignored timer signals onto a seperate ignored list. - Moving queued timer signals onto the ignored list when the signal is switched to SIG_IGN before it could be delivered. - Walking the ignored list when SIG_IGN is lifted and requeue the signals to the actual signal lists. This allows the signal delivery code to rearm the timer. This also required to consolidate the signal delivery rules so they are consistent across all situations. With that all self test scenarios finally succeed. - Core infrastructure for VFS multigrain timestamping This is required to allow the kernel to use coarse grained time stamps by default and switch to fine grained time stamps when inode attributes are actively observed via getattr(). These changes have been provided to the VFS tree as well, so that the VFS specific infrastructure could be built on top. - Cleanup and consolidation of the sleep() infrastructure - Move all sleep and timeout functions into one file - Rework udelay() and ndelay() into proper documented inline functions and replace the hardcoded magic numbers by proper defines. - Rework the fsleep() implementation to take the reality of the timer wheel granularity on different HZ values into account. Right now the boundaries are hard coded time ranges which fail to provide the requested accuracy on different HZ settings. - Update documentation for all sleep/timeout related functions and fix up stale documentation links all over the place - Fixup a few usage sites - Rework of timekeeping and adjtimex(2) to prepare for multiple PTP clocks A system can have multiple PTP clocks which are participating in seperate and independent PTP clock domains. So far the kernel only considers the PTP clock which is based on CLOCK TAI relevant as that's the clock which drives the timekeeping adjustments via the various user space daemons through adjtimex(2). The non TAI based clock domains are accessible via the file descriptor based posix clocks, but their usability is very limited. They can't be accessed fast as they always go all the way out to the hardware and they cannot be utilized in the kernel itself. As Time Sensitive Networking (TSN) gains traction it is required to provide fast user and kernel space access to these clocks. The approach taken is to utilize the timekeeping and adjtimex(2) infrastructure to provide this access in a similar way how the kernel provides access to clock MONOTONIC, REALTIME etc. Instead of creating a duplicated infrastructure this rework converts timekeeping and adjtimex(2) into generic functionality which operates on pointers to data structures instead of using static variables. This allows to provide time accessors and adjtimex(2) functionality for the independent PTP clocks in a subsequent step. - Consolidate hrtimer initialization hrtimers are set up by initializing the data structure and then seperately setting the callback function for historical reasons. That's an extra unnecessary step and makes Rust support less straight forward than it should be. Provide a new set of hrtimer_setup*() functions and convert the core code and a few usage sites of the less frequently used interfaces over. The bulk of the htimer_init() to hrtimer_setup() conversion is already prepared and scheduled for the next merge window. - Drivers: - Ensure that the global timekeeping clocksource is utilizing the cluster 0 timer on MIPS multi-cluster systems. Otherwise CPUs on different clusters use their cluster specific clocksource which is not guaranteed to be synchronized with other clusters. - Mostly boring cleanups, fixes, improvements and code movement" * tag 'timers-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits) posix-timers: Fix spurious warning on double enqueue versus do_exit() clocksource/drivers/arm_arch_timer: Use of_property_present() for non-boolean properties clocksource/drivers/gpx: Remove redundant casts clocksource/drivers/timer-ti-dm: Fix child node refcount handling dt-bindings: timer: actions,owl-timer: convert to YAML clocksource/drivers/ralink: Add Ralink System Tick Counter driver clocksource/drivers/mips-gic-timer: Always use cluster 0 counter as clocksource clocksource/drivers/timer-ti-dm: Don't fail probe if int not found clocksource/drivers:sp804: Make user selectable clocksource/drivers/dw_apb: Remove unused dw_apb_clockevent functions hrtimers: Delete hrtimer_init_on_stack() alarmtimer: Switch to use hrtimer_setup() and hrtimer_setup_on_stack() io_uring: Switch to use hrtimer_setup_on_stack() sched/idle: Switch to use hrtimer_setup_on_stack() hrtimers: Delete hrtimer_init_sleeper_on_stack() wait: Switch to use hrtimer_setup_sleeper_on_stack() timers: Switch to use hrtimer_setup_sleeper_on_stack() net: pktgen: Switch to use hrtimer_setup_sleeper_on_stack() futex: Switch to use hrtimer_setup_sleeper_on_stack() fs/aio: Switch to use hrtimer_setup_sleeper_on_stack() ...
2024-11-20Merge tag 'timers-vdso-2024-11-18' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull vdso data page handling updates from Thomas Gleixner: "First steps of consolidating the VDSO data page handling. The VDSO data page handling is architecture specific for historical reasons, but there is no real technical reason to do so. Aside of that VDSO data has become a dump ground for various mechanisms and fail to provide a clear separation of the functionalities. Clean this up by: - consolidating the VDSO page data by getting rid of architecture specific warts especially in x86 and PowerPC. - removing the last includes of header files which are pulling in other headers outside of the VDSO namespace. - seperating timekeeping and other VDSO data accordingly. Further consolidation of the VDSO page handling is done in subsequent changes scheduled for the next merge window. This also lays the ground for expanding the VDSO time getters for independent PTP clocks in a generic way without making every architecture add support seperately" * tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) x86/vdso: Add missing brackets in switch case vdso: Rename struct arch_vdso_data to arch_vdso_time_data powerpc: Split systemcfg struct definitions out from vdso powerpc: Split systemcfg data out of vdso data page powerpc: Add kconfig option for the systemcfg page powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processors powerpc/pseries/lparcfg: Fix printing of system_active_processors powerpc/procfs: Propagate error of remap_pfn_range() powerpc/vdso: Remove offset comment from 32bit vdso_arch_data x86/vdso: Split virtual clock pages into dedicated mapping x86/vdso: Delete vvar.h x86/vdso: Access vdso data without vvar.h x86/vdso: Move the rng offset to vsyscall.h x86/vdso: Access rng vdso data without vvar.h x86/vdso: Access timens vdso data without vvar.h x86/vdso: Allocate vvar page from C code x86/vdso: Access rng data from kernel without vvar x86/vdso: Place vdso_data at beginning of vvar page x86/vdso: Use __arch_get_vdso_data() to access vdso data x86/mm/mmap: Remove arch_vma_name() ...
2024-11-18Merge tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-0/+6
Pull xattr updates from Al Viro: "Sanitize xattr and io_uring interactions with it, add *xattrat() syscalls, sanitize struct filename handling in there" * tag 'pull-xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: xattr: remove redundant check on variable err fs/xattr: add *at family syscalls new helpers: file_removexattr(), filename_removexattr() new helpers: file_listxattr(), filename_listxattr() replace do_getxattr() with saner helpers. replace do_setxattr() with saner helpers. new helper: import_xattr_name() fs: rename struct xattr_ctx to kernel_xattr_ctx xattr: switch to CLASS(fd) io_[gs]etxattr_prep(): just use getname() io_uring: IORING_OP_F[GS]ETXATTR is fine with REQ_F_FIXED_FILE getname_maybe_null() - the third variant of pathname copy-in teach filename_lookup() to treat NULL filename as ""
2024-11-11empty include/asm-generic/vga.hAl Viro1-22/+1
all places that use anything defined in it (vgacon, mdacon and vga16fb) are built only on architectures that have all that stuff in their native asm/vga.h allows to kill stub asm/vga.h on sh, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-11riscv: Add qspinlock supportAlexandre Ghiti2-0/+4
In order to produce a generic kernel, a user can select CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket spinlock implementation if Zabha or Ziccrse are not present. Note that we can't use alternatives here because the discovery of extensions is done too late and we need to start with the qspinlock implementation because the ticket spinlock implementation would pollute the spinlock value, so let's use static keys. This is largely based on Guo's work and Leonardo reviews at [1]. Link: https://lore.kernel.org/linux-riscv/20231225125847.2778638-1-guoren@kernel.org/ [1] Signed-off-by: Guo Ren <guoren@kernel.org> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20241103145153.105097-14-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-11-11asm-generic: ticket-lock: Add separate ticket-lock.hGuo Ren2-86/+104
Add a separate ticket-lock.h to include multiple spinlock versions and select one at compile time or runtime. Reviewed-by: Leonardo Bras <leobras@redhat.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/ Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20241103145153.105097-11-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-11-11asm-generic: ticket-lock: Reuse arch_spinlock_t of qspinlockGuo Ren2-17/+9
The arch_spinlock_t of qspinlock has contained the atomic_t val, which satisfies the ticket-lock requirement. Thus, unify the arch_spinlock_t into qspinlock_types.h. This is the preparation for the next combo spinlock. Reviewed-by: Leonardo Bras <leobras@redhat.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/linux-riscv/CAK8P3a2rnz9mQqhN6-e0CGUUv9rntRELFdxt_weiD7FxH7fkfQ@mail.gmail.com/ Signed-off-by: Guo Ren <guoren@kernel.org> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20241103145153.105097-10-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-11-08alloc_tag: load module tags into separate contiguous memorySuren Baghdasaryan1-0/+19
When a module gets unloaded there is a possibility that some of the allocations it made are still used and therefore the allocation tags corresponding to these allocations are still referenced. As such, the memory for these tags can't be freed. This is currently handled as an abnormal situation and module's data section is not being unloaded. To handle this situation without keeping module's data in memory, allow codetags with longer lifespan than the module to be loaded into their own separate memory. The in-use memory areas and gaps after module unloading in this separate memory are tracked using maple trees. Allocation tags arrange their separate memory so that it is virtually contiguous and that will allow simple allocation tag indexing later on in this patchset. The size of this virtually contiguous memory is set to store up to 100000 allocation tags. [surenb@google.com: fix empty codetag module section handling] Link: https://lkml.kernel.org/r/20241101000017.3856204-1-surenb@google.com [akpm@linux-foundation.org: update comment, per Dan] Link: https://lkml.kernel.org/r/20241023170759.999909-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sourav Panda <souravpanda@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Huth <thuth@redhat.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xiongwei Song <xiongwei.song@windriver.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-08asm-generic: introduce text-patching.hMike Rapoport (Microsoft)1-0/+5
Several architectures support text patching, but they name the header files that declare patching functions differently. Make all such headers consistently named text-patching.h and add an empty header in asm-generic for architectures that do not support text patching. Link: https://lkml.kernel.org/r/20241023162711.2579610-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07mm: consolidate common checks in hugetlb_get_unmapped_areaOscar Salvador1-7/+0
prepare_hugepage_range() performs almost the same checks for all architectures that define it, with the exception of mips and loongarch that also check for overflows. The rest checks for the addr and len to be properly aligned, so we can move that to hugetlb_get_unmapped_area() and get rid of a fair amount of duplicated code. [akpm@linux-foundation.org: remove now-unused local] Link: https://lore.kernel.org/oe-kbuild-all/202410081210.uNLbf3Jk-lkp@intel.com/ Link: https://lkml.kernel.org/r/20241007075037.267650-10-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07arch/s390: clean up hugetlb definitionsOscar Salvador1-0/+8
s390 redefines functions that are already defined (and the same) in include/asm-generic/hugetlb.h. Do as the other architectures: 1) include include/asm-generic/hugetlb.h 2) drop the already defined functions in the generic hugetlb.h and 3) use the __HAVE_ARCH_HUGE_* macros to define our own. This gets rid of quite some code. Link: https://lkml.kernel.org/r/20241007075037.267650-9-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06fs/xattr: add *at family syscallsChristian Göttsche1-0/+6
Add the four syscalls setxattrat(), getxattrat(), listxattrat() and removexattrat(). Those can be used to operate on extended attributes, especially security related ones, either relative to a pinned directory or on a file descriptor without read access, avoiding a /proc/<pid>/fd/<fd> detour, requiring a mounted procfs. One use case will be setfiles(8) setting SELinux file contexts ("security.selinux") without race conditions and without a file descriptor opened with read access requiring SELinux read permission. Use the do_{name}at() pattern from fs/open.c. Pass the value of the extended attribute, its length, and for setxattrat(2) the command (XATTR_CREATE or XATTR_REPLACE) via an added struct xattr_args to not exceed six syscall arguments and not merging the AT_* and XATTR_* flags. [AV: fixes by Christian Brauner folded in, the entire thing rebased on top of {filename,file}_...xattr() primitives, treatment of empty pathnames regularized. As the result, AT_EMPTY_PATH+NULL handling is cheap, so f...(2) can use it] Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Link: https://lore.kernel.org/r/20240426162042.191916-1-cgoettsche@seltendoof.de Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christian Brauner <brauner@kernel.org> CC: x86@kernel.org CC: linux-alpha@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-arm-kernel@lists.infradead.org CC: linux-ia64@vger.kernel.org CC: linux-m68k@lists.linux-m68k.org CC: linux-mips@vger.kernel.org CC: linux-parisc@vger.kernel.org CC: linuxppc-dev@lists.ozlabs.org CC: linux-s390@vger.kernel.org CC: linux-sh@vger.kernel.org CC: sparclinux@vger.kernel.org CC: linux-fsdevel@vger.kernel.org CC: audit@vger.kernel.org CC: linux-arch@vger.kernel.org CC: linux-api@vger.kernel.org CC: linux-security-module@vger.kernel.org CC: selinux@vger.kernel.org [brauner: slight tweaks] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-29New implementation for IO memcpy and IO memsetJulian Vetter1-19/+3
The IO memcpy and IO memset functions in asm-generic/io.h simply call memcpy and memset. This can lead to alignment problems or faults on architectures that do not define their own version and fall back to these defaults. This patch introduces new implementations for IO memcpy and IO memset, that use read{l,q} accessor functions, align accesses to machine word size, and resort to byte accesses when the target memory is not aligned. For new architectures and existing ones that were using the old fallbacks these functions are save to use, because IO memory constraints are taken into account. Moreover, architectures with similar implementations can now use these new versions, not needing to implement their own. Reviewed-by: Yann Sionneau <ysionneau@kalrayinc.com> Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-29__arch_xprod64(): make __always_inline when optimizing for performanceNicolas Pitre1-1/+6
Recent gcc versions started not systematically inline __arch_xprod64() and that has performance implications. Give the compiler the freedom to decide only when optimizing for size. Here's some timing numbers from lib/math/test_div64.c Using __always_inline: ``` test_div64: Starting 64bit/32bit division and modulo test test_div64: Completed 64bit/32bit division and modulo test, 0.048285584s elapsed ``` Without __always_inline: ``` test_div64: Starting 64bit/32bit division and modulo test test_div64: Completed 64bit/32bit division and modulo test, 0.053023584s elapsed ``` Forcing constant base through the non-constant base code path: ``` test_div64: Starting 64bit/32bit division and modulo test test_div64: Completed 64bit/32bit division and modulo test, 0.103263776s elapsed ``` It is worth noting that test_div64 does half the test with non constant divisors already so the impact is greater than what those numbers show. And for what it is worth, those numbers were obtained using QEMU. The gcc version is 14.1.0. Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-29asm-generic/div64: optimize/simplify __div64_const32()Nicolas Pitre1-79/+35
Several years later I just realized that this code could be greatly simplified. First, let's formalize the need for overflow handling in __arch_xprod64(). Assuming n = UINT64_MAX, there are 2 cases where an overflow may occur: 1) If a bias must be added, we have m_lo * n_lo + m or m_lo * 0xffffffff + ((m_hi << 32) + m_lo) or ((m_lo << 32) - m_lo) + ((m_hi << 32) + m_lo) or (m_lo + m_hi) << 32 which must be < (1 << 64). So the criteria for no overflow is m_lo + m_hi < (1 << 32). 2) The cross product m_lo * n_hi + m_hi * n_lo or m_lo * 0xffffffff + m_hi * 0xffffffff or ((m_lo << 32) - m_lo) + ((m_hi << 32) - m_hi). Assuming the top result from the previous step (m_lo + m_hi) that must be added to this, we get (m_lo + m_hi) << 32 again. So let's have a straight and simpler version when this is true. Otherwise some reordering allows for taking care of possible overflows without any actual conditionals. And prevent from generating both code variants by making sure this is considered only if m is perceived as constant by the compiler. This, in turn, allows for greatly simplifying __div64_const32(). The "special case" may go as well as the regular case works just fine without needing a bias. Then reduction should be applied all the time as minimizing m is the key. Signed-off-by: Nicolas Pitre <npitre@baylibre.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-29asm-generic: add an optional pfn_valid check to page_to_physChristoph Hellwig1-0/+10
page_to_pfn is usually implemented by pointer arithmetics on the memory map, which means that bogus input can lead to even more bogus output. Powerpc had a pfn_valid check on the intermediate pfn in the page_to_phys implementation when CONFIG_DEBUG_VIRTUAL is defined, which seems generally useful, so add that to the generic version. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-29asm-generic: provide generic page_to_phys and phys_to_page implementationsChristoph Hellwig1-0/+3
page_to_phys is duplicated by all architectures, and from some strange reason placed in <asm/io.h> where it doesn't fit at all. phys_to_page is only provided by a few architectures despite having a lot of open coded users. Provide generic versions in <asm-generic/memory_model.h> to make these helpers more easily usable. Note with this patch powerpc loses the CONFIG_DEBUG_VIRTUAL pfn_valid check. It will be added back in a generic version later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-29asm-generic/io.h: Remove I/O port accessors for HAS_IOPORT=nNiklas Schnelle1-0/+60
With all subsystems and drivers either declaring their dependence on HAS_IOPORT or fencing I/O port specific code sections we can finally make inb()/outb() and friends compile-time dependent on HAS_IOPORT as suggested by Linus in the linked mail. The main benefit of this is that on platforms such as s390 which have no meaningful way of implementing inb()/outb() their use without the proper HAS_IOPORT dependency will result in easy to catch and fix compile-time errors instead of compiling code that can never work. Link: https://lore.kernel.org/lkml/CAHk-=wg80je=K7madF4e7WrRNp37e3qh6y10Svhdc7O8SZ_-8g@mail.gmail.com/ Co-developed-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Maciej W. Rozycki <macro@orcam.me.uk> Acked-by: Damien Le Moal <dlemoal@kernel.org> Acked-by: Jaroslav Kysela <perex@perex.cz> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> # for ARCH=um Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Kalle Valo <kvalo@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Sebastian Reichel <sre@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-16delay: Rework udelay and ndelayAnna-Maria Behnsen1-28/+37
udelay() as well as ndelay() are defines and no functions and are using constants to be able to transform a sleep time into loops and to prevent too long udelays/ndelays. There was a compiler error with non-const 8 bit arguments which was fixed by commit a87e553fabe8 ("asm-generic: delay.h fix udelay and ndelay for 8 bit args"). When using a function, the non-const 8 bit argument is type casted and the problem would be gone. Transform udelay() and ndelay() into proper functions, remove the no longer and confusing division, add defines for the magic values and add some explanations as well. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20241014-devel-anna-maria-b4-timers-flseep-v3-6-dc8b907cb62f@linutronix.de