kernel/linux.git/arch/arc/include/asm/page.h, branch linux-7.0.y

treewide: provide a generic clear_user_page() variant

2026-01-21T03:24:39+00:00

Patch series "mm: folio_zero_user: clear page ranges", v11. This series adds clearing of contiguous page ranges for hugepages. The series improves on the current discontiguous clearing approach in two ways: - clear pages in a contiguous fashion. - use batched clearing via clear_pages() wherever exposed. The first is useful because it allows us to make much better use of hardware prefetchers. The second, enables advertising the real extent to the processor. Where specific instructions support it (ex. string instructions on x86; "mops" on arm64 etc), a processor can optimize based on this because, instead of seeing a sequence of 8-byte stores, or a sequence of 4KB pages, it sees a larger unit being operated on. For instance, AMD Zen uarchs (for extents larger than LLC-size) switch to a mode where they start eliding cacheline allocation. This is helpful not just because it results in higher bandwidth, but also because now the cache is not evicting useful cachelines and replacing them with zeroes. Demand faulting a 64GB region shows performance improvement: $ perf bench mem mmap -p $pg-sz -f demand -s 64GB -l 5 baseline +series (GBps +- %stdev) (GBps +- %stdev) pg-sz=2MB 11.76 +- 1.10% 25.34 +- 1.18% [*] +115.47% preempt=* pg-sz=1GB 24.85 +- 2.41% 39.22 +- 2.32% + 57.82% preempt=none|voluntary pg-sz=1GB (similar) 52.73 +- 0.20% [#] +112.19% preempt=full|lazy [*] This improvement is because switching to sequential clearing allows the hardware prefetchers to do a much better job. [#] For pg-sz=1GB a large part of the improvement is because of the cacheline elision mentioned above. preempt=full|lazy improves upon that because, not needing explicit invocations of cond_resched() to ensure reasonable preemption latency, it can clear the full extent as a single unit. In comparison the maximum extent used for preempt=none|voluntary is PROCESS_PAGES_NON_PREEMPT_BATCH (32MB). When provided the full extent the processor forgoes allocating cachelines on this path almost entirely. (The hope is that eventually, in the fullness of time, the lazy preemption model will be able to do the same job that none or voluntary models are used for, allowing us to do away with cond_resched().) Raghavendra also tested previous version of the series on AMD Genoa and sees similar improvement [1] with preempt=lazy. $ perf bench mem map -p $page-size -f populate -s 64GB -l 10 base patched change pg-sz=2MB 12.731939 GB/sec 26.304263 GB/sec 106.6% pg-sz=1GB 26.232423 GB/sec 61.174836 GB/sec 133.2% This patch (of 8): Let's drop all variants that effectively map to clear_page() and provide it in a generic variant instead. We'll use the macro clear_user_page to indicate whether an architecture provides it's own variant. Also, clear_user_page() is only called from the generic variant of clear_user_highpage(), so define it only if the architecture does not provide a clear_user_highpage(). And, for simplicity define it in linux/highmem.h. Note that for parisc, clear_page() and clear_user_page() map to clear_page_asm(), so we can just get rid of the custom clear_user_page() implementation. There is a clear_user_page_asm() function on parisc, that seems to be unused. Not sure what's up with that. Link: https://lkml.kernel.org/r/20260107072009.1615991-1-ankur.a.arora@oracle.com Link: https://lkml.kernel.org/r/20260107072009.1615991-2-ankur.a.arora@oracle.com Signed-off-by: David Hildenbrand Co-developed-by: Ankur Arora Signed-off-by: Ankur Arora Cc: Andy Lutomirski Cc: Ankur Arora Cc: "Borislav Petkov (AMD)" Cc: Boris Ostrovsky Cc: David Hildenbrand Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Konrad Rzessutek Wilk Cc: Lance Yang Cc: "Liam R. Howlett" Cc: Li Zhe Cc: Lorenzo Stoakes Cc: Mateusz Guzik Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Raghavendra K T Cc: Suren Baghdasaryan Cc: Thomas Gleixner Cc: Vlastimil Babka Signed-off-by: Andrew Morton

ARC: Replace ASSEMBLY with ASSEMBLER in the non-uapi headers

2025-06-09T16:18:12+00:00

While the GCC and Clang compilers already define __ASSEMBLER__ automatically when compiling assembly code, __ASSEMBLY__ is a macro that only gets defined by the Makefiles in the kernel. This can be very confusing when switching between userspace and kernelspace coding, or when dealing with uapi headers that rather should use __ASSEMBLER__ instead. So let's standardize on the __ASSEMBLER__ macro that is provided by the compilers now. This is a completely mechanical patch (done with a simple "sed -i" statement). Cc: linux-snps-arc@lists.infradead.org Signed-off-by: Thomas Huth Signed-off-by: Vineet Gupta

ARC: mm: Make virt_to_pfn() a static inline

2023-12-05T12:11:37+00:00

Making virt_to_pfn() a static inline taking a strongly typed (const void *) makes the contract of a passing a pointer of that type to the function explicit and exposes any misuse of the macro virt_to_pfn() acting polymorphic and accepting many types such as (void *), (unitptr_t) or (unsigned long) as arguments without warnings. In order to do this we move the virt_to_phys() and below the definition of the __pa() and __va() macros so it compiles. The macro version was also able to do recursive symbol resolution. Signed-off-by: Linus Walleij Signed-off-by: Arnd Bergmann

csky: Cast argument to virt_to_pfn() to (void *)

2023-08-11T02:12:33+00:00

The virt_to_pfn() function takes a (void *) as argument, fix this up to avoid exploiting the unintended polymorphism of virt_to_pfn. Signed-off-by: Linus Walleij Signed-off-by: Guo Ren Signed-off-by: Guo Ren

mm, arch: add generic implementation of pfn_valid() for FLATMEM

2023-02-10T00:51:41+00:00

Every architecture that supports FLATMEM memory model defines its own version of pfn_valid() that essentially compares a pfn to max_mapnr. Use mips/powerpc version implemented as static inline as a generic implementation of pfn_valid() and drop its per-architecture definitions. [rppt@kernel.org: fix the generic pfn_valid()] Link: https://lkml.kernel.org/r/Y9lg7R1Yd931C+y5@kernel.org Link: https://lkml.kernel.org/r/20230129124235.209895-5-rppt@kernel.org Signed-off-by: Mike Rapoport (IBM) Acked-by: Arnd Bergmann Acked-by: Guo Ren [csky] Acked-by: Huacai Chen [LoongArch] Acked-by: Stafford Horne [OpenRISC] Acked-by: Michael Ellerman [powerpc] Reviewed-by: David Hildenbrand Tested-by: Conor Dooley Cc: Brian Cain Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Greg Ungerer Cc: Helge Deller Cc: Huacai Chen Cc: Matt Turner Cc: Max Filippov Cc: Michal Simek Cc: Palmer Dabbelt Cc: Richard Weinberger Cc: Rich Felker Cc: Russell King Cc: Thomas Bogendoerfer Cc: Vineet Gupta Cc: WANG Xuerui Cc: Yoshinori Sato Signed-off-by: Andrew Morton

ARC: mm: support 4 levels of page tables

2021-08-26T20:43:19+00:00

Acked-by: Mike Rapoport Signed-off-by: Vineet Gupta

ARC: mm: support 3 levels of page tables

2021-08-26T20:43:19+00:00

ARCv2 MMU is software walked and Linux implements 2 levels of paging: pgd/pte. Forthcoming hw will have multiple levels, so this change preps mm code for same. It is also fun to try multi levels even on soft-walked code to ensure generic mm code is robust to handle. overview ________ 2 levels {pgd, pte} : pmd is folded but pmd_* macros are valid and operate on pgd 3 levels {pgd, pmd, pte}: - pud is folded and pud_* macros point to pgd - pmd_* macros operate on actual pmd code changes ____________ 1. #include 2. Define CONFIG_PGTABLE_LEVELS 3 3a. Define PMD_SHIFT, PMD_SIZE, PMD_MASK, pmd_t 3b. Define pmd_val() which actually deals with pmd (pmd_offset(), pmd_index() are provided by generic code) 3c. pmd_alloc_one()/pmd_free() also provided by generic code (pmd_populate/pmd_free already exist) 4. Define pud_none(), pud_bad() macros based on generic pud_val() which internally pertains to pgd now. 4b. define pud_populate() to just setup pgd Acked-by: Mike Rapoport Signed-off-by: Vineet Gupta

ARC: mm: switch pgtable_t back to struct page *

2021-08-26T20:42:42+00:00

So far ARC pgtable_t has not been struct page based to avoid extra page_address() calls involved. However the differences are down to noise and get in the way of using generic code, hence this patch. This also allows us to reuse generic THP depost/withdraw code. There's some additional consideration for PGDIR_SHIFT in 4K page config. Now due to page tables being PAGE_SIZE deep only, the address split can't be really arbitrary. Tested-by: kernel test robot Suggested-by: Mike Rapoport Acked-by: Mike Rapoport Signed-off-by: Vineet Gupta

ARC: mm: non-functional code movement/cleanup

2021-08-24T21:25:48+00:00

Signed-off-by: Vineet Gupta

ARC: mm: Enable STRICT_MM_TYPECHECKS

2021-08-24T21:25:48+00:00

In the past I've refrained from doing this (at least 2 times) due to the slight code bloat due to ABI implications of pte_t etc becoming struct Per ARC ABI, functions return struct via memory and not through register r0, even if the struct would fit in register(s) - caller allocates space on stack and passes the address as first arg (r0), shifting rest of args by one - callee creates return struct in memory (referenced via r0) This time around the code actually shrunk slightly (due to subtle inlining heuristic effects), but still slightly inefficient due to return values passed through memory. That however seems like a small cost compared to maintenance burden given the impending new mmu support for page walk etc Signed-off-by: Vineet Gupta

kernel/linux.git/arch/arc/include/asm/page.h, branch linux-7.0.y

treewide: provide a generic clear_user_page() variant

ARC: Replace __ASSEMBLY__ with __ASSEMBLER__ in the non-uapi headers

ARC: mm: Make virt_to_pfn() a static inline

csky: Cast argument to virt_to_pfn() to (void *)

mm, arch: add generic implementation of pfn_valid() for FLATMEM

ARC: mm: support 4 levels of page tables

ARC: mm: support 3 levels of page tables

ARC: mm: switch pgtable_t back to struct page *

ARC: mm: non-functional code movement/cleanup

ARC: mm: Enable STRICT_MM_TYPECHECKS

ARC: Replace ASSEMBLY with ASSEMBLER in the non-uapi headers