kernel/linux.git/mm/filemap.c, branch v7.2-rc1

Merge tag 'mm-stable-2026-06-23-08-55' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2026-06-23T19:03:44+00:00

Pull more MM updates from Andrew Morton: - "khugepaged: add mTHP collapse support" (Nico Pache) Provide khugepaged with the capability to collapse anonymous memory regions to mTHPs - "Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files" (Zi Yan) Remove the READ_ONLY_THP_FOR_FS check in file_thp_enabled(), so that khugepaged and MADV_COLLAPSE can run on filesystems with PMD THP pagecache support even without READ_ONLY_THP_FOR_FS enabled - "make MM selftests more CI friendly" (Mike Rapoport) General fixes and cleanups to the MM selftests. Also move more MM selftests under the kselftest framework, making them more amenable to ongoing CI testing - "selftests/mm: fix failures and robustness improvements" and "selftests/mm: assorted fixes for hmm-tests" (Sayali Patil) Fix several issues in MM selftests which were revealed by powerpc 64k pagesize * tag 'mm-stable-2026-06-23-08-55' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (118 commits) Revert "mm: limit filemap_fault readahead to VMA boundaries" mm/vmscan: pass NULL to trace vmscan node reclaim mm: use mapping_mapped to simplify the code selftests/mm: fix exclusive_cow test fork() handling selftests/mm: remove hardcoded THP sizing assumptions in hmm tests selftests/mm: allow PUD-level entries in compound testcase of hmm tests mm/gup_test: reject wrapped user ranges mm/page_frag: reject invalid CPUs in page_frag_test mm/damon/core: always put unsuccessfully committed target pids mm: page_isolation: avoid unsafe folio reads while scanning compound pages mm/shrinker: do not hold RCU lock in shrinker_debugfs_count_show() selftests: mm: fix and speedup "droppable" test mm: merge writeout into pageout MAINTAINERS: add Hao Ge as reviewer for codetag and alloc_tag selftests/mm: clarify alternate unmapping in compaction_test selftests/mm: move hwpoison setup into run_test() and silence modprobe output for memory-failure category selftests/mm: skip uffd-stress test when nr_pages_per_cpu is zero selftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupported selftests/mm: ensure destination is hugetlb-backed in hugetlb-mremap selftest/mm: register existing mapping with userfaultfd in hugetlb-mremap ...

Revert "mm: limit filemap_fault readahead to VMA boundaries"

2026-06-21T18:37:38+00:00

This reverts commit 7b32f64bc512b40b268776c5ac4d354b325b3197. This patch caused a significant performance regression, so revert it, and we can determine whether the approach is sensible or not moving forwards, and if so how to avoid this. There was a merge conflict with commit de97ae6222c1 ("mm/readahead: no PG_readahead on EOF"), care was taken to ensure that the revert retained the behaviour of this patch and cleanly reverts commit 7b32f64bc512 ("mm: limit filemap_fault readahead to VMA boundaries") only. Link: https://lore.kernel.org/20260619112852.104213-1-ljs@kernel.org Fixes: 7b32f64bc512 ("mm: limit filemap_fault readahead to VMA boundaries") Signed-off-by: Lorenzo Stoakes Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202606181547.617a6967-lkp@intel.com Acked-by: David Hildenbrand (Arm) Reviewed-by: Pedro Falcato Reviewed-by: Matthew Wilcox (Oracle) Cc: Jan Kara Cc: Kalesh Singh Signed-off-by: Andrew Morton

mm: fs: remove filemap_nr_thps*() functions and their users

2026-06-21T18:37:15+00:00

They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without large folio support, so that read-only THPs created in these FSes are not seen by the FSes when the underlying fd becomes writable. Now read-only PMD THPs only appear in a FS with large folio support and the supported orders include PMD_ORDER. READ_ONLY_THP_FOR_FS was using mapping->nr_thps, inode->i_writecount, and smp_mb() to prevent writes to a read-only THP and collapsing writable folios into a THP. In collapse_file(), mapping->nr_thps is increased, then smp_mb(), and if inode->i_writecount > 0, collapse is stopped, while do_dentry_open() first increases inode->i_writecount, then a full memory fence, and if mapping->nr_thps > 0, all read-only THPs are truncated. Now this mechanism can be removed along with READ_ONLY_THP_FOR_FS code, since a dirty folio check has been added after try_to_unmap() in collapse_file() to prevent dirty folios from being collapsed as clean. Link: https://lore.kernel.org/20260517135416.1434539-7-ziy@nvidia.com Signed-off-by: Zi Yan Reviewed-by: Matthew Wilcox (Oracle) Acked-by: David Hildenbrand (Arm) Reviewed-by: Baolin Wang Reviewed-by: Lance Yang Cc: Al Viro Cc: Barry Song Cc: Chris Mason Cc: Christian Brauner Cc: David Sterba Cc: Dev Jain Cc: Jan Kara Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: Mike Rapoport Cc: Nico Pache Cc: Ryan Roberts Cc: Shuah Khan Cc: Song Liu Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton

Merge tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2026-06-19T17:14:34+00:00

Pull MM updates from Andrew Morton: - "selftests/mm: clean up build output and verbosity" (Li Wang) Remove some noise from the MM selftests build - "mm: Free contiguous order-0 pages efficiently" (Ryan Roberts) Speed up the freeing of a batch of 0-order pages by first scanning them for coalescing opportunities. This is applicable to vfree() and to the releasing of frozen pages - "mm/damon: introduce DAMOS failed region quota charge ratio" (SeongJae Park) Address a DAMOS usability issue: The DAMOS quota often exhausts prematurely because it charges for all memory attempted, causing slow and inconsistent performance when actions fail on unreclaimable memory. To fix this, a new feature lets users set a smaller, flexible quota charge ratio (via a numerator and denominator) for failed regions. Since failed actions cause less overhead, reducing their quota cost ensures more predictable and efficient DAMOS processing - "selftests/cgroup: improve zswap tests robustness and support large page sizes" (Li Wang) Fix various spurious failures and improves the overall robustness of the cgroup zswap selftests - "fix MAP_DROPPABLE not supported errno" (Anthony Yznaga) Fix an issue in the mlock selftests on arm32 - "mm: huge_memory: clean up defrag sysfs with shared" (Breno Leitao) Some maintenance work in the huge_memory code - "treewide: fixup gfp_t printks" (Brendan Jackman) Use the special vprintf() gfp_t conversion in various places - "mm: Fix vmemmap optimization accounting and initialization" (Muchun Song) Fix several bugs in the vmemmap optimization, mainly around incorrect page accounting and memmap initialization in the DAX and memory hotplug paths. It also fixes pageblock migratetype initialization and struct page initialization for ZONE_DEVICE compound pages - "mm/damon: repost non-hotfix reviewed patches in damon/next tree" A sprinkle of unrelated minor bugfixes for DAMON - "mm: remove page_mapped()" (David Hildenbrand) Remove this function from the tree, replacing it with folio_mapped() - "mm/damon: let DAMON be paused and resumed" (SeongJae Park) Allow DAMON to be paused and resumed without losing its current state - "kasan: hw_tags: Disable tagging for stack and page-tables" (Muhammad Usama Anjum) Simplify and speed up kasan by removing its ineffective tagging of stacks and page tables - "mm/damon/reclaim,lru_sort: monitor all system rams by default" (SeongJae Park) Simplify deployment on diverse hardware like NUMA systems by updating DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the physical address range covering all System RAM areas by default, replacing the overly restrictive behavior that only targeted the single largest memory block to save on negligible overhead - "mm/damon/sysfs: document filters/ directory as deprecated" (SeongJae Park) Update some DAMON docs - "mm: use spinlock guards for zone lock" (Dmitry Ilvokhin) Switch zone->lock handling over to using the guard() mechanisms - "mm/filemap: tighten mmap_miss hit accounting" (fujunjie) Fix a flaw where the mmap_miss counter over-credited page cache hits during fault-arounds and page-fault retries. This results in significant reduction of redundant synchronous mmap readahead I/O, drastically cutting down execution time and gigabytes read for sparse random or strided memory access workloads - "selftests/cgroup: Fix false positive failures in test_percpu_basic" (Li Wang) Fix a couple of false-positives in the cgroup kmem selftests - "mm/damon/reclaim: support monitoring intervals auto-tuning" (SeongJae Park) Add a new parameter to DAMON permitting DAMON_RECLAIM to automatically tune DAMON's sampling and aggregation intervals - "mm/damon/stat: add kdamond_pid parameter" (SeongJae Park) Change DAMON_STAT to provide the pid of its kdamond - "mm/kmemleak: dedupe verbose scan output" (Breno Leitao) Remove large amounts of duplicated backtraces from the verbose-mode kmemleak output - "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" (David Hildenbrand) Reduce our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to removing it entirely in a later series - "mm/damon: validate min_region_size to be power of 2" (Liew Rui Yan) Prevent users from passing a non-power-of-2 value of `addr_unit', as this later results in undesirable behavior - "mm: document read_pages and simplify usage" (Frederick Mayle) - "tools/mm/page-types: Fix misc bugs" (Ye Liu) Fix three issues in tools/mm/page-types.c - "mm: misc cleanups from __GFP_UNMAPPED series" (Brendan Jackman) Implement several cleanups in the page allocator and related code - "mm, swap: swap table phase IV: unify allocation" (Kairui Song) Unify the allocation and charging of anon and shmem swap in folios, provides better synchronization, consolidates the metadata management, hence dropping the static array and map, and improves performance - "mm/damon: introduce data attributes monitoring" (SeongJae Park( Extend DAMON to monitor general data attributes other than accesses - "mm/vmalloc: free unused pages on vrealloc() shrink" (Shivam Kalra) Implement the TODO in vrealloc() to unmap and free unused pages when shrinking across a page boundary - "mm/damon: documentation and comment fixes" (niecheng) - "remove mmap_action success, error hooks" (Lorenzo Stoakes) Eliminate custom hooks from mmap_action by removing the problematic success_hook which allowed drivers to improperly access uninitialized VMAs. It replaces the error_hook with a simple error-code field and updates the memory char driver accordingly - "mm/damon: minor improvements for code readability and tests" (SeongJae Park) - "mm/damon: fix macro arguments and clarify quota goals doc" (Maksym Shcherba) - "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" (Mike Rapoport) - "mm/mglru: improve reclaim loop and dirty folio" (Kairui Song and others) Clean up and slightly improves MGLRU's reclaim loop and dirty writeback handling. Large performance improvements are measured - "use vma locks for proc/pid/{smaps|numa_maps} reads" (Suren Baghdasaryan) Use per-vma locks when reading /proc/pid/smaps and numa_maps similar to reduce contention on central mmap_lock - "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()" (Ran Xiaokai) Some cleanup work in the THP code - "selftests/memfd: fix compilation warnings" (Konstantin Khorenko) Fix a few build glitches in the memfd selftest code. - "memcg: shrink obj_stock_pcp and cache multiple objcgs" (Shakeel Butt) Resolve a 68% performance regression caused by NUMA-node cache thrashing around struct obj_stock_pcp by shrinking its existing fields and expanding it into a multi-slot array that caches up to five obj_cgroup pointers per CPU, allowing per-node variants of the same memcg to coexist within a single 64-byte cache line. - "zram: writeback fixes" (Sergey Senozhatsky) address a couple of unrelated zram writeback issues - "mm: switch THP shrinker to list_lru" (Johannes Weiner) Resolve NUMA-awareness issues and streamlines callsite interaction by refactoring and extending the list_lru API to completely replace the complex, open-coded deferred split queue for Transparent Huge Pages - "mm: improve large folio readahead for exec memory" (Usama Arif) Improve large-folio readahead on systems like 64K-page arm64 by preventing the mmap_miss check from permanently disabling target-oriented VM_EXEC readahead, and by generalizing the force_thp_readahead gate to support mappings with any usefully large maximum folio order under the cache cap. - "userfaultfd/pagemap: pre-existing fixes" (Kiryl Shutsemau) Fix a bunch of minor issues in the userfaultfd/pagemap, all of which were flagged by Sashiko review of proposed new material - "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and vmemmap_check_pmd()" (Muchun Song) Provide generic versions of these two functions so the four arch-specific implementations can be removed. - "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device" (Youngjun Park) Address a uswsusp-vs-swapoff race and reduces the swap device reference taking/releasing frequency. - "mm/hmm: A fix and a selftest" (Dev Jain) * tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries fs/proc/task_mmu: do not warn on seeing non-migration pmd entry lib/test_hmm: check alloc_page_vma() return value and handle OOM mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX mm/swap: remove redundant swap device reference in alloc/free mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device mm/filemap: use folio_next_index() for start vmalloc: fix NULL pointer dereference in is_vm_area_hugepages() sparc/mm: drop vmemmap_check_pmd helper and use generic code loongarch/mm: drop vmemmap_check_pmd helper and use generic code riscv/mm: drop vmemmap_pmd helpers and use generic code arm64/mm: drop vmemmap_pmd helpers and use generic code mm/sparse-vmemmap: provide generic vmemmap_set_pmd() and vmemmap_check_pmd() rust: page: mark Page::nid as inline userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks userfaultfd: gate must_wait writability check on pte_present() mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole() fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry() fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race ...

mm/filemap: use folio_next_index() for start

2026-06-09T01:21:31+00:00

Use folio_next_index() instead of open-coding folio->index + folio_nr_pages(folio) when updating @start in filemap_get_folios_contig(), filemap_get_folios_tag(), and filemap_get_folios_dirty(). Link: https://lore.kernel.org/20260601110425.44784-1-tanze@kylinos.cn Signed-off-by: tanze Reviewed-by: Jan Kara Reviewed-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton

mm: use mapping_max_folio_order() for force_thp_readahead order

2026-06-09T01:21:26+00:00

The force_thp_readahead path in do_sync_mmap_readahead() is gated on HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER and always requests HPAGE_PMD_ORDER / HPAGE_PMD_NR. On configurations where HPAGE_PMD_ORDER exceeds MAX_PAGECACHE_ORDER, notably arm64 with a 64K base page size, VM_HUGEPAGE mappings cannot use this path and fall back to the non-forced mmap readahead path even when the mapping supports useful large folios. Enable forced readahead for mappings that support large folios and request the max folio order supported by the mapping, capped at 2M. 2MB is chosen as the cap because it matches the PMD size on x86_64 and on arm64 with 4K base pages, so the size/memory-pressure tradeoff for folios of that size is already well understood. On arm64 with 16K and 64K base page sizes, 2MB is also the contiguous-PTE (contpte) block size, so the resulting folios coalesce into a single TLB entry and reduce TLB pressure on the readahead path. This will result in 32M folios not being faulted in with 16K base page size for arm64, but with contpte, the performance difference should be negligible. The final allocation order may still be clamped by page_cache_ra_order() to the mapping and request geometry, but this gives VM_HUGEPAGE mappings on such configurations a large-folio readahead request instead of dropping back to base-page readahead. Link: https://lore.kernel.org/20260601102205.3985788-3-usama.arif@linux.dev Signed-off-by: Usama Arif Reviewed-by: Jan Kara Reviewed-by: Pedro Falcato Cc: Alistair Popple Cc: Al Viro Cc: Baolin Wang Cc: Barry Song Cc: Catalin Marinas Cc: Christian Brauner Cc: David Hildenbrand Cc: Dev Jain Cc: Heiher Cc: Johannes Weiner Cc: Kees Cook Cc: Kevin Brodsky Cc: Lance Yang Cc: Liam R. Howlett Cc: Lorenzo Stoakes Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Mike Rapoport Cc: Nico Pache Cc: Pasha Tatashin Cc: Rohan McLure Cc: Ryan Roberts Cc: Shakeel Butt Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Zi Yan Cc: Kiryl Shutsemau (Meta) Cc: Oscar Salvador (SUSE) Signed-off-by: Andrew Morton

mm: bypass mmap_miss heuristic for VM_EXEC readahead

2026-06-09T01:21:26+00:00

Patch series "mm: improve large folio readahead for exec memory", v7. Two checks in do_sync_mmap_readahead() limit large-folio readahead: 1. The mmap_miss heuristic is meant to throttle wasteful speculative readahead. It is currently also applied to the VM_EXEC readahead path, which is targeted rather than speculative. Once mmap_miss exceeds MMAP_LOTSAMISS, exec readahead - including the large-folio order requested by exec_folio_order() - is disabled. On configurations where the mmap_miss decrement paths are not active (see patch 1) the counter only grows, so exec readahead is permanently disabled after the first 100 faults. 2. The force_thp_readahead path is gated only on HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER and always drives the readahead at HPAGE_PMD_ORDER. Configurations where HPAGE_PMD_ORDER exceeds MAX_PAGECACHE_ORDER never reach this path, even when the mapping itself supports usefully large folios well below the cap. Both issues are most visible on arm64 with a 64K base page size, where HPAGE_PMD_ORDER is 13 (512MB) -- above MAX_PAGECACHE_ORDER (11) -- and where fault_around_pages collapses to 1 disabling should_fault_around() (one of the two mmap_miss decrement sites). However the fixes are architecture-agnostic: patch 1 reflects the nature of VM_EXEC readahead regardless of base page size, and patch 2 generalises the gate so any mapping advertising a usefully large maximum folio order can benefit. I created a benchmark that mmaps a large executable file madvises it as huge and calls RET-stub functions at PAGE_SIZE offsets across it. "Cold" measures fault + readahead cost. "Random" first faults in all pages with a sequential sweep (not measured), then measures time for calling random offsets, isolating iTLB miss cost for scattered execution. The benchmark results on Neoverse V2 (Grace), arm64 with 64K base pages, 512MB executable file on ext4, averaged over 3 runs: Phase | Baseline | Patched | Improvement -----------|--------------|--------------|------------------ Cold fault | 83.4 ms | 41.3 ms | 50% faster Random | 76.0 ms | 58.3 ms | 23% faster This patch (of 2): The mmap_miss heuristic is intended to stop speculative mmap readahead when a file looks like a random-access workload. That does not fit the VM_EXEC path very well. VM_EXEC readahead is already constrained differently from ordinary mmap read-around: it is bounded by the VMA, uses exec_folio_order() to choose an order useful for executable mappings, and sets async_size to 0 so it does not create follow-on readahead. When VM_HUGEPAGE is also present, the larger readahead is an explicit userspace opt-in. The mmap_miss counter is decremented from cache-hit paths in do_async_mmap_readahead() and filemap_map_pages(). Those paths are not always enough to balance the synchronous miss increments for executable mappings. In particular, when fault-around is effectively disabled, such as configurations where fault_around_pages is 1, filemap_map_pages() is not reached from the fault path. The counter can then become a stale throttle for VM_EXEC mappings and suppress the readahead behavior that the executable-specific path is trying to provide. Skip both mmap_miss increments and decrements for VM_EXEC mappings, matching the existing VM_SEQ_READ treatment and keeping the counter accounting symmetric. Link: https://lore.kernel.org/20260601102205.3985788-1-usama.arif@linux.dev Link: https://lore.kernel.org/20260601102205.3985788-2-usama.arif@linux.dev Signed-off-by: Usama Arif Reviewed-by: Jan Kara Reviewed-by: Kiryl Shutsemau (Meta) Reviewed-by: Oscar Salvador (SUSE) Reviewed-by: Pedro Falcato Cc: Alistair Popple Cc: Al Viro Cc: Baolin Wang Cc: Barry Song Cc: Catalin Marinas Cc: Christian Brauner Cc: David Hildenbrand Cc: Dev Jain Cc: Heiher Cc: Johannes Weiner Cc: Kees Cook Cc: Kevin Brodsky Cc: Lance Yang Cc: Liam R. Howlett Cc: Lorenzo Stoakes Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Mike Rapoport Cc: Nico Pache Cc: Pasha Tatashin Cc: Rohan McLure Cc: Ryan Roberts Cc: Shakeel Butt Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Zi Yan Signed-off-by: Andrew Morton

mm: make mmap_miss accounting symmetric for VM_SEQ_READ

2026-06-04T21:45:06+00:00

do_sync_mmap_readahead() skips both the mmap_miss increment and the MMAP_LOTSAMISS check for VM_SEQ_READ mappings, since sequential access is non-speculative and should always read ahead. The two decrement sites in do_async_mmap_readahead() and filemap_map_pages() do not mirror this skip, so concurrent faults on a VM_SEQ_READ mapping can still drive ra->mmap_miss down to zero through the decrement paths even though nothing in the sync path ever increments it. The counter itself is per-file (file->f_ra.mmap_miss), so it can be moved by any VMA mapping the file, not just the one currently faulting. Skip the decrement for VM_SEQ_READ in both decrement sites so the counter only moves for mappings that also participate in the increment side. No functional change for VM_SEQ_READ users, since the increment-side gate already prevents the counter from being consulted on their behalf, but it stops a VM_SEQ_READ mapping from biasing the counter for other mappings of the same file. Link: https://lore.kernel.org/20260525145751.2671248-1-usama.arif@linux.dev Signed-off-by: Usama Arif Closes: https://lore.kernel.org/all/8edc8cd0-f65c-4456-9b3f-362e744c9a96@linux.dev/ Reviewed-by: William Kucharski Reviewed-by: Jan Kara Cc: David Hildenbrand Cc: Johannes Weiner Cc: Matthew Wilcox (Oracle) Cc: Shakeel Butt Signed-off-by: Andrew Morton

mm: track DONTCACHE dirty pages per bdi_writeback

2026-06-04T08:16:50+00:00

Add a per-wb WB_DONTCACHE_DIRTY counter that tracks the number of dirty pages with the dropbehind flag set (i.e., pages dirtied via RWF_DONTCACHE writes). Increment the counter alongside WB_RECLAIMABLE in folio_account_dirtied() when the folio has the dropbehind flag set, and decrement it in folio_clear_dirty_for_io() and folio_account_cleaned(). Also decrement it when a non-DONTCACHE lookup atomically clears the dropbehind flag on a dirty folio in __filemap_get_folio_mpol(), using folio_test_clear_dropbehind() to prevent concurrent lookups from double-decrementing the counter, and guarding the decrement with mapping_can_writeback() to match the increment path. Transfer the counter alongside WB_RECLAIMABLE in inode_do_switch_wbs() so that the stat is properly migrated when an inode switches cgroup writeback domains. The counter will be used by the writeback flusher to determine how many pages to write back when expediting writeback for IOCB_DONTCACHE writes, without flushing the entire BDI's dirty pages. Suggested-by: Jan Kara Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Jeff Layton Link: https://patch.msgid.link/20260511-dontcache-v7-2-2848ddce8090@kernel.org Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani (IBM) Signed-off-by: Christian Brauner (Amutable)

mm/filemap: fix page_cache_prev_miss() when no hole is found

2026-06-02T22:22:18+00:00

page_cache_prev_miss() is documented to return a value outside the searched range when no gap is found. However, the no-gap-found path returns xas.xa_index, which after a successful loop is the first index in the range. As such, that index is misreported as a gap. The sole caller, page_cache_sync_ra(), uses the return value to estimate the cached run preceding a sequential read. In some cases, the buggy return value can undercount the contiguous range by one, shrinking the readahead window or pushing borderline requests into the small-random-read branch. Fix this by returning the start of the range - 1 when no hole is found. Update page_cache_next_miss() for clarity as well. Both helpers were previously fixed together in commit 9425c591e06a ("page cache: fix page_cache_next/prev_miss off by one"), but the fix was reverted because it caused a hugetlb performance regression. hugetlb no longer uses these functions and next_miss was subsequently refixed in commit 901a269ff3d5 ("filemap: fix page_cache_next_miss() when no hole found") and commit bbcaee20e03e ("readahead: fix return value of page_cache_next_miss() when no hole is found"), but prev_miss was not addressed. This was found by pointing Claude Opus 4.7 at mm/filemap.c. Link: https://lore.kernel.org/20260512-prev_miss_fix-v2-1-4af8e5c1ae62@columbia.edu Fixes: 0d3f92966629 ("page cache: Convert hole search to XArray") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Tal Zussman Reviewed-by: Jan Kara Reviewed-by: Vishal Moola Cc: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton