author Muchun Song <songmuchun@bytedance.com> 2026-04-10 12:24:19 +0300
committer Andrew Morton <akpm@linux-foundation.org> 2026-05-29 07:04:49 +0300
commit ffe55393137c01aa01940b528afcea8c5a108ed7
tree 20c8507935888946489041fc821b5543107403b4
parent 8c2c7df58b5433f614d603bbdffd85f2a392b74a
mm/sparse: remove sparse buffer pre-allocation mechanism

Commit 9bdac9142407 ("sparsemem: Put mem map for one node together.") introduced a mechanism to pre-allocate a large memory block to hold all memmaps for a NUMA node upfront. However, the original commit message did not clearly state the actual benefits or the necessity of explicitly pre-allocating a single chunk for all memmap areas of a given node.

One of the concerns about removing this pre-allocation is that the subsequent per-section memmap allocations could become scattered around, and might turn too many memory blocks/sections into an "un-offlinable" state. However, tests show that even without the explicit node-wide pre-allocation, memblock still allocates memory closely and back-to-back. When tracing vmemmap_set_pmd allocations, the physical chunks allocated by memblock are strictly adjacent to each other in a single contiguous physical range (mapped top-down). Because they are packed tightly together naturally, they will at most consume or pollute the exact same number of memory blocks as the explicit pre-allocation did.

Another concern is the boot performance impact of calling memmap_alloc() multiple times compared to one large node-wide allocation. Tests on a 256GB VM showed that memmap allocation time increased from 199,555 ns to 741,292 ns. Even though it is 3.7x slower, on a 1TB machine, the entire memory allocation time would only take a few milliseconds. This boot performance difference is completely negligible.

Since no negative impact on memory offlining behavior or noticeable boot performance regression was found, this patch proposes removing the explicit node-wide memmap pre-allocation mechanism to reduce the maintenance burden.

Link: https://lore.kernel.org/20260410092419.2446420-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Cc: Liam Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
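
The boot-time figures quoted in the commit message can be checked with a few lines of arithmetic. The following is a minimal user-space sketch, not kernel code: the helper names slowdown_x1000() and extrapolate_ns() are hypothetical, and the extrapolation assumes that memmap allocation time scales linearly with memory size, which the commit message implies but does not state.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: slowdown of "after" relative to "before",
 * scaled by 1000 so that integer math suffices (3714 means ~3.7x).
 */
static uint64_t slowdown_x1000(uint64_t before_ns, uint64_t after_ns)
{
	assert(before_ns != 0);
	return after_ns * 1000 / before_ns;
}

/*
 * Hypothetical helper: scale a time measured on a machine with
 * measured_gib of memory to one with target_gib.
 * ASSUMPTION: allocation time grows linearly with memory size.
 */
static uint64_t extrapolate_ns(uint64_t measured_ns, uint64_t measured_gib,
			       uint64_t target_gib)
{
	assert(measured_gib != 0);
	return measured_ns * target_gib / measured_gib;
}
```

With the measurements from the message, slowdown_x1000(199555, 741292) gives 3714 (about 3.7x), and extrapolate_ns(741292, 256, 1024) gives 2,965,168 ns, roughly 3 ms for a 1TB machine under the linear-scaling assumption.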
Diffstat (limited to 'include/linux')
-rw-r--r-- include/linux/mm.h | 1
1 files changed, 0 insertions, 1 deletions
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e3b6112a8d79..8a0078a4dc78 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4855,7 +4855,6 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
 }
 #endif
 
-void *sparse_buffer_alloc(unsigned long size);
 unsigned long section_map_size(void);
 struct page * __populate_section_memmap(unsigned long pfn,
 		unsigned long nr_pages, int nid, struct vmem_altmap *altmap,