mm: hugetlb: improve parallel huge page allocation time - kernel/linux.git

author	Thomas Prescher <thomas.prescher@cyberus-technology.de>	2025-02-28 01:45:05 +0300
committer	Andrew Morton <akpm@linux-foundation.org>	2025-03-17 10:05:36 +0300
commit	c34b3eceeac64d59f8b475046501faa5d8daa5a4 (patch)
tree	177de6eedf37e73a9c5b557df2986e6397d06ae6 /include
parent	3dc30ef64ba6b0f4c2a38aec7e87691f2d859b84 (diff)
download	linux-c34b3eceeac64d59f8b475046501faa5d8daa5a4.tar.xz

mm: hugetlb: improve parallel huge page allocation time

Patch series "Add a command line option that enables control of how many threads should be used to allocate huge pages", v2.

Allocating huge pages can take a very long time on servers with terabytes of memory, even when they are allocated at boot time, where the allocation happens in parallel.

Before this series, the kernel used a hard-coded value of 2 threads per NUMA node for these allocations. This value might have been good enough in the past, but it is not sufficient to fully utilize newer systems.

This series changes the default so that the kernel uses 25% of the available hardware threads for these allocations. In addition, users who wish to micro-optimize the allocation time can override this value via a new kernel parameter.

We tested this on 2 generations of Xeon CPUs and the results show a big improvement in the overall allocation time.

+-----------------------+-------+-------+-------+-------+-------+
| threads               |   8   |   16  |   32  |   64  |  128  |
+-----------------------+-------+-------+-------+-------+-------+
| skylake      144 cpus |   44s |   22s |   16s |   19s |   20s |
| cascade lake 192 cpus |   39s |   20s |   11s |   10s |    9s |
+-----------------------+-------+-------+-------+-------+-------+

On skylake, we see an improvement of 2.75x when using 32 threads compared with 8 threads (44s vs. 16s). On cascade lake, the result is even better at 4.3x when using 128 threads (39s vs. 9s).

This speedup is quite significant, and users of large machines like these should have the option to make the machines boot as fast as possible.

This patch (of 3):

Before this patch, the kernel used a hard-coded value of 2 threads per NUMA node for these allocations. This patch changes this policy, and the kernel now uses 25% of the available hardware threads for the allocations.
Link: https://lkml.kernel.org/r/20250227-hugepage-parameter-v2-0-7db8c6dc0453@cyberus-technology.de
Link: https://lkml.kernel.org/r/20250227-hugepage-parameter-v2-1-7db8c6dc0453@cyberus-technology.de
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
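
The policy change described in the commit message can be illustrated with a small userspace sketch. This is not the kernel code from the patch: the function names are hypothetical, and both the round-down division and the floor of one thread are assumptions made for the illustration.

```c
/*
 * Illustrative sketch only, not taken from the patch.
 * Function names are hypothetical.
 */

/* Old policy per the commit message: 2 threads per NUMA node. */
static unsigned int threads_old_policy(unsigned int numa_nodes)
{
	return 2 * numa_nodes;
}

/*
 * New default per the commit message: 25% of the available
 * hardware threads. Rounding down and never returning less than
 * one thread are assumptions of this sketch.
 */
static unsigned int threads_new_default(unsigned int hw_threads)
{
	unsigned int n = hw_threads / 4;

	return n ? n : 1;
}
```

Under these assumptions, the 144-CPU skylake machine from the table would default to 36 threads and the 192-CPU cascade lake machine to 48 threads.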

Diffstat (limited to 'include')

0 files changed, 0 insertions, 0 deletions

