diff options
author | Joao Martins <joao.m.martins@oracle.com> | 2022-01-15 01:04:15 +0300 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-15 17:30:25 +0300 |
commit | 5b24eeef06701cca6852f1bf768248ccc912819b (patch) | |
tree | 99b9a64f5b6de18d96403d188edad184de9a72f3 /mm | |
parent | 60115fa54ad7b913b7cb5844e6b7ffeb842d55f2 (diff) | |
download | linux-5b24eeef06701cca6852f1bf768248ccc912819b.tar.xz |
mm/page_alloc: split prep_compound_page into head and tail subparts
Patch series "mm, device-dax: Introduce compound pages in devmap", v7.
This series converts device-dax to use compound pages, and moves away
from the 'struct page per basepage on PMD/PUD' that is done today.
Doing so
1) unlocks a few noticeable improvements on unpin_user_pages() and
makes device-dax+altmap case 4x times faster in pinning (numbers
below and in last patch)
2) as mentioned in various other threads it's one important step
towards cleaning up ZONE_DEVICE refcounting.
I've split the compound pages on devmap part from the rest based on
recent discussions on devmap pending and future work planned[5][6].
There is consensus that device-dax should be using compound pages to
represent its PMD/PUDs just like HugeTLB and THP, and that leads to less
specialization of the dax parts. I will pursue the rest of the work in
parallel once this part is merged, particular the GUP-{slow,fast}
improvements [7] and the tail struct page deduplication memory savings
part[8].
To summarize what the series does:
Patch 1: Prepare hwpoisoning to work with dax compound pages.
Patches 2-3: Split the current utility function of prep_compound_page()
into head and tail and use those two helpers where appropriate to take
advantage of caches being warm after __init_single_page(). This is used
when initializing zone device when we bring up device-dax namespaces.
Patches 4-10: Add devmap support for compound pages in device-dax.
memmap_init_zone_device() initialize its metadata as compound pages, and
it introduces a new devmap property known as vmemmap_shift which
outlines how the vmemmap is structured (defaults to base pages as done
today). The property describe the page order of the metadata
essentially. While at it do a few cleanups in device-dax in patches
5-9. Finally enable device-dax usage of devmap @vmemmap_shift to a
value based on its own @align property. @vmemmap_shift returns 0 by
default (which is today's case of base pages in devmap, like fsdax or
the others) and the usage of compound devmap is optional. Starting with
device-dax (*not* fsdax) we enable it by default. There are a few
pinning improvements particular on the unpinning case and altmap, as
well as unpin_user_page_range_dirty_lock() being just as effective as
THP/hugetlb[0] pages.
$ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
(pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
[altmap]
(pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms
$ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
(pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
[altmap with -m 127004]
(pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms
Tested on x86 with 1Tb+ of pmem (alongside registering it with RDMA with
and without altmap), alongside gup_test selftests with dynamic dax
regions and static dax regions. Coupled with ndctl unit tests for
dynamic dax devices that exercise all of this. Note, for dynamic dax
regions I had to revert commit 8aa83e6395 ("x86/setup: Call
early_reserve_memory() earlier"), it is a known issue that this commit
broke efi_fake_mem=.
This patch (of 11):
Split the utility function prep_compound_page() into head and tail
counterparts, and use them accordingly.
This is in preparation for sharing the storage for compound page
metadata.
Link: https://lkml.kernel.org/r/20211202204422.26777-1-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/20211202204422.26777-3-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm')
-rw-r--r-- | mm/page_alloc.c | 30 |
1 files changed, 20 insertions, 10 deletions
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c5952749ad40..20b9db0cf97c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -726,23 +726,33 @@ void free_compound_page(struct page *page) free_the_page(page, compound_order(page)); } +static void prep_compound_head(struct page *page, unsigned int order) +{ + set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); + set_compound_order(page, order); + atomic_set(compound_mapcount_ptr(page), -1); + if (hpage_pincount_available(page)) + atomic_set(compound_pincount_ptr(page), 0); +} + +static void prep_compound_tail(struct page *head, int tail_idx) +{ + struct page *p = head + tail_idx; + + p->mapping = TAIL_MAPPING; + set_compound_head(p, head); +} + void prep_compound_page(struct page *page, unsigned int order) { int i; int nr_pages = 1 << order; __SetPageHead(page); - for (i = 1; i < nr_pages; i++) { - struct page *p = page + i; - p->mapping = TAIL_MAPPING; - set_compound_head(p, page); - } + for (i = 1; i < nr_pages; i++) + prep_compound_tail(page, i); - set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); - set_compound_order(page, order); - atomic_set(compound_mapcount_ptr(page), -1); - if (hpage_pincount_available(page)) - atomic_set(compound_pincount_ptr(page), 0); + prep_compound_head(page, order); } #ifdef CONFIG_DEBUG_PAGEALLOC |