<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/dax/device.c, branch v6.6.132</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-10-17T13:24:36+00:00</updated>
<entry>
<title>device-dax: correct pgoff align in dax_set_mapping()</title>
<updated>2024-10-17T13:24:36+00:00</updated>
<author>
<name>Kun(llfl)</name>
<email>llfl@linux.alibaba.com</email>
</author>
<published>2024-09-27T07:45:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b822007e8db341d6f175c645ed79866db501ad86'/>
<id>urn:sha1:b822007e8db341d6f175c645ed79866db501ad86</id>
<content type='text'>
commit 7fcbd9785d4c17ea533c42f20a9083a83f301fa6 upstream.

pgoff should be aligned using ALIGN_DOWN() instead of ALIGN().  Otherwise,
vmf-&gt;address not aligned to fault_size will be aligned to the next
alignment, that can result in memory failure getting the wrong address.

It's a subtle situation that only can be observed in
page_mapped_in_vma() after the page is page fault handled by
dev_dax_huge_fault.  Generally, there is little chance to perform
page_mapped_in_vma in dev-dax's page unless in specific error injection
to the dax device to trigger an MCE - memory-failure.  In that case,
page_mapped_in_vma() will be triggered to determine which task is
accessing the failure address and kill that task in the end.


We used self-developed dax device (which is 2M aligned mapping) , to
perform error injection to random address.  It turned out that error
injected to non-2M-aligned address was causing endless MCE until panic.
Because page_mapped_in_vma() kept resulting wrong address and the task
accessing the failure address was never killed properly:


[ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.049006] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.448042] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.792026] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.162502] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.461116] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.764730] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.042128] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.464293] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.818090] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3787.085297] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page:
Recovered

It took us several weeks to pinpoint this problem,  but we eventually
used bpftrace to trace the page fault and mce address and successfully
identified the issue.


Joao added:

; Likely we never reproduce in production because we always pin
: device-dax regions in the region align they provide (Qemu does
: similarly with prealloc in hugetlb/file backed memory).  I think this
: bug requires that we touch *unpinned* device-dax regions unaligned to
: the device-dax selected alignment (page size i.e.  4K/2M/1G)

Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.1727421694.git.llfl@linux.alibaba.com
Fixes: b9b5777f09be ("device-dax: use ALIGN() for determining pgoff")
Signed-off-by: Kun(llfl) &lt;llfl@linux.alibaba.com&gt;
Tested-by: JianXiong Zhao &lt;zhaojianxiong.zjx@alibaba-inc.com&gt;
Reviewed-by: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove enum page_entry_size</title>
<updated>2023-08-24T23:20:30+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2023-08-18T20:23:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1d024e7a8dabcc3c84d77532a88c774c32cf8245'/>
<id>urn:sha1:1d024e7a8dabcc3c84d77532a88c774c32cf8245</id>
<content type='text'>
Remove the unnecessary encoding of page order into an enum and pass the
page order directly.  That lets us get rid of pe_order().

The switch constructs have to be changed to if/else constructs to prevent
GCC from warning on builds with 3-level page tables where PMD_ORDER and
PUD_ORDER have the same value.

If you are looking at this commit because your driver stopped compiling,
look at the previous commit as well and audit your driver to be sure it
doesn't depend on mmap_lock being held in its -&gt;huge_fault method.

[willy@infradead.org: use "order %u" to match the (non dev_t) style]
  Link: https://lkml.kernel.org/r/ZOUYekbtTv+n8hYf@casper.infradead.org
Link: https://lkml.kernel.org/r/20230818202335.2739663-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dax: fix missing-prototype warnings</title>
<updated>2023-05-19T00:28:07+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2023-05-17T12:55:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2d5153526f929838b0912ded26862840f72745f4'/>
<id>urn:sha1:2d5153526f929838b0912ded26862840f72745f4</id>
<content type='text'>
dev_dax_probe declaration for this function was removed with the only
caller outside of device.c. Mark it static to avoid a W=1
warning:
drivers/dax/device.c:399:5: error: no previous prototype for 'dev_dax_probe'

Similarly, run_dax() causes a warning, but this one is because the
declaration needs to be included:

drivers/dax/super.c:337:6: error: no previous prototype for 'run_dax'

Fixes: 83762cb5c7c4 ("dax: Kill DEV_DAX_PMEM_COMPAT")
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Link: https://lore.kernel.org/r/20230517125532.931157-1-arnd@kernel.org
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl</title>
<updated>2023-02-25T17:19:23+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-02-25T17:19:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7c3dc440b1f5c75f45e24430f913e561dc82a419'/>
<id>urn:sha1:7c3dc440b1f5c75f45e24430f913e561dc82a419</id>
<content type='text'>
Pull Compute Express Link (CXL) updates from Dan Williams:
 "To date Linux has been dependent on platform-firmware to map CXL RAM
  regions and handle events / errors from devices. With this update we
  can now parse / update the CXL memory layout, and report events /
  errors from devices. This is a precursor for the CXL subsystem to
  handle the end-to-end "RAS" flow for CXL memory. i.e. the flow that
  for DDR-attached-DRAM is handled by the EDAC driver where it maps
  system physical address events to a field-replaceable-unit (FRU /
  endpoint device). In general, CXL has the potential to standardize
  what has historically been a pile of memory-controller-specific error
  handling logic.

  Another change of note is the default policy for handling RAM-backed
  device-dax instances. Previously the default access mode was "device",
  mmap(2) a device special file to access memory. The new default is
  "kmem" where the address range is assigned to the core-mm via
  add_memory_driver_managed(). This saves typical users from wondering
  why their platform memory is not visible via free(1) and stuck behind
  a device-file. At the same time it allows expert users to deploy
  policy to, for example, get dedicated access to high performance
  memory, or hide low performance memory from general purpose kernel
  allocations. This affects not only CXL, but also systems with
  high-bandwidth-memory that platform-firmware tags with the
  EFI_MEMORY_SP (special purpose) designation.

  Summary:

   - CXL RAM region enumeration: instantiate 'struct cxl_region' objects
     for platform firmware created memory regions

   - CXL RAM region provisioning: complement the existing PMEM region
     creation support with RAM region support

   - "Soft Reservation" policy change: Online (memory hot-add)
     soft-reserved memory (EFI_MEMORY_SP) by default, but still allow
     for setting aside such memory for dedicated access via device-dax.

   - CXL Events and Interrupts: Takeover CXL event handling from
     platform-firmware (ACPI calls this CXL Memory Error Reporting) and
     export CXL Events via Linux Trace Events.

   - Convey CXL _OSC results to drivers: Similar to PCI, let the CXL
     subsystem interrogate the result of CXL _OSC negotiation.

   - Emulate CXL DVSEC Range Registers as "decoders": Allow for
     first-generation devices that pre-date the definition of the CXL
     HDM Decoder Capability to translate the CXL DVSEC Range Registers
     into 'struct cxl_decoder' objects.

   - Set timestamp: Per spec, set the device timestamp in case of
     hotplug, or if platform-firwmare failed to set it.

   - General fixups: linux-next build issues, non-urgent fixes for
     pre-production hardware, unit test fixes, spelling and debug
     message improvements"

* tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (66 commits)
  dax/kmem: Fix leak of memory-hotplug resources
  cxl/mem: Add kdoc param for event log driver state
  cxl/trace: Add serial number to trace points
  cxl/trace: Add host output to trace points
  cxl/trace: Standardize device information output
  cxl/pci: Remove locked check for dvsec_range_allowed()
  cxl/hdm: Add emulation when HDM decoders are not committed
  cxl/hdm: Create emulated cxl_hdm for devices that do not have HDM decoders
  cxl/hdm: Emulate HDM decoder from DVSEC range registers
  cxl/pci: Refactor cxl_hdm_decode_init()
  cxl/port: Export cxl_dvsec_rr_decode() to cxl_port
  cxl/pci: Break out range register decoding from cxl_hdm_decode_init()
  cxl: add RAS status unmasking for CXL
  cxl: remove unnecessary calling of pci_enable_pcie_error_reporting()
  dax/hmem: build hmem device support as module if possible
  dax: cxl: add CXL_REGION dependency
  cxl: avoid returning uninitialized error code
  cxl/pmem: Fix nvdimm registration races
  cxl/mem: Fix UAPI command comment
  cxl/uapi: Tag commands from cxl_query_cmd()
  ...
</content>
</entry>
<entry>
<title>dax: Assign RAM regions to memory-hotplug by default</title>
<updated>2023-02-11T01:33:40+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2023-02-10T09:07:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e9ee9fe3a9d4ae0e1e935fc2ec1218b66a043cae'/>
<id>urn:sha1:e9ee9fe3a9d4ae0e1e935fc2ec1218b66a043cae</id>
<content type='text'>
The default mode for device-dax instances is backwards for RAM-regions
as evidenced by the fact that it tends to catch end users by surprise.
"Where is my memory?". Recall that platforms are increasingly shipping
with performance-differentiated memory pools beyond typical DRAM and
NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic
interleave, varied media types, and future fabric attached
possibilities).

For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory =&gt; Linux
'Soft Reserved') attribute is expected to be applied to all memory-pools
that are not the general purpose pool. This designation gives an
Operating System a chance to defer usage of a memory pool until later in
the boot process where its performance properties can be interrogated
and administrator policy can be applied.

'Soft Reserved' memory can be anything from too limited and precious to
be part of the general purpose pool (HBM), too slow to host hot kernel
data structures (some PMEM media), or anything in between. However, in
the absence of an explicit policy, the memory should at least be made
usable by default. The current device-dax default hides all
non-general-purpose memory behind a device interface.

The expectation is that the distribution of users that want the memory
online by default vs device-dedicated-access by default follows the
Pareto principle. A small number of enlightened users may want to do
userspace memory management through a device, but general users just
want the kernel to make the memory available with an option to get more
advanced later.

Arrange for all device-dax instances not backed by PMEM to default to
attaching to the dax_kmem driver. From there the baseline memory hotplug
policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=)
gates whether the memory comes online or stays offline. Where, if it
stays offline, it can be reliably converted back to device-mode where it
can be partitioned, or fronted by a userspace allocator.

So, if someone wants device-dax instances for their 'Soft Reserved'
memory:

1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot
   with memhp_default_state=offline, or roll the dice and hope that the
   kernel has not pinned a page in that memory before step 2.

2/ Write a udev rule to convert the target dax device(s) from
   'system-ram' mode to 'devdax' mode:

   daxctl reconfigure-device $dax -m devdax -f

Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: Gregory Price &lt;gregory.price@memverge.com&gt;
Tested-by: Fan Ni &lt;fan.ni@samsung.com&gt;
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Link: https://lore.kernel.org/r/167602003336.1924368.6809503401422267885.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>mm: replace vma-&gt;vm_flags direct modifications with modifier calls</title>
<updated>2023-02-10T00:51:39+00:00</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2023-01-26T19:37:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1c71222e5f2393b5ea1a41795c67589eea7e3490'/>
<id>urn:sha1:1c71222e5f2393b5ea1a41795c67589eea7e3490</id>
<content type='text'>
Replace direct modifications to vma-&gt;vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

[akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo]
Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Acked-by: Mike Rapoport (IBM) &lt;rppt@kernel.org&gt;
Acked-by: Sebastian Reichel &lt;sebastian.reichel@collabora.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Reviewed-by: Hyeonggon Yoo &lt;42.hyeyoo@gmail.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Arjun Roy &lt;arjunroy@google.com&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Joel Fernandes &lt;joelaf@google.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Minchan Kim &lt;minchan@google.com&gt;
Cc: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Cc: Peter Oskolkov &lt;posk@google.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Punit Agrawal &lt;punit.agrawal@bytedance.com&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio</title>
<updated>2022-03-16T17:37:05+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2022-02-09T20:22:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=46de8b979492e1377947700ecb1e3169088668b2'/>
<id>urn:sha1:46de8b979492e1377947700ecb1e3169088668b2</id>
<content type='text'>
This is a mechanical change.

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Tested-by: Damien Le Moal &lt;damien.lemoal@opensource.wdc.com&gt;
Acked-by: Damien Le Moal &lt;damien.lemoal@opensource.wdc.com&gt;
Tested-by: Mike Marshall &lt;hubcap@omnibond.com&gt; # orangefs
Tested-by: David Howells &lt;dhowells@redhat.com&gt; # afs
</content>
</entry>
<entry>
<title>fs: Remove noop_invalidatepage()</title>
<updated>2022-03-15T12:23:29+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2022-02-09T20:21:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5660a8630dab61a28e07ec00c42bf605b182d725'/>
<id>urn:sha1:5660a8630dab61a28e07ec00c42bf605b182d725</id>
<content type='text'>
We used to have to use noop_invalidatepage() to prevent
block_invalidatepage() from being called, but that behaviour is now gone.

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Tested-by: Damien Le Moal &lt;damien.lemoal@opensource.wdc.com&gt;
Acked-by: Damien Le Moal &lt;damien.lemoal@opensource.wdc.com&gt;
Tested-by: Mike Marshall &lt;hubcap@omnibond.com&gt; # orangefs
Tested-by: David Howells &lt;dhowells@redhat.com&gt; # afs
</content>
</entry>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2022-01-15T18:37:06+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-01-15T18:37:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f56caedaf94f9ced5dbfcdb0060a3e788d2078af'/>
<id>urn:sha1:f56caedaf94f9ced5dbfcdb0060a3e788d2078af</id>
<content type='text'>
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
</content>
</entry>
<entry>
<title>device-dax: compound devmap support</title>
<updated>2022-01-15T14:30:26+00:00</updated>
<author>
<name>Joao Martins</name>
<email>joao.m.martins@oracle.com</email>
</author>
<published>2022-01-14T22:04:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=14606001efb48a17be31a5bec626c13ca49d783a'/>
<id>urn:sha1:14606001efb48a17be31a5bec626c13ca49d783a</id>
<content type='text'>
Use the newly added compound devmap facility which maps the assigned dax
ranges as compound pages at a page size of @align.

dax devices are created with a fixed @align (huge page size) which is
enforced through as well at mmap() of the device.  Faults, consequently
happen too at the specified @align specified at the creation, and those
don't change throughout dax device lifetime.  MCEs unmap a whole dax
huge page, as well as splits occurring at the configured page size.

Performance measured by gup_test improves considerably for
unpin_user_pages() and altmap with NVDIMMs:

  $ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
  (pin_user_pages_fast 2M pages) put:~71 ms -&gt; put:~22 ms
  [altmap]
  (pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -&gt; get: ~127ms put:~71ms

   $ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
  (pin_user_pages_fast 2M pages) put:~513 ms -&gt; put:~188 ms
  [altmap with -m 127004]
  (pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -&gt; get:~1sec put:~563ms

.. as well as unpin_user_page_range_dirty_lock() being just as effective
as THP/hugetlb[0] pages.

[0] https://lore.kernel.org/linux-mm/20210212130843.13865-5-joao.m.martins@oracle.com/

Link: https://lkml.kernel.org/r/20211202204422.26777-12-joao.m.martins@oracle.com
Signed-off-by: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: John Hubbard &lt;jhubbard@nvidia.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Muchun Song &lt;songmuchun@bytedance.com&gt;
Cc: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
