<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/drivers/dax, branch v6.6.132</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.132'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-12-09T09:32:21+00:00</updated>
<entry>
<title>dax: delete a stale directory pmem</title>
<updated>2024-12-09T09:32:21+00:00</updated>
<author>
<name>Harshit Mogalapalli</name>
<email>harshit.m.mogalapalli@oracle.com</email>
</author>
<published>2024-10-17T10:11:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a6faea503ba4345c267b4a414827881cfa8c461e'/>
<id>urn:sha1:a6faea503ba4345c267b4a414827881cfa8c461e</id>
<content type='text'>
[ Upstream commit b8e6d7ce50673c39514921ac61f7af00bbb58b87 ]

After commit: 83762cb5c7c4 ("dax: Kill DEV_DAX_PMEM_COMPAT") the pmem/
directory is not needed anymore and Makefile changes were made
accordingly in this commit, but there is a Makefile and pmem.c in pmem/
which are now stale and pmem.c is empty, remove them.

Fixes: 83762cb5c7c4 ("dax: Kill DEV_DAX_PMEM_COMPAT")
Suggested-by: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Signed-off-by: Harshit Mogalapalli &lt;harshit.m.mogalapalli@oracle.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Link: https://patch.msgid.link/20241017101144.1654085-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>device-dax: correct pgoff align in dax_set_mapping()</title>
<updated>2024-10-17T13:24:36+00:00</updated>
<author>
<name>Kun(llfl)</name>
<email>llfl@linux.alibaba.com</email>
</author>
<published>2024-09-27T07:45:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b822007e8db341d6f175c645ed79866db501ad86'/>
<id>urn:sha1:b822007e8db341d6f175c645ed79866db501ad86</id>
<content type='text'>
commit 7fcbd9785d4c17ea533c42f20a9083a83f301fa6 upstream.

pgoff should be aligned using ALIGN_DOWN() instead of ALIGN().  Otherwise,
vmf-&gt;address not aligned to fault_size will be aligned to the next
alignment, that can result in memory failure getting the wrong address.

It's a subtle situation that only can be observed in
page_mapped_in_vma() after the page is page fault handled by
dev_dax_huge_fault.  Generally, there is little chance to perform
page_mapped_in_vma in dev-dax's page unless in specific error injection
to the dax device to trigger an MCE - memory-failure.  In that case,
page_mapped_in_vma() will be triggered to determine which task is
accessing the failure address and kill that task in the end.


We used self-developed dax device (which is 2M aligned mapping) , to
perform error injection to random address.  It turned out that error
injected to non-2M-aligned address was causing endless MCE until panic.
Because page_mapped_in_vma() kept resulting wrong address and the task
accessing the failure address was never killed properly:


[ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.049006] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.448042] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.792026] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.162502] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.461116] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.764730] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.042128] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.464293] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.818090] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3787.085297] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page:
Recovered

It took us several weeks to pinpoint this problem,  but we eventually
used bpftrace to trace the page fault and mce address and successfully
identified the issue.


Joao added:

; Likely we never reproduce in production because we always pin
: device-dax regions in the region align they provide (Qemu does
: similarly with prealloc in hugetlb/file backed memory).  I think this
: bug requires that we touch *unpinned* device-dax regions unaligned to
: the device-dax selected alignment (page size i.e.  4K/2M/1G)

Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.1727421694.git.llfl@linux.alibaba.com
Fixes: b9b5777f09be ("device-dax: use ALIGN() for determining pgoff")
Signed-off-by: Kun(llfl) &lt;llfl@linux.alibaba.com&gt;
Tested-by: JianXiong Zhao &lt;zhaojianxiong.zjx@alibaba-inc.com&gt;
Reviewed-by: Joao Martins &lt;joao.m.martins@oracle.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: remove enum page_entry_size</title>
<updated>2023-08-24T23:20:30+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2023-08-18T20:23:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1d024e7a8dabcc3c84d77532a88c774c32cf8245'/>
<id>urn:sha1:1d024e7a8dabcc3c84d77532a88c774c32cf8245</id>
<content type='text'>
Remove the unnecessary encoding of page order into an enum and pass the
page order directly.  That lets us get rid of pe_order().

The switch constructs have to be changed to if/else constructs to prevent
GCC from warning on builds with 3-level page tables where PMD_ORDER and
PUD_ORDER have the same value.

If you are looking at this commit because your driver stopped compiling,
look at the previous commit as well and audit your driver to be sure it
doesn't depend on mmap_lock being held in its -&gt;huge_fault method.

[willy@infradead.org: use "order %u" to match the (non dev_t) style]
  Link: https://lkml.kernel.org/r/ZOUYekbtTv+n8hYf@casper.infradead.org
Link: https://lkml.kernel.org/r/20230818202335.2739663-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memory tier: rename destroy_memory_type() to put_memory_type()</title>
<updated>2023-08-18T17:12:11+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2023-07-06T06:39:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bded67f81ec47e6054ad24c1c7992a6523a9b2c6'/>
<id>urn:sha1:bded67f81ec47e6054ad24c1c7992a6523a9b2c6</id>
<content type='text'>
It appears that destroy_memory_type() isn't a very good name because we
usually will not free the memory_type here.  So rename it to a more
appropriate name i.e.  put_memory_type().

Link: https://lkml.kernel.org/r/20230706063905.543800-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Suggested-by: Huang, Ying &lt;ying.huang@intel.com&gt;
Reviewed-by: "Huang, Ying" &lt;ying.huang@intel.com&gt;
Reviewed-by: Xiao Yang &lt;yangx.jy@fujitsu.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dax: enable dax fault handler to report VM_FAULT_HWPOISON</title>
<updated>2023-06-26T13:54:23+00:00</updated>
<author>
<name>Jane Chu</name>
<email>jane.chu@oracle.com</email>
</author>
<published>2023-06-15T18:13:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1ea7ca1b090145519aad998679222f0a14ab8fce'/>
<id>urn:sha1:1ea7ca1b090145519aad998679222f0a14ab8fce</id>
<content type='text'>
When multiple processes mmap() a dax file, then at some point,
a process issues a 'load' and consumes a hwpoison, the process
receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb
set for the poison scope. Soon after, any other process issues
a 'load' to the poisoned page (that is unmapped from the kernel
side by memory_failure), it receives a SIGBUS with
si_code = BUS_ADRERR and without valid si_lsb.

This is confusing to user, and is different from page fault due
to poison in RAM memory, also some helpful information is lost.

Channel dax backend driver's poison detection to the filesystem
such that instead of reporting VM_FAULT_SIGBUS, it could report
VM_FAULT_HWPOISON.

If user level block IO syscalls fail due to poison, the errno will
be converted to EIO to maintain block API consistency.

Signed-off-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Link: https://lore.kernel.org/r/20230615181325.1327259-2-jane.chu@oracle.com
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
</content>
</entry>
<entry>
<title>dax/kmem: Pass valid argument to memory_group_register_static</title>
<updated>2023-06-23T07:32:06+00:00</updated>
<author>
<name>Tarun Sahu</name>
<email>tsahu@linux.ibm.com</email>
</author>
<published>2023-06-21T15:50:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=46e66dab8565f742374e9cc4ff7d35f344d774e2'/>
<id>urn:sha1:46e66dab8565f742374e9cc4ff7d35f344d774e2</id>
<content type='text'>
memory_group_register_static takes maximum number of pages as the argument
while dev_dax_kmem_probe passes total_len (in bytes) as the argument.

IIUC, I don't see any crash/panic impact as such. As,
memory_group_register_static just set the max_pages limit which is used in
auto_movable_zone_for_pfn to determine the zone.

which might cause these condition to behave differently,

This will be true always so jump will happen to kernel_zone
    ...
    if (!auto_movable_can_online_movable(NUMA_NO_NODE, group, nr_pages))
        goto kernel_zone;

    ...
    kernel_zone:
        return default_kernel_zone_for_pfn(nid, pfn, nr_pages);

Here, In below, zone_intersects compare range will be larger as nr_pages
will be higher (derived from total_len passed in dev_dax_kmem_probe).

    ...
    static struct zone *default_kernel_zone_for_pfn(int nid, unsigned long start_pfn,
    		unsigned long nr_pages)
    {
    	struct pglist_data *pgdat = NODE_DATA(nid);
    	int zid;

    	for (zid = 0; zid &lt; ZONE_NORMAL; zid++) {
    		struct zone *zone = &amp;pgdat-&gt;node_zones[zid];

    		if (zone_intersects(zone, start_pfn, nr_pages))
    			return zone;
    	}

    	return &amp;pgdat-&gt;node_zones[ZONE_NORMAL];
    }

Incorrect zone will be returned here, which in later time might cause bigger
problem.

Fixes: eedf634aac3b ("dax/kmem: use a single static memory group for a single probed unit")
Signed-off-by: Tarun Sahu &lt;tsahu@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20230621155025.370672-1-tsahu@linux.ibm.com
Reviewed-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: Cleanup extra dax_region references</title>
<updated>2023-06-23T07:03:50+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2023-06-03T06:14:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2532f41607c4308733239dd43278f8a5540f3ec7'/>
<id>urn:sha1:2532f41607c4308733239dd43278f8a5540f3ec7</id>
<content type='text'>
Now that free_dev_dax_id() internally manages the references it needs
the extra references taken by the dax_region drivers are not needed.

Reported-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: https://lore.kernel.org/r/168577285161.1672036.8111253437794419696.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Fan Ni &lt;fan.ni@samsung.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: Introduce alloc_dev_dax_id()</title>
<updated>2023-06-23T07:03:50+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2023-06-03T06:14:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=70aab281e18c68a1284bc387de127c2fc0bed3f8'/>
<id>urn:sha1:70aab281e18c68a1284bc387de127c2fc0bed3f8</id>
<content type='text'>
The reference counting of dax_region objects is needlessly complicated,
has lead to confusion [1], and has hidden a bug [2]. Towards cleaning up
that mess introduce alloc_dev_dax_id() to minimize the holding of a
dax_region reference to only what dev_dax_release() needs, the
dax_region-&gt;ida.

Part of the reason for the mess was the design to dereference a
dax_region in all cases in free_dev_dax_id() even if the id was
statically assigned by the upper level dax_region driver. Remove the
need to call "is_static(dax_region)" by tracking whether the id is
dynamic directly in the dev_dax instance itself.

With that flag the dax_region pinning and release per dev_dax instance
can move to alloc_dev_dax_id() and free_dev_dax_id() respectively.

A follow-on cleanup address the unnecessary references in the dax_region
setup and drivers.

Fixes: 0f3da14a4f05 ("device-dax: introduce 'seed' devices")
Link: http://lore.kernel.org/r/20221203095858.612027-1-liuyongqiang13@huawei.com [1]
Link: http://lore.kernel.org/r/3cf0890b-4eb0-e70e-cd9c-2ecc3d496263@hpe.com [2]
Reported-by: Yongqiang Liu &lt;liuyongqiang13@huawei.com&gt;
Reported-by: Paul Cassella &lt;cassella@hpe.com&gt;
Reported-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: https://lore.kernel.org/r/168577284563.1672036.13493034988900989554.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: Use device_unregister() in unregister_dax_mapping()</title>
<updated>2023-06-23T07:03:50+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2023-06-03T06:13:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=82b4ceeccb89cfd0b03706f1b15e31a7db6a027d'/>
<id>urn:sha1:82b4ceeccb89cfd0b03706f1b15e31a7db6a027d</id>
<content type='text'>
Replace an open-coded device_unregister() sequence with the helper.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: https://lore.kernel.org/r/168577283989.1672036.7777592498865470652.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Fan Ni &lt;fan.ni@samsung.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: Fix dax_mapping_release() use after free</title>
<updated>2023-06-23T07:03:50+00:00</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2023-06-03T06:13:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6d24b170a9db0456f577b1ab01226a2254c016a8'/>
<id>urn:sha1:6d24b170a9db0456f577b1ab01226a2254c016a8</id>
<content type='text'>
A CONFIG_DEBUG_KOBJECT_RELEASE test of removing a device-dax region
provider (like modprobe -r dax_hmem) yields:

 kobject: 'mapping0' (ffff93eb460e8800): kobject_release, parent 0000000000000000 (delayed 2000)
 [..]
 DEBUG_LOCKS_WARN_ON(1)
 WARNING: CPU: 23 PID: 282 at kernel/locking/lockdep.c:232 __lock_acquire+0x9fc/0x2260
 [..]
 RIP: 0010:__lock_acquire+0x9fc/0x2260
 [..]
 Call Trace:
  &lt;TASK&gt;
 [..]
  lock_acquire+0xd4/0x2c0
  ? ida_free+0x62/0x130
  _raw_spin_lock_irqsave+0x47/0x70
  ? ida_free+0x62/0x130
  ida_free+0x62/0x130
  dax_mapping_release+0x1f/0x30
  device_release+0x36/0x90
  kobject_delayed_cleanup+0x46/0x150

Due to attempting ida_free() on an ida object that has already been
freed. Devices typically only hold a reference on their parent while
registered. If a child needs a parent object to complete its release it
needs to hold a reference that it drops from its release callback.
Arrange for a dax_mapping to pin its parent dev_dax instance until
dax_mapping_release().

Fixes: 0b07ce872a9e ("device-dax: introduce 'mapping' devices")
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Link: https://lore.kernel.org/r/168577283412.1672036.16111545266174261446.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang &lt;dave.jiang@intel.com&gt;
Reviewed-by: Fan Ni &lt;fan.ni@samsung.com&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Signed-off-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
</content>
</entry>
</feed>
