<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/mm/vmalloc.c, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-01-11T14:22:17+00:00</updated>
<entry>
<title>kasan: refactor pcpu kasan vmalloc unpoison</title>
<updated>2026-01-11T14:22:17+00:00</updated>
<author>
<name>Maciej Wieczor-Retman</name>
<email>maciej.wieczor-retman@intel.com</email>
</author>
<published>2025-12-04T19:00:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c9acbaec693b3bcbb61c2192ad0f92774cb0e53d'/>
<id>urn:sha1:c9acbaec693b3bcbb61c2192ad0f92774cb0e53d</id>
<content type='text'>
commit 6f13db031e27e88213381039032a9cc061578ea6 upstream.

A KASAN tag mismatch, possibly causing a kernel panic, can be observed
on systems with a tag-based KASAN enabled and with multiple NUMA nodes.
It was reported on arm64 and reproduced on x86. It can be explained in
the following points:

1. There can be more than one virtual memory chunk.
2. Chunk's base address has a tag.
3. The base address points at the first chunk and thus inherits
   the tag of the first chunk.
4. The subsequent chunks will be accessed with the tag from the
   first chunk.
5. Thus, the subsequent chunks need to have their tag set to
   match that of the first chunk.

Refactor code by reusing __kasan_unpoison_vmalloc in a new helper in
preparation for the actual fix.

Link: https://lkml.kernel.org/r/eb61d93b907e262eefcaa130261a08bcb6c5ce51.1764874575.git.m.wieczorretman@pm.me
Fixes: 1d96320f8d53 ("kasan, vmalloc: add vmalloc tagging for SW_TAGS")
Signed-off-by: Maciej Wieczor-Retman &lt;maciej.wieczor-retman@intel.com&gt;
Reviewed-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Danilo Krummrich &lt;dakr@kernel.org&gt;
Cc: Dmitriy Vyukov &lt;dvyukov@google.com&gt;
Cc: Jiayuan Chen &lt;jiayuan.chen@linux.dev&gt;
Cc: Kees Cook &lt;kees@kernel.org&gt;
Cc: Marco Elver &lt;elver@google.com&gt;
Cc: "Uladzislau Rezki (Sony)" &lt;urezki@gmail.com&gt;
Cc: Vincenzo Frascino &lt;vincenzo.frascino@arm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[6.1+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc: leave lazy MMU mode on PTE mapping error</title>
<updated>2025-07-17T16:35:15+00:00</updated>
<author>
<name>Alexander Gordeev</name>
<email>agordeev@linux.ibm.com</email>
</author>
<published>2025-06-23T07:57:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=37e2911d2ec171f40fb25be978ceebbb3d067212'/>
<id>urn:sha1:37e2911d2ec171f40fb25be978ceebbb3d067212</id>
<content type='text'>
commit fea18c686320a53fce7ad62a87a3e1d10ad02f31 upstream.

vmap_pages_pte_range() enters the lazy MMU mode, but fails to leave it in
case an error is encountered.

Link: https://lkml.kernel.org/r/20250623075721.2817094-1-agordeev@linux.ibm.com
Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
Signed-off-by: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Reported-by: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Closes: https://lore.kernel.org/r/202506132017.T1l1l6ME-lkp@intel.com/
Reviewed-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: don't skip arch_sync_kernel_mappings() in error paths</title>
<updated>2025-03-13T11:58:28+00:00</updated>
<author>
<name>Ryan Roberts</name>
<email>ryan.roberts@arm.com</email>
</author>
<published>2025-02-26T12:16:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5c18fae5808de57e06abb109877af890a1f4292c'/>
<id>urn:sha1:5c18fae5808de57e06abb109877af890a1f4292c</id>
<content type='text'>
commit 3685024edd270f7c791f993157d65d3c928f3d6e upstream.

Fix callers that previously skipped calling arch_sync_kernel_mappings() if
an error occurred during a pgtable update.  The call is still required to
sync any pgtable updates that may have occurred prior to hitting the error
condition.

These are theoretical bugs discovered during code review.

Link: https://lkml.kernel.org/r/20250226121610.2401743-1-ryan.roberts@arm.com
Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modified")
Fixes: 0c95cba49255 ("mm: apply_to_pte_range warn and fail if a large pte is encountered")
Signed-off-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Reviewed-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Christop Hellwig &lt;hch@infradead.org&gt;
Cc: "Uladzislau Rezki (Sony)" &lt;urezki@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>vmalloc: fix accounting with i915</title>
<updated>2024-12-27T12:58:53+00:00</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2024-12-11T20:25:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=90ae5b7a1c526b535ae1645b7c0f6d18e7bf944e'/>
<id>urn:sha1:90ae5b7a1c526b535ae1645b7c0f6d18e7bf944e</id>
<content type='text'>
commit a2e740e216f5bf49ccb83b6d490c72a340558a43 upstream.

If the caller of vmap() specifies VM_MAP_PUT_PAGES (currently only the
i915 driver), we will decrement nr_vmalloc_pages and MEMCG_VMALLOC in
vfree().  These counters are incremented by vmalloc() but not by vmap() so
this will cause an underflow.  Check the VM_MAP_PUT_PAGES flag before
decrementing either counter.

Link: https://lkml.kernel.org/r/20241211202538.168311-1-willy@infradead.org
Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Reviewed-by: Balbir Singh &lt;balbirs@nvidia.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: "Uladzislau Rezki (Sony)" &lt;urezki@gmail.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: vmalloc: ensure vmap_block is initialised before adding to queue</title>
<updated>2024-09-12T09:11:27+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2024-08-12T17:16:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1b2770e27d6d952f491bb362b657e5b2713c3efd'/>
<id>urn:sha1:1b2770e27d6d952f491bb362b657e5b2713c3efd</id>
<content type='text'>
commit 3e3de7947c751509027d26b679ecd243bc9db255 upstream.

Commit 8c61291fd850 ("mm: fix incorrect vbq reference in
purge_fragmented_block") extended the 'vmap_block' structure to contain a
'cpu' field which is set at allocation time to the id of the initialising
CPU.

When a new 'vmap_block' is being instantiated by new_vmap_block(), the
partially initialised structure is added to the local 'vmap_block_queue'
xarray before the 'cpu' field has been initialised.  If another CPU is
concurrently walking the xarray (e.g.  via vm_unmap_aliases()), then it
may perform an out-of-bounds access to the remote queue thanks to an
uninitialised index.

This has been observed as UBSAN errors in Android:

 | Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP
 |
 | Call trace:
 |  purge_fragmented_block+0x204/0x21c
 |  _vm_unmap_aliases+0x170/0x378
 |  vm_unmap_aliases+0x1c/0x28
 |  change_memory_common+0x1dc/0x26c
 |  set_memory_ro+0x18/0x24
 |  module_enable_ro+0x98/0x238
 |  do_init_module+0x1b0/0x310

Move the initialisation of 'vb-&gt;cpu' in new_vmap_block() ahead of the
addition to the xarray.

Link: https://lkml.kernel.org/r/20240812171606.17486-1-will@kernel.org
Fixes: 8c61291fd850 ("mm: fix incorrect vbq reference in purge_fragmented_block")
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Zhaoyang Huang &lt;zhaoyang.huang@unisoc.com&gt;
Cc: Hailong.Liu &lt;hailong.liu@oppo.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0</title>
<updated>2024-08-29T15:33:43+00:00</updated>
<author>
<name>Hailong Liu</name>
<email>hailong.liu@oppo.com</email>
</author>
<published>2024-08-08T12:19:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=de7bad86345c43cd040ed43e20d9fad78a3ee59f'/>
<id>urn:sha1:de7bad86345c43cd040ed43e20d9fad78a3ee59f</id>
<content type='text'>
[ Upstream commit 61ebe5a747da649057c37be1c37eb934b4af79ca ]

The __vmap_pages_range_noflush() assumes its argument pages** contains
pages with the same page shift.  However, since commit e9c3cda4d86e ("mm,
vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes
__GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation
failed for high order, the pages** may contain two different page shifts
(high order and order-0).  This could lead __vmap_pages_range_noflush() to
perform incorrect mappings, potentially resulting in memory corruption.

Users might encounter this as follows (vmap_allow_huge = true, 2M is for
PMD_SIZE):

kvmalloc(2M, __GFP_NOFAIL|GFP_X)
    __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP)
        vm_area_alloc_pages(order=9) ---&gt; order-9 allocation failed and fallback to order-0
            vmap_pages_range()
                vmap_pages_range_noflush()
                    __vmap_pages_range_noflush(page_shift = 21) ----&gt; wrong mapping happens

We can remove the fallback code because if a high-order allocation fails,
__vmalloc_node_range_noprof() will retry with order-0.  Therefore, it is
unnecessary to fallback to order-0 here.  Therefore, fix this by removing
the fallback code.

Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com
Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations")
Signed-off-by: Hailong Liu &lt;hailong.liu@oppo.com&gt;
Reported-by: Tangquan Zheng &lt;zhengtangquan@oppo.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Acked-by: Barry Song &lt;baohua@kernel.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm: vmalloc: check if a hash-index is in cpu_possible_mask</title>
<updated>2024-07-18T11:21:20+00:00</updated>
<author>
<name>Uladzislau Rezki (Sony)</name>
<email>urezki@gmail.com</email>
</author>
<published>2024-06-26T14:03:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=28acd531c9a365dac01b32e6bc54aed8c1429bcb'/>
<id>urn:sha1:28acd531c9a365dac01b32e6bc54aed8c1429bcb</id>
<content type='text'>
commit a34acf30b19bc4ee3ba2f1082756ea2604c19138 upstream.

The problem is that there are systems where cpu_possible_mask has gaps
between set CPUs, for example SPARC.  In this scenario addr_to_vb_xa()
hash function can return an index which accesses to not-possible and not
setup CPU area using per_cpu() macro.  This results in an oops on SPARC.

A per-cpu vmap_block_queue is also used as hash table, incorrectly
assuming the cpu_possible_mask has no gaps.  Fix it by adjusting an index
to a next possible CPU.

Link: https://lkml.kernel.org/r/20240626140330.89836-1-urezki@gmail.com
Fixes: 062eacf57ad9 ("mm: vmalloc: remove a global vmap_blocks xarray")
Reported-by: Nick Bowler &lt;nbowler@draconx.ca&gt;
Closes: https://lore.kernel.org/linux-kernel/ZntjIE6msJbF8zTa@MiWiFi-R3L-srv/T/
Signed-off-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Reviewed-by: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Hailong.Liu &lt;hailong.liu@oppo.com&gt;
Cc: Oleksiy Avramchenko &lt;oleksiy.avramchenko@sony.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix incorrect vbq reference in purge_fragmented_block</title>
<updated>2024-07-05T07:33:55+00:00</updated>
<author>
<name>Zhaoyang Huang</name>
<email>zhaoyang.huang@unisoc.com</email>
</author>
<published>2024-06-07T02:31:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=88e0ad40d08a73a74c597e69f4cd2d1fba3838b5'/>
<id>urn:sha1:88e0ad40d08a73a74c597e69f4cd2d1fba3838b5</id>
<content type='text'>
commit 8c61291fd8500e3b35c7ec0c781b273d8cc96cde upstream.

xa_for_each() in _vm_unmap_aliases() loops through all vbs.  However,
since commit 062eacf57ad9 ("mm: vmalloc: remove a global vmap_blocks
xarray") the vb from xarray may not be on the corresponding CPU
vmap_block_queue.  Consequently, purge_fragmented_block() might use the
wrong vbq-&gt;lock to protect the free list, leading to vbq-&gt;free breakage.

Incorrect lock protection can exhaust all vmalloc space as follows:
CPU0                                            CPU1
+--------------------------------------------+
|    +--------------------+     +-----+      |
+--&gt; |                    |----&gt;|     |------+
     | CPU1:vbq free_list |     | vb1 |
+--- |                    |&lt;----|     |&lt;-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

_vm_unmap_aliases()                             vb_alloc()
                                                new_vmap_block()
xa_for_each(&amp;vbq-&gt;vmap_blocks, idx, vb)
--&gt; vb in CPU1:vbq-&gt;freelist

purge_fragmented_block(vb)
spin_lock(&amp;vbq-&gt;lock)                           spin_lock(&amp;vbq-&gt;lock)
--&gt; use CPU0:vbq-&gt;lock                          --&gt; use CPU1:vbq-&gt;lock

list_del_rcu(&amp;vb-&gt;free_list)                    list_add_tail_rcu(&amp;vb-&gt;free_list, &amp;vbq-&gt;free)
    __list_del(vb-&gt;prev, vb-&gt;next)
        next-&gt;prev = prev
    +--------------------+
    |                    |
    | CPU1:vbq free_list |
+---|                    |&lt;--+
|   +--------------------+   |
+----------------------------+
                                                __list_add(new, head-&gt;prev, head)
+--------------------------------------------+
|    +--------------------+     +-----+      |
+--&gt; |                    |----&gt;|     |------+
     | CPU1:vbq free_list |     | vb2 |
+--- |                    |&lt;----|     |&lt;-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+

        prev-&gt;next = next
+--------------------------------------------+
|----------------------------+               |
|    +--------------------+  |  +-----+      |
+--&gt; |                    |--+  |     |------+
     | CPU1:vbq free_list |     | vb2 |
+--- |                    |&lt;----|     |&lt;-----+
|    +--------------------+     +-----+      |
+--------------------------------------------+
Here’s a list breakdown. All vbs, which were to be added to
‘prev’, cannot be used by list_for_each_entry_rcu(vb, &amp;vbq-&gt;free,
free_list) in vb_alloc(). Thus, vmalloc space is exhausted.

This issue affects both erofs and f2fs, the stacktrace is as follows:
erofs:
[&lt;ffffffd4ffb93ad4&gt;] __switch_to+0x174
[&lt;ffffffd4ffb942f0&gt;] __schedule+0x624
[&lt;ffffffd4ffb946f4&gt;] schedule+0x7c
[&lt;ffffffd4ffb947cc&gt;] schedule_preempt_disabled+0x24
[&lt;ffffffd4ffb962ec&gt;] __mutex_lock+0x374
[&lt;ffffffd4ffb95998&gt;] __mutex_lock_slowpath+0x14
[&lt;ffffffd4ffb95954&gt;] mutex_lock+0x24
[&lt;ffffffd4fef2900c&gt;] reclaim_and_purge_vmap_areas+0x44
[&lt;ffffffd4fef25908&gt;] alloc_vmap_area+0x2e0
[&lt;ffffffd4fef24ea0&gt;] vm_map_ram+0x1b0
[&lt;ffffffd4ff1b46f4&gt;] z_erofs_lz4_decompress+0x278
[&lt;ffffffd4ff1b8ac4&gt;] z_erofs_decompress_queue+0x650
[&lt;ffffffd4ff1b8328&gt;] z_erofs_runqueue+0x7f4
[&lt;ffffffd4ff1b66a8&gt;] z_erofs_read_folio+0x104
[&lt;ffffffd4feeb6fec&gt;] filemap_read_folio+0x6c
[&lt;ffffffd4feeb68c4&gt;] filemap_fault+0x300
[&lt;ffffffd4fef0ecac&gt;] __do_fault+0xc8
[&lt;ffffffd4fef0c908&gt;] handle_mm_fault+0xb38
[&lt;ffffffd4ffb9f008&gt;] do_page_fault+0x288
[&lt;ffffffd4ffb9ed64&gt;] do_translation_fault[jt]+0x40
[&lt;ffffffd4fec39c78&gt;] do_mem_abort+0x58
[&lt;ffffffd4ffb8c3e4&gt;] el0_ia+0x70
[&lt;ffffffd4ffb8c260&gt;] el0t_64_sync_handler[jt]+0xb0
[&lt;ffffffd4fec11588&gt;] ret_to_user[jt]+0x0

f2fs:
[&lt;ffffffd4ffb93ad4&gt;] __switch_to+0x174
[&lt;ffffffd4ffb942f0&gt;] __schedule+0x624
[&lt;ffffffd4ffb946f4&gt;] schedule+0x7c
[&lt;ffffffd4ffb947cc&gt;] schedule_preempt_disabled+0x24
[&lt;ffffffd4ffb962ec&gt;] __mutex_lock+0x374
[&lt;ffffffd4ffb95998&gt;] __mutex_lock_slowpath+0x14
[&lt;ffffffd4ffb95954&gt;] mutex_lock+0x24
[&lt;ffffffd4fef2900c&gt;] reclaim_and_purge_vmap_areas+0x44
[&lt;ffffffd4fef25908&gt;] alloc_vmap_area+0x2e0
[&lt;ffffffd4fef24ea0&gt;] vm_map_ram+0x1b0
[&lt;ffffffd4ff1a3b60&gt;] f2fs_prepare_decomp_mem+0x144
[&lt;ffffffd4ff1a6c24&gt;] f2fs_alloc_dic+0x264
[&lt;ffffffd4ff175468&gt;] f2fs_read_multi_pages+0x428
[&lt;ffffffd4ff17b46c&gt;] f2fs_mpage_readpages+0x314
[&lt;ffffffd4ff1785c4&gt;] f2fs_readahead+0x50
[&lt;ffffffd4feec3384&gt;] read_pages+0x80
[&lt;ffffffd4feec32c0&gt;] page_cache_ra_unbounded+0x1a0
[&lt;ffffffd4feec39e8&gt;] page_cache_ra_order+0x274
[&lt;ffffffd4feeb6cec&gt;] do_sync_mmap_readahead+0x11c
[&lt;ffffffd4feeb6764&gt;] filemap_fault+0x1a0
[&lt;ffffffd4ff1423bc&gt;] f2fs_filemap_fault+0x28
[&lt;ffffffd4fef0ecac&gt;] __do_fault+0xc8
[&lt;ffffffd4fef0c908&gt;] handle_mm_fault+0xb38
[&lt;ffffffd4ffb9f008&gt;] do_page_fault+0x288
[&lt;ffffffd4ffb9ed64&gt;] do_translation_fault[jt]+0x40
[&lt;ffffffd4fec39c78&gt;] do_mem_abort+0x58
[&lt;ffffffd4ffb8c3e4&gt;] el0_ia+0x70
[&lt;ffffffd4ffb8c260&gt;] el0t_64_sync_handler[jt]+0xb0
[&lt;ffffffd4fec11588&gt;] ret_to_user[jt]+0x0

To fix this, introducee cpu within vmap_block to record which this vb
belongs to.

Link: https://lkml.kernel.org/r/20240614021352.1822225-1-zhaoyang.huang@unisoc.com
Link: https://lkml.kernel.org/r/20240607023116.1720640-1-zhaoyang.huang@unisoc.com
Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
Signed-off-by: Zhaoyang Huang &lt;zhaoyang.huang@unisoc.com&gt;
Suggested-by: Hailong.Liu &lt;hailong.liu@oppo.com&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL</title>
<updated>2024-06-16T11:47:42+00:00</updated>
<author>
<name>Hailong.Liu</name>
<email>hailong.liu@oppo.com</email>
</author>
<published>2024-05-10T10:01:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c55d3564ad25ce87ab7cc6af251f9574faebd8da'/>
<id>urn:sha1:c55d3564ad25ce87ab7cc6af251f9574faebd8da</id>
<content type='text'>
commit 8e0545c83d672750632f46e3f9ad95c48c91a0fc upstream.

commit a421ef303008 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
includes support for __GFP_NOFAIL, but it presents a conflict with commit
dd544141b9eb ("vmalloc: back off when the current task is OOM-killed").  A
possible scenario is as follows:

process-a
__vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
    __vmalloc_area_node()
        vm_area_alloc_pages()
		--&gt; oom-killer send SIGKILL to process-a
        if (fatal_signal_pending(current)) break;
--&gt; return NULL;

To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
if __GFP_NOFAIL set.

This issue occurred during OPLUS KASAN TEST. Below is part of the log
-&gt; oom-killer sends signal to process
[65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198

[65731.259685] [T32454] Call trace:
[65731.259698] [T32454]  dump_backtrace+0xf4/0x118
[65731.259734] [T32454]  show_stack+0x18/0x24
[65731.259756] [T32454]  dump_stack_lvl+0x60/0x7c
[65731.259781] [T32454]  dump_stack+0x18/0x38
[65731.259800] [T32454]  mrdump_common_die+0x250/0x39c [mrdump]
[65731.259936] [T32454]  ipanic_die+0x20/0x34 [mrdump]
[65731.260019] [T32454]  atomic_notifier_call_chain+0xb4/0xfc
[65731.260047] [T32454]  notify_die+0x114/0x198
[65731.260073] [T32454]  die+0xf4/0x5b4
[65731.260098] [T32454]  die_kernel_fault+0x80/0x98
[65731.260124] [T32454]  __do_kernel_fault+0x160/0x2a8
[65731.260146] [T32454]  do_bad_area+0x68/0x148
[65731.260174] [T32454]  do_mem_abort+0x151c/0x1b34
[65731.260204] [T32454]  el1_abort+0x3c/0x5c
[65731.260227] [T32454]  el1h_64_sync_handler+0x54/0x90
[65731.260248] [T32454]  el1h_64_sync+0x68/0x6c

[65731.260269] [T32454]  z_erofs_decompress_queue+0x7f0/0x2258
--&gt; be-&gt;decompressed_pages = kvcalloc(be-&gt;nr_pages, sizeof(struct page *), GFP_KERNEL | __GFP_NOFAIL);
	kernel panic by NULL pointer dereference.
	erofs assume kvmalloc with __GFP_NOFAIL never return NULL.
[65731.260293] [T32454]  z_erofs_runqueue+0xf30/0x104c
[65731.260314] [T32454]  z_erofs_readahead+0x4f0/0x968
[65731.260339] [T32454]  read_pages+0x170/0xadc
[65731.260364] [T32454]  page_cache_ra_unbounded+0x874/0xf30
[65731.260388] [T32454]  page_cache_ra_order+0x24c/0x714
[65731.260411] [T32454]  filemap_fault+0xbf0/0x1a74
[65731.260437] [T32454]  __do_fault+0xd0/0x33c
[65731.260462] [T32454]  handle_mm_fault+0xf74/0x3fe0
[65731.260486] [T32454]  do_mem_abort+0x54c/0x1b34
[65731.260509] [T32454]  el0_da+0x44/0x94
[65731.260531] [T32454]  el0t_64_sync_handler+0x98/0xb4
[65731.260553] [T32454]  el0t_64_sync+0x198/0x19c

Link: https://lkml.kernel.org/r/20240510100131.1865-1-hailong.liu@oppo.com
Fixes: 9376130c390a ("mm/vmalloc: add support for __GFP_NOFAIL")
Signed-off-by: Hailong.Liu &lt;hailong.liu@oppo.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Suggested-by: Barry Song &lt;21cnbao@gmail.com&gt;
Reported-by: Oven &lt;liyangouwen1@oppo.com&gt;
Reviewed-by: Barry Song &lt;baohua@kernel.org&gt;
Reviewed-by: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Chao Yu &lt;chao@kernel.org&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Gao Xiang &lt;xiang@kernel.org&gt;
Cc: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: hugetlb: add huge page size param to set_huge_pte_at()</title>
<updated>2023-09-30T00:20:47+00:00</updated>
<author>
<name>Ryan Roberts</name>
<email>ryan.roberts@arm.com</email>
</author>
<published>2023-09-22T11:58:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=935d4f0c6dc8b3533e6e39346de7389a84490178'/>
<id>urn:sha1:935d4f0c6dc8b3533e6e39346de7389a84490178</id>
<content type='text'>
Patch series "Fix set_huge_pte_at() panic on arm64", v2.

This series fixes a bug in arm64's implementation of set_huge_pte_at(),
which can result in an unprivileged user causing a kernel panic.  The
problem was triggered when running the new uffd poison mm selftest for
HUGETLB memory.  This test (and the uffd poison feature) was merged for
v6.5-rc7.

Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable
(correctly this time) to get it backported to v6.5, where the issue first
showed up.


Description of Bug
==================

arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries. 
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written. 
It previously did this by grabbing the folio out of the pte and querying
its size.

However, there are cases when the pte being set is actually a swap entry. 
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries.  And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.

But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN.  And this causes the code to go
bang.  The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7.  Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.

If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise,
it will dereference a bad pointer in page_folio():

    static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
    {
        VM_BUG_ON(!is_migration_entry(entry) &amp;&amp; !is_hwpoison_entry(entry));

        return page_folio(pfn_to_page(swp_offset_pfn(entry)));
    }


Fix
===

The simplest fix would have been to revert the dodgy cleanup commit
18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since
things have moved on, this would have required an audit of all the new
set_huge_pte_at() call sites to see if they should be converted to
set_huge_swap_pte_at().  As per the original intent of the change, it
would also leave us open to future bugs when people invariably get it
wrong and call the wrong helper.

So instead, I've added a huge page size parameter to set_huge_pte_at(). 
This means that the arm64 code has the size in all cases.  It's a bigger
change, due to needing to touch the arches that implement the function,
but it is entirely mechanical, so in my view, low risk.

I've compile-tested all touched arches; arm64, parisc, powerpc, riscv,
s390, sparc (and additionally x86_64).  I've additionally booted and run
mm selftests against arm64, where I observe the uffd poison test is fixed,
and there are no other regressions.


This patch (of 2):

In order to fix a bug, arm64 needs to be told the size of the huge page
for which the pte is being set in set_huge_pte_at().  Provide for this by
adding an `unsigned long sz` parameter to the function.  This follows the
same pattern as huge_pte_clear().

This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, parisc, powerpc,
riscv, s390, sparc).  The actual arm64 bug will be fixed in a separate
commit.

No behavioral changes intended.

Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Reviewed-by: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;	[powerpc 8xx]
Reviewed-by: Lorenzo Stoakes &lt;lstoakes@gmail.com&gt;	[vmalloc change]
Cc: Alexandre Ghiti &lt;alex@ghiti.fr&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexander Gordeev &lt;agordeev@linux.ibm.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Axel Rasmussen &lt;axelrasmussen@google.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@linux.ibm.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@linux.ibm.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: "James E.J. Bottomley" &lt;James.Bottomley@HansenPartnership.com&gt;
Cc: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Peter Xu &lt;peterx@redhat.com&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: SeongJae Park &lt;sj@kernel.org&gt;
Cc: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Cc: Uladzislau Rezki (Sony) &lt;urezki@gmail.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[6.5+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
