<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include, branch v6.1.38</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.38</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.1.38'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-07-05T17:27:37+00:00</updated>
<entry>
<title>execve: always mark stack as growing down during early stack setup</title>
<updated>2023-07-05T17:27:37+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-07-03T06:20:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=296927dbae7d2418cfd65b98788cbec2fd48afa1'/>
<id>urn:sha1:296927dbae7d2418cfd65b98788cbec2fd48afa1</id>
<content type='text'>
commit f66066bc5136f25e36a2daff4896c768f18c211e upstream.

While our user stacks can grow either down (all common architectures) or
up (parisc and the ia64 register stack), the initial stack setup when we
copy the argument and environment strings to the new stack at execve()
time is always done by extending the stack downwards.

But it turns out that in commit 8d7071af8907 ("mm: always expand the
stack with the mmap write lock held"), as part of making the stack
growing code more robust, 'expand_downwards()' was now made to actually
check the vma flags:

	if (!(vma-&gt;vm_flags &amp; VM_GROWSDOWN))
		return -EFAULT;

and that meant that this execve-time stack expansion started failing on
parisc, because on that architecture, the stack flags do not contain the
VM_GROWSDOWN bit.

At the same time the new check in expand_downwards() is clearly correct,
and simplified the callers, so let's not remove it.

The solution is instead to just codify the fact that yes, during
execve(), the stack grows down.  This not only matches reality, it ends
up being particularly simple: we already have special execve-time flags
for the stack (VM_STACK_INCOMPLETE_SETUP) and use those flags to avoid
page migration during this setup time (see vma_is_temporary_stack() and
invalid_migration_vma()).

So just add VM_GROWSDOWN to that set of temporary flags, and now our
stack flags automatically match reality, and the parisc stack expansion
works again.

Note that the VM_STACK_INCOMPLETE_SETUP bits will be cleared when the
stack is finalized, so we only add the extra VM_GROWSDOWN bit on
CONFIG_STACK_GROWSUP architectures (ie parisc) rather than adding it in
general.

Link: https://lore.kernel.org/all/612eaa53-6904-6e16-67fc-394f4faa0e16@bell.net/
Link: https://lore.kernel.org/all/5fd98a09-4792-1433-752d-029ae3545168@gmx.de/
Fixes: 8d7071af8907 ("mm: always expand the stack with the mmap write lock held")
Reported-by: John David Anglin &lt;dave.anglin@bell.net&gt;
Reported-and-tested-by: Helge Deller &lt;deller@gmx.de&gt;
Reported-and-tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>xtensa: fix NOMMU build with lock_mm_and_find_vma() conversion</title>
<updated>2023-07-01T11:16:27+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-07-01T01:24:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=323846590c55fd9b05dfb9d768d76583a556d254'/>
<id>urn:sha1:323846590c55fd9b05dfb9d768d76583a556d254</id>
<content type='text'>
commit d85a143b69abb4d7544227e26d12c4c7735ab27d upstream.

It turns out that xtensa has a really odd configuration situation: you
can do a no-MMU config, but still have the page fault code enabled.
Which doesn't sound all that sensible, but it turns out that xtensa can
have protection faults even without the MMU, and we have this:

    config PFAULT
        bool "Handle protection faults" if EXPERT &amp;&amp; !MMU
        default y
        help
          Handle protection faults. MMU configurations must enable it.
          noMMU configurations may disable it if used memory map never
          generates protection faults or faults are always fatal.

          If unsure, say Y.

which completely violated my expectations of the page fault handling.

End result: Guenter reports that the xtensa no-MMU builds all fail with

  arch/xtensa/mm/fault.c: In function ‘do_page_fault’:
  arch/xtensa/mm/fault.c:133:8: error: implicit declaration of function ‘lock_mm_and_find_vma’

because I never exposed the new lock_mm_and_find_vma() function for the
no-MMU case.

Doing so is simple enough, and fixes the problem.

Reported-and-tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Fixes: a050ba1e7422 ("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: always expand the stack with the mmap write lock held</title>
<updated>2023-07-01T11:16:25+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-06-24T20:45:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e6bbad75712a97b9b16433563c1358652a33003e'/>
<id>urn:sha1:e6bbad75712a97b9b16433563c1358652a33003e</id>
<content type='text'>
commit 8d7071af890768438c14db6172cc8f9f4d04e184 upstream

This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from the vm helper functions again.

For some cases, we just avoid expanding the stack at all: drivers and
page pinning really shouldn't be extending any stacks.  Let's see if any
strange users really wanted that.

It's worth noting that architectures that weren't converted to the new
lock_mm_and_find_vma() helper function are left using the legacy
"expand_stack()" function, but it has been changed to drop the mmap_lock
and take it for writing while expanding the vma.  This makes it fairly
straightforward to convert the remaining architectures.

As a result of dropping and re-taking the lock, the calling conventions
for this function have also changed, since the old vma may no longer be
valid.  So it will now return the new vma if successful, and NULL - and
the lock dropped - if the area could not be extended.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[6.1: Patch drivers/iommu/io-pgfault.c instead]
Signed-off-by: Samuel Mendoza-Jonas &lt;samjonas@amazon.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: make find_extend_vma() fail if write lock not held</title>
<updated>2023-07-01T11:16:25+00:00</updated>
<author>
<name>Liam R. Howlett</name>
<email>Liam.Howlett@oracle.com</email>
</author>
<published>2023-06-16T22:58:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6a6b5616c3d04eba12dd0abc0522e5bae5f1ee5a'/>
<id>urn:sha1:6a6b5616c3d04eba12dd0abc0522e5bae5f1ee5a</id>
<content type='text'>
commit f440fa1ac955e2898893f9301568435eb5cdfc4b upstream.

Make calls to extend_vma() and find_extend_vma() fail if the write lock
is required.

To avoid making this a flag-day event, this still allows the old
read-locking case for the trivial situations, and passes in a flag to
say "is it write-locked".  That way write-lockers can say "yes, I'm
being careful", and legacy users will continue to work in all the common
cases until they have been fully converted to the new world order.

Co-Developed-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Samuel Mendoza-Jonas &lt;samjonas@amazon.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: introduce new 'lock_mm_and_find_vma()' page fault helper</title>
<updated>2023-07-01T11:16:24+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-06-15T22:17:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d6a5c7a1a6e52d4c46fe181237ca96cd46a42386'/>
<id>urn:sha1:d6a5c7a1a6e52d4c46fe181237ca96cd46a42386</id>
<content type='text'>
commit c2508ec5a58db67093f4fb8bf89a9a7c53a109e9 upstream.

.. and make x86 use it.

This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.

We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.

That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.

So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.

Note: I say "common helper", but it really only handles the normal
stack-grows-down case.  Which is all architectures except for PA-RISC
and IA64.  So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.

It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.

But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[6.1: Ignore CONFIG_PER_VMA_LOCK context]
Signed-off-by: Samuel Mendoza-Jonas &lt;samjonas@amazon.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm, hwpoison: when copy-on-write hits poison, take page offline</title>
<updated>2023-07-01T11:16:22+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2022-10-21T20:01:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=84f077802e56ae43f4b6c6eb9ad59b19df9db374'/>
<id>urn:sha1:84f077802e56ae43f4b6c6eb9ad59b19df9db374</id>
<content type='text'>
commit d302c2398ba269e788a4f37ae57c07a7fcabaa42 upstream.

Cannot call memory_failure() directly from the fault handler because
mmap_lock (and others) are held.

It is important, but not urgent, to mark the source page as h/w poisoned
and unmap it from other tasks.

Use memory_failure_queue() to request a call to memory_failure() for the
page with the error.

Also provide a stub version for CONFIG_MEMORY_FAILURE=n

Link: https://lkml.kernel.org/r/20221021200120.175753-3-tony.luck@intel.com
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Reviewed-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
[ Due to missing commits
  e591ef7d96d6e ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage")
  5033091de814a ("mm/hwpoison: introduce per-memory_block hwpoison counter")
  The impact of e591ef7d96d6e is its introduction of an additional flag in
  __get_huge_page_for_hwpoison() that serves as an indication a hwpoisoned
  hugetlb page should have its migratable bit cleared.
  The impact of 5033091de814a is contexual.
  Resolve by ignoring both missing commits. - jane]
Signed-off-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm, hwpoison: try to recover from copy-on write faults</title>
<updated>2023-07-01T11:16:22+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2022-10-21T20:01:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4af5960d7cd46c3834f65b75577b775cbcd0f7b2'/>
<id>urn:sha1:4af5960d7cd46c3834f65b75577b775cbcd0f7b2</id>
<content type='text'>
commit a873dfe1032a132bf89f9e19a6ac44f5a0b78754 upstream.

Patch series "Copy-on-write poison recovery", v3.

Part 1 deals with the process that triggered the copy on write fault with
a store to a shared read-only page.  That process is send a SIGBUS with
the usual machine check decoration to specify the virtual address of the
lost page, together with the scope.

Part 2 sets up to asynchronously take the page with the uncorrected error
offline to prevent additional machine check faults.  H/t to Miaohe Lin
&lt;linmiaohe@huawei.com&gt; and Shuai Xue &lt;xueshuai@linux.alibaba.com&gt; for
pointing me to the existing function to queue a call to memory_failure().

On x86 there is some duplicate reporting (because the error is also
signalled by the memory controller as well as by the core that triggered
the machine check).  Console logs look like this:


This patch (of 2):

If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.

It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.

I wrapped that neatly into a test at:

  git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git

just enable ACPI error injection and run:

  # ./einj_mem-uc -f copy-on-write

Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to caller of wp_page_copy(). This propagates up the call stack. Both x86
and powerpc have code in their fault handler to deal with this code by
sending a SIGBUS to the application.

Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because it holds mmap_lock(). Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?

On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory, the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be
logged in this way.

[tony.luck@intel.com: add call to kmsan_unpoison_memory(), per Miaohe Lin]
  Link: https://lkml.kernel.org/r/20221031201029.102123-2-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-1-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-2-tony.luck@intel.com
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Naoya Horiguchi &lt;naoya.horiguchi@nec.com&gt;
Reviewed-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Reviewed-by: Alexander Potapenko &lt;glider@google.com&gt;
Tested-by: Shuai Xue &lt;xueshuai@linux.alibaba.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@csgroup.eu&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Igned-off-by: Jane Chu &lt;jane.chu@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>gpiolib: Fix irq_domain resource tracking for gpiochip_irqchip_add_domain()</title>
<updated>2023-06-28T09:12:35+00:00</updated>
<author>
<name>Michael Walle</name>
<email>mwalle@kernel.org</email>
</author>
<published>2023-06-19T08:56:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0357259cb103f1e48b4372bab50be45f2591c49f'/>
<id>urn:sha1:0357259cb103f1e48b4372bab50be45f2591c49f</id>
<content type='text'>
[ Upstream commit ff7a1790fbf92f1bdd0966d3f0da3ea808ede876 ]

Up until commit 6a45b0e2589f ("gpiolib: Introduce
gpiochip_irqchip_add_domain()") all irq_domains were allocated
by gpiolib itself and thus gpiolib also takes care of freeing it.

With gpiochip_irqchip_add_domain() a user of gpiolib can associate an
irq_domain with the gpio_chip. This irq_domain is not managed by
gpiolib and therefore must not be freed by gpiolib.

Fixes: 6a45b0e2589f ("gpiolib: Introduce gpiochip_irqchip_add_domain()")
Reported-by: Jiawen Wu &lt;jiawenwu@trustnetic.com&gt;
Signed-off-by: Michael Walle &lt;mwalle@kernel.org&gt;
Reviewed-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Reviewed-by: Andy Shevchenko &lt;andy@kernel.org&gt;
Signed-off-by: Bartosz Golaszewski &lt;bartosz.golaszewski@linaro.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: reject unbound anonymous set before commit phase</title>
<updated>2023-06-28T09:12:32+00:00</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2023-06-16T13:21:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=46f801ab5fb90d8aadeecd00bd2582cb0acc43b6'/>
<id>urn:sha1:46f801ab5fb90d8aadeecd00bd2582cb0acc43b6</id>
<content type='text'>
[ Upstream commit 938154b93be8cd611ddfd7bafc1849f3c4355201 ]

Add a new list to track set transaction and to check for unbound
anonymous sets before entering the commit phase.

Bail out at the end of the transaction handling if an anonymous set
remains unbound.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfilter: nf_tables: drop map element references from preparation phase</title>
<updated>2023-06-28T09:12:32+00:00</updated>
<author>
<name>Pablo Neira Ayuso</name>
<email>pablo@netfilter.org</email>
</author>
<published>2023-06-16T12:51:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=d60be2da67d172aecf866302c91ea11533eca4d9'/>
<id>urn:sha1:d60be2da67d172aecf866302c91ea11533eca4d9</id>
<content type='text'>
[ Upstream commit 628bd3e49cba1c066228e23d71a852c23e26da73 ]

set .destroy callback releases the references to other objects in maps.
This is very late and it results in spurious EBUSY errors. Drop refcount
from the preparation phase instead, update set backend not to drop
reference counter from set .destroy path.

Exceptions: NFT_TRANS_PREPARE_ERROR does not require to drop the
reference counter because the transaction abort path releases the map
references for each element since the set is unbound. The abort path
also deals with releasing reference counter for new elements added to
unbound sets.

Fixes: 591054469b3e ("netfilter: nf_tables: revisit chain/object refcounting from elements")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
