<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/arch, branch v6.6.66</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.66</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.66'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2024-12-14T19:00:21+00:00</updated>
<entry>
<title>KVM: x86/mmu: Ensure that kvm_release_pfn_clean() takes exact pfn from kvm_faultin_pfn()</title>
<updated>2024-12-14T19:00:21+00:00</updated>
<author>
<name>Nikolay Kuratov</name>
<email>kniv@yandex-team.ru</email>
</author>
<published>2024-12-08T08:38:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0d5c7fcfa5853462fd161389182fb975eeb1bd2a'/>
<id>urn:sha1:0d5c7fcfa5853462fd161389182fb975eeb1bd2a</id>
<content type='text'>
Since 5.16 and prior to 6.13 KVM can't be used with FSDAX
guest memory (PMD pages). To reproduce the issue you need to reserve
guest memory with `memmap=` cmdline, create and mount FS in DAX mode
(tested both XFS and ext4), see doc link below. ndctl command for test:
ndctl create-namespace -v -e namespace1.0 --map=dev --mode=fsdax -a 2M
Then pass memory object to qemu like:
-m 8G -object memory-backend-file,id=ram0,size=8G,\
mem-path=/mnt/pmem/guestmem,share=on,prealloc=on,dump=off,align=2097152 \
-numa node,memdev=ram0,cpus=0-1
QEMU fails to run guest with error: kvm run failed Bad address
and there are two warnings in dmesg:
WARN_ON_ONCE(!page_count(page)) in kvm_is_zone_device_page() and
WARN_ON_ONCE(folio_ref_count(folio) &lt;= 0) in try_grab_folio() (v6.6.63)

It looks like in the past assumption was made that pfn won't change from
faultin_pfn() to release_pfn_clean(), e.g. see
commit 4cd071d13c5c ("KVM: x86/mmu: Move calls to thp_adjust() down a level")
But kvm_page_fault structure made pfn part of mutable state, so
now release_pfn_clean() can take hugepage-adjusted pfn.
And it works for all cases (/dev/shm, hugetlb, devdax) except fsdax.
Apparently in fsdax mode faultin-pfn and adjusted-pfn may refer to
different folios, so we're getting get_page/put_page imbalance.

To solve this preserve faultin pfn in separate local variable
and pass it in kvm_release_pfn_clean().

Patch tested for all mentioned guest memory backends with tdp_mmu={0,1}.

No bug in upstream as it was solved fundamentally by
commit 8dd861cc07e2 ("KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns")
and related patch series.

Link: https://nvdimm.docs.kernel.org/2mib_fs_dax.html
Fixes: 2f6305dd5676 ("KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault")
Co-developed-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Reviewed-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Nikolay Kuratov &lt;kniv@yandex-team.ru&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>x86: Fix build regression with CONFIG_KEXEC_JUMP enabled</title>
<updated>2024-12-14T19:00:20+00:00</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2024-12-08T23:53:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1240225d838bac49c4a760a7f2b48864828ca320'/>
<id>urn:sha1:1240225d838bac49c4a760a7f2b48864828ca320</id>
<content type='text'>
[ Upstream commit aeb68937614f4aeceaaa762bd7f0212ce842b797 ]

Build 6.13-rc12 for x86_64 with gcc 14.2.1 fails with the error:

  ld: vmlinux.o: in function `virtual_mapped':
  linux/arch/x86/kernel/relocate_kernel_64.S:249:(.text+0x5915b): undefined reference to `saved_context_gdt_desc'

when CONFIG_KEXEC_JUMP is enabled.

This was introduced by commit 07fa619f2a40 ("x86/kexec: Restore GDT on
return from ::preserve_context kexec") which introduced a use of
saved_context_gdt_desc without a declaration for it.

Fix that by including asm/asm-offsets.h where saved_context_gdt_desc
is defined (indirectly in include/generated/asm-offsets.h which
asm/asm-offsets.h includes).

Fixes: 07fa619f2a40 ("x86/kexec: Restore GDT on return from ::preserve_context kexec")
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Acked-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Acked-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202411270006.ZyyzpYf8-lkp@intel.com/
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>powerpc/prom_init: Fixup missing powermac #size-cells</title>
<updated>2024-12-14T19:00:16+00:00</updated>
<author>
<name>Michael Ellerman</name>
<email>mpe@ellerman.id.au</email>
</author>
<published>2024-11-26T02:57:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=691284c2cd33ffaa0b35ce53b3286b90621e9dc9'/>
<id>urn:sha1:691284c2cd33ffaa0b35ce53b3286b90621e9dc9</id>
<content type='text'>
[ Upstream commit cf89c9434af122f28a3552e6f9cc5158c33ce50a ]

On some powermacs `escc` nodes are missing `#size-cells` properties,
which is deprecated and now triggers a warning at boot since commit
045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells
handling").

For example:

  Missing '#size-cells' in /pci@f2000000/mac-io@c/escc@13000
  WARNING: CPU: 0 PID: 0 at drivers/of/base.c:133 of_bus_n_size_cells+0x98/0x108
  Hardware name: PowerMac3,1 7400 0xc0209 PowerMac
  ...
  Call Trace:
    of_bus_n_size_cells+0x98/0x108 (unreliable)
    of_bus_default_count_cells+0x40/0x60
    __of_get_address+0xc8/0x21c
    __of_address_to_resource+0x5c/0x228
    pmz_init_port+0x5c/0x2ec
    pmz_probe.isra.0+0x144/0x1e4
    pmz_console_init+0x10/0x48
    console_init+0xcc/0x138
    start_kernel+0x5c4/0x694

As powermacs boot via prom_init it's possible to add the missing
properties to the device tree during boot, avoiding the warning. Note
that `escc-legacy` nodes are also missing `#size-cells` properties, but
they are skipped by the macio driver, so leave them alone.

Depends-on: 045b14ca5c36 ("of: WARN on deprecated #address-cells/#size-cells handling")
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Reviewed-by: Rob Herring &lt;robh@kernel.org&gt;
Signed-off-by: Madhavan Srinivasan &lt;maddy@linux.ibm.com&gt;
Link: https://patch.msgid.link/20241126025710.591683-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>MIPS: Loongson64: DTS: Really fix PCIe port nodes for ls7a</title>
<updated>2024-12-14T19:00:16+00:00</updated>
<author>
<name>Xi Ruoyao</name>
<email>xry111@xry111.site</email>
</author>
<published>2024-11-23T03:57:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8ef9ea1503d0a129cc6f5cf48fb63633efa5d766'/>
<id>urn:sha1:8ef9ea1503d0a129cc6f5cf48fb63633efa5d766</id>
<content type='text'>
[ Upstream commit 4fbd66d8254cedfd1218393f39d83b6c07a01917 ]

Fix the dtc warnings:

    arch/mips/boot/dts/loongson/ls7a-pch.dtsi:68.16-416.5: Warning (interrupt_provider): /bus@10000000/pci@1a000000: '#interrupt-cells' found, but node is not an interrupt provider
    arch/mips/boot/dts/loongson/ls7a-pch.dtsi:68.16-416.5: Warning (interrupt_provider): /bus@10000000/pci@1a000000: '#interrupt-cells' found, but node is not an interrupt provider
    arch/mips/boot/dts/loongson/loongson64g_4core_ls7a.dtb: Warning (interrupt_map): Failed prerequisite 'interrupt_provider'

And a runtime warning introduced in commit 045b14ca5c36 ("of: WARN on
deprecated #address-cells/#size-cells handling"):

    WARNING: CPU: 0 PID: 1 at drivers/of/base.c:106 of_bus_n_addr_cells+0x9c/0xe0
    Missing '#address-cells' in /bus@10000000/pci@1a000000/pci_bridge@9,0

The fix is similar to commit d89a415ff8d5 ("MIPS: Loongson64: DTS: Fix PCIe
port nodes for ls7a"), which has fixed the issue for ls2k (despite its
subject mentions ls7a).

Signed-off-by: Xi Ruoyao &lt;xry111@xry111.site&gt;
Signed-off-by: Thomas Bogendoerfer &lt;tsbogend@alpha.franken.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>LoongArch: Fix sleeping in atomic context for PREEMPT_RT</title>
<updated>2024-12-14T19:00:15+00:00</updated>
<author>
<name>Huacai Chen</name>
<email>chenhuacai@loongson.cn</email>
</author>
<published>2024-11-22T07:47:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=c5f89458a2ea0800866b9fc690d3fa8367dc8f8d'/>
<id>urn:sha1:c5f89458a2ea0800866b9fc690d3fa8367dc8f8d</id>
<content type='text'>
[ Upstream commit 88fd2b70120d52c1010257d36776876941375490 ]

Commit bab1c299f3945ffe79 ("LoongArch: Fix sleeping in atomic context in
setup_tlb_handler()") changes the gfp flag from GFP_KERNEL to GFP_ATOMIC
for alloc_pages_node(). However, for PREEMPT_RT kernels we can still get
a "sleeping in atomic context" error:

[    0.372259] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[    0.372266] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
[    0.372268] preempt_count: 1, expected: 0
[    0.372270] RCU nest depth: 1, expected: 1
[    0.372272] 3 locks held by swapper/1/0:
[    0.372274]  #0: 900000000c9f5e60 (&amp;pcp-&gt;lock){+.+.}-{3:3}, at: get_page_from_freelist+0x524/0x1c60
[    0.372294]  #1: 90000000087013b8 (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x50/0x140
[    0.372305]  #2: 900000047fffd388 (&amp;zone-&gt;lock){+.+.}-{3:3}, at: __rmqueue_pcplist+0x30c/0xea0
[    0.372314] irq event stamp: 0
[    0.372316] hardirqs last  enabled at (0): [&lt;0000000000000000&gt;] 0x0
[    0.372322] hardirqs last disabled at (0): [&lt;9000000005947320&gt;] copy_process+0x9c0/0x26e0
[    0.372329] softirqs last  enabled at (0): [&lt;9000000005947320&gt;] copy_process+0x9c0/0x26e0
[    0.372335] softirqs last disabled at (0): [&lt;0000000000000000&gt;] 0x0
[    0.372341] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.0-rc7+ #1891
[    0.372346] Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022
[    0.372349] Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 9000000100388000
[    0.372486]         900000010038b890 0000000000000000 900000010038b898 9000000007e53788
[    0.372492]         900000000815bcc8 900000000815bcc0 900000010038b700 0000000000000001
[    0.372498]         0000000000000001 4b031894b9d6b725 00000000055ec000 9000000100338fc0
[    0.372503]         00000000000000c4 0000000000000001 000000000000002d 0000000000000003
[    0.372509]         0000000000000030 0000000000000003 00000000055ec000 0000000000000003
[    0.372515]         900000000806d000 9000000007e53788 00000000000000b0 0000000000000004
[    0.372521]         0000000000000000 0000000000000000 900000000c9f5f10 0000000000000000
[    0.372526]         90000000076f12d8 9000000007e53788 9000000005924778 0000000000000000
[    0.372532]         00000000000000b0 0000000000000004 0000000000000000 0000000000070000
[    0.372537]         ...
[    0.372540] Call Trace:
[    0.372542] [&lt;9000000005924778&gt;] show_stack+0x38/0x180
[    0.372548] [&lt;90000000071519c4&gt;] dump_stack_lvl+0x94/0xe4
[    0.372555] [&lt;900000000599b880&gt;] __might_resched+0x1a0/0x260
[    0.372561] [&lt;90000000071675cc&gt;] rt_spin_lock+0x4c/0x140
[    0.372565] [&lt;9000000005cbb768&gt;] __rmqueue_pcplist+0x308/0xea0
[    0.372570] [&lt;9000000005cbed84&gt;] get_page_from_freelist+0x564/0x1c60
[    0.372575] [&lt;9000000005cc0d98&gt;] __alloc_pages_noprof+0x218/0x1820
[    0.372580] [&lt;900000000593b36c&gt;] tlb_init+0x1ac/0x298
[    0.372585] [&lt;9000000005924b74&gt;] per_cpu_trap_init+0x114/0x140
[    0.372589] [&lt;9000000005921964&gt;] cpu_probe+0x4e4/0xa60
[    0.372592] [&lt;9000000005934874&gt;] start_secondary+0x34/0xc0
[    0.372599] [&lt;900000000715615c&gt;] smpboot_entry+0x64/0x6c

This is because in PREEMPT_RT kernels normal spinlocks are replaced by
rt spinlocks and rt_spin_lock() will cause sleeping. Fix it by disabling
NUMA optimization completely for PREEMPT_RT kernels.

Signed-off-by: Huacai Chen &lt;chenhuacai@loongson.cn&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>PCI: Detect and trust built-in Thunderbolt chips</title>
<updated>2024-12-14T19:00:14+00:00</updated>
<author>
<name>Esther Shimanovich</name>
<email>eshimanovich@chromium.org</email>
</author>
<published>2024-09-10T17:57:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b824ea2af6e035eff8aefdb5f3f721fd38afe32b'/>
<id>urn:sha1:b824ea2af6e035eff8aefdb5f3f721fd38afe32b</id>
<content type='text'>
[ Upstream commit 3b96b895127b7c0aed63d82c974b46340e8466c1 ]

Some computers with CPUs that lack Thunderbolt features use discrete
Thunderbolt chips to add Thunderbolt functionality. These Thunderbolt
chips are located within the chassis; between the Root Port labeled
ExternalFacingPort and the USB-C port.

These Thunderbolt PCIe devices should be labeled as fixed and trusted, as
they are built into the computer. Otherwise, security policies that rely on
those flags may have unintended results, such as preventing USB-C ports
from enumerating.

Detect the above scenario through the process of elimination.

  1) Integrated Thunderbolt host controllers already have Thunderbolt
     implemented, so anything outside their external facing Root Port is
     removable and untrusted.

     Detect them using the following properties:

       - Most integrated host controllers have the "usb4-host-interface"
         ACPI property, as described here:

         https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#mapping-native-protocols-pcie-displayport-tunneled-through-usb4-to-usb4-host-routers

       - Integrated Thunderbolt PCIe Root Ports before Alder Lake do not
         have the "usb4-host-interface" ACPI property. Identify those by
         their PCI IDs instead.

  2) If a Root Port does not have integrated Thunderbolt capabilities, but
     has the "ExternalFacingPort" ACPI property, that means the
     manufacturer has opted to use a discrete Thunderbolt host controller
     that is built into the computer.

     This host controller can be identified by virtue of being located
     directly below an external-facing Root Port that lacks integrated
     Thunderbolt. Label it as trusted and fixed.

     Everything downstream from it is untrusted and removable.

The "ExternalFacingPort" ACPI property is described here:
https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports

Link: https://lore.kernel.org/r/20240910-trust-tbt-fix-v5-1-7a7a42a5f496@chromium.org
Suggested-by: Mika Westerberg &lt;mika.westerberg@linux.intel.com&gt;
Signed-off-by: Esther Shimanovich &lt;eshimanovich@chromium.org&gt;
Signed-off-by: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Tested-by: Mika Westerberg &lt;mika.westerberg@linux.intel.com&gt;
Tested-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Reviewed-by: Mika Westerberg &lt;mika.westerberg@linux.intel.com&gt;
Reviewed-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf/x86/amd: Warn only on new bits set</title>
<updated>2024-12-14T18:59:59+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2024-10-01T14:10:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=28ed7bc5eee043f8b0bcc8ba55b052a01b05cf38'/>
<id>urn:sha1:28ed7bc5eee043f8b0bcc8ba55b052a01b05cf38</id>
<content type='text'>
[ Upstream commit de20037e1b3c2f2ca97b8c12b8c7bca8abd509a7 ]

Warning at every leaking bits can cause a flood of message, triggering
various stall-warning mechanisms to fire, including CSD locks, which
makes the machine to be unusable.

Track the bits that are being leaked, and only warn when a new bit is
set.

That said, this patch will help with the following issues:

1) It will tell us which bits are being set, so, it is easy to
   communicate it back to vendor, and to do a root-cause analyzes.

2) It avoid the machine to be unusable, because, worst case
   scenario, the user gets less than 60 WARNs (one per unhandled bit).

Suggested-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Sandipan Das &lt;sandipan.das@amd.com&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Link: https://lkml.kernel.org/r/20241001141020.2620361-1-leitao@debian.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>s390/cpum_sf: Handle CPU hotplug remove during sampling</title>
<updated>2024-12-14T18:59:58+00:00</updated>
<author>
<name>Thomas Richter</name>
<email>tmricht@linux.ibm.com</email>
</author>
<published>2024-10-25T10:27:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a69752f1e5de817941a2ea0609254f6f25acd274'/>
<id>urn:sha1:a69752f1e5de817941a2ea0609254f6f25acd274</id>
<content type='text'>
[ Upstream commit a0bd7dacbd51c632b8e2c0500b479af564afadf3 ]

CPU hotplug remove handling triggers the following function
call sequence:

   CPUHP_AP_PERF_S390_SF_ONLINE  --&gt; s390_pmu_sf_offline_cpu()
   ...
   CPUHP_AP_PERF_ONLINE          --&gt; perf_event_exit_cpu()

The s390 CPUMF sampling CPU hotplug handler invokes:

 s390_pmu_sf_offline_cpu()
 +--&gt;  cpusf_pmu_setup()
       +--&gt; setup_pmc_cpu()
            +--&gt; deallocate_buffers()

This function de-allocates all sampling data buffers (SDBs) allocated
for that CPU at event initialization. It also clears the
PMU_F_RESERVED bit. The CPU is gone and can not be sampled.

With the event still being active on the removed CPU, the CPU event
hotplug support in kernel performance subsystem triggers the
following function calls on the removed CPU:

  perf_event_exit_cpu()
  +--&gt; perf_event_exit_cpu_context()
       +--&gt; __perf_event_exit_context()
	    +--&gt; __perf_remove_from_context()
	         +--&gt; event_sched_out()
	              +--&gt; cpumsf_pmu_del()
	                   +--&gt; cpumsf_pmu_stop()
                                +--&gt; hw_perf_event_update()

to stop and remove the event. During removal of the event, the
sampling device driver tries to read out the remaining samples from
the sample data buffers (SDBs). But they have already been freed
(and may have been re-assigned). This may lead to a use after free
situation in which case the samples are most likely invalid. In the
best case the memory has not been reassigned and still contains
valid data.

Remedy this situation and check if the CPU is still in reserved
state (bit PMU_F_RESERVED set). In this case the SDBs have not been
released an contain valid data. This is always the case when
the event is removed (and no CPU hotplug off occured).
If the PMU_F_RESERVED bit is not set, the SDB buffers are gone.

Signed-off-by: Thomas Richter &lt;tmricht@linux.ibm.com&gt;
Reviewed-by: Hendrik Brueckner &lt;brueckner@linux.ibm.com&gt;
Signed-off-by: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating userspace page tables</title>
<updated>2024-12-14T18:59:58+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2024-12-04T11:27:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=af3fde6112b2f918f890360222fcf8dc5d83841d'/>
<id>urn:sha1:af3fde6112b2f918f890360222fcf8dc5d83841d</id>
<content type='text'>
commit d0ceea662d459726487030237689835fcc0483e5 upstream.

The set_p4d() and set_pgd() functions (in 4-level or 5-level page table setups
respectively) assume that the root page table is actually a 8KiB allocation,
with the userspace root immediately after the kernel root page table (so that
the former can enforce NX on on all the subordinate page tables, which are
actually shared).

However, users of the kernel_ident_mapping_init() code do not give it an 8KiB
allocation for its PGD. Both swsusp_arch_resume() and acpi_mp_setup_reset()
allocate only a single 4KiB page. The kexec code on x86_64 currently gets
away with it purely by chance, because it allocates 8KiB for its "control
code page" and then actually uses the first half for the PGD, then copies the
actual trampoline code into the second half only after the identmap code has
finished scribbling over it.

Fix this by defining a _PAGE_NOPTISHADOW bit (which can use the same bit as
_PAGE_SAVED_DIRTY since one is only for the PGD/P4D root and the other is
exclusively for leaf PTEs.). This instructs __pti_set_user_pgtbl() not to
write to the userspace 'shadow' PGD.

Strictly, the _PAGE_NOPTISHADOW bit doesn't need to be written out to the
actual page tables; since __pti_set_user_pgtbl() returns the value to be
written to the kernel page table, it could be filtered out. But there seems
to be no benefit to actually doing so.

Suggested-by: Dave Hansen &lt;dave.hansen@intel.com&gt;
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/412c90a4df7aef077141d9f68d19cbe5602d6c6d.camel@infradead.org
Cc: stable@kernel.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@surriel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>x86/kexec: Restore GDT on return from ::preserve_context kexec</title>
<updated>2024-12-14T18:59:56+00:00</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2024-12-05T15:05:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=94666abe816328301e1e8885a46c99bc37b14f87'/>
<id>urn:sha1:94666abe816328301e1e8885a46c99bc37b14f87</id>
<content type='text'>
commit 07fa619f2a40c221ea27747a3323cabc59ab25eb upstream.

The restore_processor_state() function explicitly states that "the asm code
that gets us here will have restored a usable GDT". That wasn't true in the
case of returning from a ::preserve_context kexec. Make it so.

Without this, the kernel was depending on the called function to reload a
GDT which is appropriate for the kernel before returning.

Test program:

 #include &lt;unistd.h&gt;
 #include &lt;errno.h&gt;
 #include &lt;stdio.h&gt;
 #include &lt;stdlib.h&gt;
 #include &lt;linux/kexec.h&gt;
 #include &lt;linux/reboot.h&gt;
 #include &lt;sys/reboot.h&gt;
 #include &lt;sys/syscall.h&gt;

 int main (void)
 {
        struct kexec_segment segment = {};
	unsigned char purgatory[] = {
		0x66, 0xba, 0xf8, 0x03,	// mov $0x3f8, %dx
		0xb0, 0x42,		// mov $0x42, %al
		0xee,			// outb %al, (%dx)
		0xc3,			// ret
	};
	int ret;

	segment.buf = &amp;purgatory;
	segment.bufsz = sizeof(purgatory);
	segment.mem = (void *)0x400000;
	segment.memsz = 0x1000;
	ret = syscall(__NR_kexec_load, 0x400000, 1, &amp;segment, KEXEC_PRESERVE_CONTEXT);
	if (ret) {
		perror("kexec_load");
		exit(1);
	}

	ret = syscall(__NR_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC);
	if (ret) {
		perror("kexec reboot");
		exit(1);
	}
	printf("Success\n");
	return 0;
 }

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241205153343.3275139-2-dwmw2@infradead.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
