<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/arch/x86/include/asm/entry-common.h, branch v7.1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v7.1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2026-05-19T18:25:51+00:00</updated>
<entry>
<title>x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core</title>
<updated>2026-05-19T18:25:51+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2026-05-08T09:18:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0701c9e17bd903d95b2ddf7dd2e1d8be5027f331'/>
<id>urn:sha1:0701c9e17bd903d95b2ddf7dd2e1d8be5027f331</id>
<content type='text'>
Move the VMX interrupt dispatch magic into the x86 core code. This
isolates KVM from the FRED/IDT decisions and reduces the amount of
EXPORT_SYMBOL_FOR_KVM().

Suggested-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Tested-by: "Verma, Vishal L" &lt;vishal.l.verma@intel.com&gt;
Tested-by: Zhao Liu &lt;zhao1.liu@intel.com&gt;
Tested-by: Zhao Liu &lt;zhao1.liu@intel.com&gt;
Tested-by: Sean Christopherson &lt;seanjc@google.com&gt;
Reviewed-by: Binbin Wu &lt;binbin.wu@linxu.intel.com&gt;
Acked-by: Sean Christopherson &lt;seanjc@google.com&gt;
Link: https://patch.msgid.link/20260508091829.GO3126523@noisy.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>randomize_kstack: Unify random source across arches</title>
<updated>2026-03-25T04:12:03+00:00</updated>
<author>
<name>Ryan Roberts</name>
<email>ryan.roberts@arm.com</email>
</author>
<published>2026-03-03T15:08:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a96ef5848cb096226bf6aff31a90d8b136d99b71'/>
<id>urn:sha1:a96ef5848cb096226bf6aff31a90d8b136d99b71</id>
<content type='text'>
Previously different architectures were using random sources of
differing strength and cost to decide the random kstack offset. A number
of architectures (loongarch, powerpc, s390, x86) were using their
timestamp counter, at whatever the frequency happened to be. Other
arches (arm64, riscv) were using entropy from the crng via
get_random_u16().

There have been concerns that in some cases the timestamp counters may
be too weak, because they can be easily guessed or influenced by user
space. And get_random_u16() has been shown to be too costly for the
level of protection kstack offset randomization provides.

So let's use a common, architecture-agnostic source of entropy; a
per-cpu prng, seeded at boot-time from the crng. This has a few
benefits:

  - We can remove choose_random_kstack_offset(); That was only there to
    try to make the timestamp counter value a bit harder to influence
    from user space [*].

  - The architecture code is simplified. All it has to do now is call
    add_random_kstack_offset() in the syscall path.

  - The strength of the randomness can be reasoned about independently
    of the architecture.

  - Arches previously using get_random_u16() now have much faster
    syscall paths, see below results.

[*] Additionally, this gets rid of some redundant work on s390 and x86.
Before this patch, those architectures called
choose_random_kstack_offset() under arch_exit_to_user_mode_prepare(),
which is also called for exception returns to userspace which were *not*
syscalls (e.g. regular interrupts). Getting rid of
choose_random_kstack_offset() avoids a small amount of redundant work
for the non-syscall cases.

In some configurations, add_random_kstack_offset() will now call
instrumentable code, so for a couple of arches, I have moved the call a
bit later to the first point where instrumentation is allowed. This
doesn't impact the efficacy of the mechanism.

There have been some claims that a prng may be less strong than the
timestamp counter if not regularly reseeded. But the prng has a period
of about 2^113. So as long as the prng state remains secret, it should
not be possible to guess. If the prng state can be accessed, we have
bigger problems.

Additionally, we are only consuming 6 bits to randomize the stack, so
there are only 64 possible random offsets. I assert that it would be
trivial for an attacker to brute force by repeating their attack and
waiting for the random stack offset to be the desired one. The prng
approach seems entirely proportional to this level of protection.

Performance data are provided below. The baseline is v6.18 with rndstack
on for each respective arch. (I)/(R) indicate statistically significant
improvement/regression. arm64 platform is AWS Graviton3 (m7g.metal).
x86_64 platform is AWS Sapphire Rapids (m7i.24xlarge):

+-----------------+--------------+---------------+---------------+
| Benchmark       | Result Class |  per-cpu-prng |  per-cpu-prng |
|                 |              | arm64 (metal) |   x86_64 (VM) |
+=================+==============+===============+===============+
| syscall/getpid  | mean (ns)    |    (I) -9.50% |   (I) -17.65% |
|                 | p99 (ns)     |   (I) -59.24% |   (I) -24.41% |
|                 | p99.9 (ns)   |   (I) -59.52% |   (I) -28.52% |
+-----------------+--------------+---------------+---------------+
| syscall/getppid | mean (ns)    |    (I) -9.52% |   (I) -19.24% |
|                 | p99 (ns)     |   (I) -59.25% |   (I) -25.03% |
|                 | p99.9 (ns)   |   (I) -59.50% |   (I) -28.17% |
+-----------------+--------------+---------------+---------------+
| syscall/invalid | mean (ns)    |   (I) -10.31% |   (I) -18.56% |
|                 | p99 (ns)     |   (I) -60.79% |   (I) -20.06% |
|                 | p99.9 (ns)   |   (I) -61.04% |   (I) -25.04% |
+-----------------+--------------+---------------+---------------+

I tested an earlier version of this change on x86 bare metal and it
showed a smaller but still significant improvement. The bare metal
system wasn't available this time around so testing was done in a VM
instance. I'm guessing the cost of rdtsc is higher for VMs.

Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Link: https://patch.msgid.link/20260303150840.3789438-3-ryan.roberts@arm.com
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86/vmscape: Add conditional IBPB mitigation</title>
<updated>2025-08-14T17:37:18+00:00</updated>
<author>
<name>Pawan Gupta</name>
<email>pawan.kumar.gupta@linux.intel.com</email>
</author>
<published>2025-08-14T17:20:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=2f8f173413f1cbf52660d04df92d0069c4306d25'/>
<id>urn:sha1:2f8f173413f1cbf52660d04df92d0069c4306d25</id>
<content type='text'>
VMSCAPE is a vulnerability that exploits insufficient branch predictor
isolation between a guest and a userspace hypervisor (like QEMU). Existing
mitigations already protect kernel/KVM from a malicious guest. Userspace
can additionally be protected by flushing the branch predictors after a
VMexit.

Since it is the userspace that consumes the poisoned branch predictors,
conditionally issue an IBPB after a VMexit and before returning to
userspace. Workloads that frequently switch between hypervisor and
userspace will incur the most overhead from the new IBPB.

This new IBPB is not integrated with the existing IBPB sites. For
instance, a task can use the existing speculation control prctl() to
get an IBPB at context switch time. With this implementation, the
IBPB is doubled up: one at context switch and another before running
userspace.

The intent is to integrate and optimize these cases post-embargo.

[ dhansen: elaborate on suboptimal IBPB solution ]

Suggested-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Pawan Gupta &lt;pawan.kumar.gupta@linux.intel.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Acked-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>x86/fpu: Shift fpregs_assert_state_consistent() from arch_exit_work() to its caller</title>
<updated>2025-05-04T08:29:25+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2025-05-03T14:39:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=46c158e3ad0fc633007802c338c409c188ec0a12'/>
<id>urn:sha1:46c158e3ad0fc633007802c338c409c188ec0a12</id>
<content type='text'>
If CONFIG_X86_DEBUG_FPU=Y, arch_exit_to_user_mode_prepare() calls
arch_exit_work() even if ti_work == 0. There only reason is that we
want to call fpregs_assert_state_consistent() if TIF_NEED_FPU_LOAD
is not set.

This looks confusing. arch_exit_to_user_mode_prepare() can just call
fpregs_assert_state_consistent() unconditionally, it depends on
CONFIG_X86_DEBUG_FPU and checks TIF_NEED_FPU_LOAD itself.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Chang S . Bae &lt;chang.seok.bae@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250503143902.GA9012@redhat.com
</content>
</entry>
<entry>
<title>x86/entry: Set FRED RSP0 on return to userspace instead of context switch</title>
<updated>2024-08-25T17:23:00+00:00</updated>
<author>
<name>Xin Li (Intel)</name>
<email>xin@zytor.com</email>
</author>
<published>2024-08-22T07:39:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fe85ee391966c4cf3bfe1c405314e894c951f521'/>
<id>urn:sha1:fe85ee391966c4cf3bfe1c405314e894c951f521</id>
<content type='text'>
The FRED RSP0 MSR points to the top of the kernel stack for user level
event delivery. As this is the task stack it needs to be updated when a
task is scheduled in.

The update is done at context switch. That means it's also done when
switching to kernel threads, which is pointless as those never go out to
user space. For KVM threads this means there are two writes to FRED_RSP0 as
KVM has to switch to the guest value before VMENTER.

Defer the update to the exit to user space path and cache the per CPU
FRED_RSP0 value, so redundant writes can be avoided.

Provide fred_sync_rsp0() for KVM to keep the cache in sync with the actual
MSR value after returning from guest to host mode.

[ tglx: Massage change log ]

Suggested-by: Sean Christopherson &lt;seanjc@google.com&gt;
Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Xin Li (Intel) &lt;xin@zytor.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/all/20240822073906.2176342-4-xin@zytor.com
</content>
</entry>
<entry>
<title>x86/entry: Test ti_work for zero before processing individual bits</title>
<updated>2024-08-25T17:23:00+00:00</updated>
<author>
<name>Xin Li (Intel)</name>
<email>xin@zytor.com</email>
</author>
<published>2024-08-22T07:39:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0dfac6f267fa091aa348c6a6742b463c9e7c98e3'/>
<id>urn:sha1:0dfac6f267fa091aa348c6a6742b463c9e7c98e3</id>
<content type='text'>
In most cases, ti_work values passed to arch_exit_to_user_mode_prepare()
are zeros, e.g., 99% in kernel build tests.  So an obvious optimization is
to test ti_work for zero before processing individual bits in it.

Omit the optimization when FPU debugging is enabled, otherwise the
FPU consistency check is never executed.

Intel 0day tests did not find a perfermance regression with this change.

Suggested-by: H. Peter Anvin (Intel) &lt;hpa@zytor.com&gt;
Signed-off-by: Xin Li (Intel) &lt;xin@zytor.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/all/20240822073906.2176342-2-xin@zytor.com

</content>
</entry>
<entry>
<title>randomize_kstack: Remove non-functional per-arch entropy filtering</title>
<updated>2024-06-28T15:54:56+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2024-06-19T21:47:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6db1208bf95b4c091897b597c415e11edeab2e2d'/>
<id>urn:sha1:6db1208bf95b4c091897b597c415e11edeab2e2d</id>
<content type='text'>
An unintended consequence of commit 9c573cd31343 ("randomize_kstack:
Improve entropy diffusion") was that the per-architecture entropy size
filtering reduced how many bits were being added to the mix, rather than
how many bits were being used during the offsetting. All architectures
fell back to the existing default of 0x3FF (10 bits), which will consume
at most 1KiB of stack space. It seems that this is working just fine,
so let's avoid the confusion and update everything to use the default.

The prior intent of the per-architecture limits were:

  arm64: capped at 0x1FF (9 bits), 5 bits effective
  powerpc: uncapped (10 bits), 6 or 7 bits effective
  riscv: uncapped (10 bits), 6 bits effective
  x86: capped at 0xFF (8 bits), 5 (x86_64) or 6 (ia32) bits effective
  s390: capped at 0xFF (8 bits), undocumented effective entropy

Current discussion has led to just dropping the original per-architecture
filters. The additional entropy appears to be safe for arm64, x86,
and s390. Quoting Arnd, "There is no point pretending that 15.75KB is
somehow safe to use while 15.00KB is not."

Co-developed-by: Yuntao Liu &lt;liuyuntao12@huawei.com&gt;
Signed-off-by: Yuntao Liu &lt;liuyuntao12@huawei.com&gt;
Fixes: 9c573cd31343 ("randomize_kstack: Improve entropy diffusion")
Link: https://lore.kernel.org/r/20240617133721.377540-1-liuyuntao12@huawei.com
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Heiko Carstens &lt;hca@linux.ibm.com&gt; # s390
Link: https://lore.kernel.org/r/20240619214711.work.953-kees@kernel.org
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key</title>
<updated>2024-02-20T00:31:49+00:00</updated>
<author>
<name>Pawan Gupta</name>
<email>pawan.kumar.gupta@linux.intel.com</email>
</author>
<published>2024-02-14T02:22:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6613d82e617dd7eb8b0c40b2fe3acea655b1d611'/>
<id>urn:sha1:6613d82e617dd7eb8b0c40b2fe3acea655b1d611</id>
<content type='text'>
The VERW mitigation at exit-to-user is enabled via a static branch
mds_user_clear. This static branch is never toggled after boot, and can
be safely replaced with an ALTERNATIVE() which is convenient to use in
asm.

Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user
path. Also remove the now redundant VERW in exc_nmi() and
arch_exit_to_user_mode().

Signed-off-by: Pawan Gupta &lt;pawan.kumar.gupta@linux.intel.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.intel.com
</content>
</entry>
<entry>
<title>x86/CPU/AMD: Fix the DIV(0) initial fix attempt</title>
<updated>2023-08-14T09:02:50+00:00</updated>
<author>
<name>Borislav Petkov (AMD)</name>
<email>bp@alien8.de</email>
</author>
<published>2023-08-11T21:38:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=f58d6fbcb7c848b7f2469be339bc571f2e9d245b'/>
<id>urn:sha1:f58d6fbcb7c848b7f2469be339bc571f2e9d245b</id>
<content type='text'>
Initially, it was thought that doing an innocuous division in the #DE
handler would take care to prevent any leaking of old data from the
divider but by the time the fault is raised, the speculation has already
advanced too far and such data could already have been used by younger
operations.

Therefore, do the innocuous division on every exit to userspace so that
userspace doesn't see any potentially old data from integer divisions in
kernel space.

Do the same before VMRUN too, to protect host data from leaking into the
guest too.

Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: &lt;stable@kernel.org&gt;
Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de
</content>
</entry>
<entry>
<title>x86/cpu: Remove unneeded 64-bit dependency in arch_enter_from_user_mode()</title>
<updated>2022-11-22T15:11:57+00:00</updated>
<author>
<name>Juergen Gross</name>
<email>jgross@suse.com</email>
</author>
<published>2022-11-04T07:26:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=0bafc51babe2344e9b0ff6629a4f044727728349'/>
<id>urn:sha1:0bafc51babe2344e9b0ff6629a4f044727728349</id>
<content type='text'>
The check for 64-bit mode when testing X86_FEATURE_XENPV isn't needed,
as Xen PV guests are no longer supported in 32-bit mode, see

  a13f2ef168cb ("x86/xen: remove 32-bit Xen PV guest support").

While at it switch from boot_cpu_has() to cpu_feature_enabled().

Signed-off-by: Juergen Gross &lt;jgross@suse.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Acked-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Link: https://lore.kernel.org/r/20221104072701.20283-3-jgross@suse.com
</content>
</entry>
</feed>
