path: root/arch
Age | Commit message | Author | Files | Lines
2026-01-09 | ARM: dts: lpc32xx: Add missing DMA properties | Piotr Wojtaszczyk | 1 | -0/+25
Add properties declared in the new DT binding nxp,lpc3220-dmamux.yaml and corresponding phandles. [vzapolskiy]: 1. rebased the change, 2. dmamux unit address shall be 0x78 instead of 0x7c, 3. removed unsupported 'dmas' properties from sd, ssp0, ssp1 and HS UARTs, 4. more non-functional updates by reordering properties, 5. minor updates to the commit message. Link to the original change: * https://lore.kernel.org/linux-arm-kernel/20240627150046.258795-6-piotr.wojtaszczyk@timesys.com/ Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk@timesys.com> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
2026-01-09 | ARM: dts: lpc32xx: Use syscon for system control block | Piotr Wojtaszczyk | 1 | -4/+4
The clock controller is a part of the NXP LPC32xx system control block (SCB), and the SCB provides a number of controllers apart from the clock controller. [vzapolskiy]: 1. kept a simple comment, 2. renamed SoC specific compatible to 'nxp,lpc3220-scb' due to the SoC UM, 3. changed size in 'ranges', since it should cover more SCB functions, 4. updated the commit message. Link to the original change: * https://lore.kernel.org/linux-arm-kernel/20240627150046.258795-5-piotr.wojtaszczyk@timesys.com/ Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk@timesys.com> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
2026-01-09 | ARM: dts: lpc32xx: describe FLASH_INT of SLC NAND controller | Vladimir Zapolskiy | 1 | -0/+1
SLC and MLC NAND flash controllers fire the muxed interrupt FLASH_INT to the SoC; add the interrupt property to the SLC device tree node. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
2026-01-09 | ARM: dts: lpc32xx: change NAND controllers node names | Vladimir Zapolskiy | 1 | -2/+2
The device tree node name of NAND controllers shall be 'nand-controller', while 'flash' is the name of NAND chip device tree nodes. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
2026-01-09 | ARM: dts: lpc32xx: Update spi clock properties | Kuldeep Singh | 1 | -4/+4
The PL022 binding requires two clocks to be defined, but the NXP LPC32xx platform doesn't comply with the binding and defines only one clock, i.e. apb_pclk. Update the SPI clocks and clock-names properties by adding the appropriate clock reference to make them compliant with the binding. Note that, strictly speaking, the change touches the DT ABI by changing the order of the values in the clock-names property; however, this level of impact is considered acceptable. Cc: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Kuldeep Singh <singh.kuldeep87k@gmail.com> [vzapolskiy: rebased and minor update to the commit message] Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
2026-01-09 | ARM: dts: Add support for pcb8385 | Horatiu Vultur | 2 | -1/+133
Add basic support for pcb8385 [1]. It is a modular board which allows adding different daughter cards carrying different PHYs. This adds support for UART, LEDs and I2C. [1] https://www.microchip.com/en-us/development-tool/ev83e85a Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20251208083545.3642168-3-horatiu.vultur@microchip.com Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2026-01-09 | arm64: Disable branch profiling for all arm64 code | Breno Leitao | 1 | -0/+4
The arm64 kernel doesn't boot with annotated branches (PROFILE_ANNOTATED_BRANCHES) enabled and CONFIG_DEBUG_VIRTUAL together. Bisecting it, I found that disabling branch profiling in arch/arm64/mm solved the problem. Narrowing down a bit further, I found that physaddr.c is the file that needs to have branch profiling disabled to get the machine to boot. I suspect that it might invoke some ftrace helper very early in the boot process and ftrace is still not enabled(!?). Rather than playing whack-a-mole with individual files, disable branch profiling for the entire arch/arm64 tree, similar to what x86 already does in arch/x86/Kbuild. Cc: stable@vger.kernel.org Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-09 | arm64: dts: qcom: qrb2210-rb1: Add overlay for vision mezzanine | Loic Poulain | 2 | -0/+71
This initial version includes support for OV9282 camera sensor. Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Link: https://lore.kernel.org/r/20260108170550.359968-4-loic.poulain@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-09 | arm64: dts: qcom: qrb2210-rb1: Add PM8008 node | Loic Poulain | 1 | -0/+75
The PM8008 device is a dedicated camera PMIC integrating all the necessary camera power management features. Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260108170550.359968-3-loic.poulain@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-09 | arm64: dts: qcom: qcm2290: Add pin configuration for mclks | Loic Poulain | 1 | -0/+28
Add pinctrl configuration for the four available camera master clocks. Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Link: https://lore.kernel.org/r/20260108170550.359968-2-loic.poulain@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-08 | KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate() | Alexandru Elisei | 3 | -3/+3
synchronize_vcpu_pstate() doesn't make use of the reference to exit_code; remove the parameter. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Link: https://msgid.link/20251216103053.47224-5-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-08 | KVM: arm64: Remove extra argument for __pvkm_host_{share,unshare}_hyp() | Alexandru Elisei | 1 | -2/+2
__pkvm_host_share_hyp() and __pkvm_host_unshare_hyp() both have one parameter, the pfn, not two. Even though correctness isn't impacted because the SMCCC handlers pass the first argument and ignore the second one, let's call the functions with the proper number of arguments. Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Link: https://msgid.link/20251216103053.47224-4-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-08 | KVM: arm64: Inject UNDEF for a register trap without accessor | Alexandru Elisei | 1 | -1/+4
Configuring a register trap without specifying an accessor function is obviously a bug. Instead of calling die() when that happens, let's be a bit more helpful and print the register encoding. Also inject an undefined instruction exception in the guest, similar to other unhandled register accesses. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://msgid.link/20251216103053.47224-3-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-08 | KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load | Alexandru Elisei | 2 | -1/+3
Commit fb10ddf35c1c ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()") introduced per-VCPU FGT traps. For an unprotected pKVM VCPU, the untrusted host FGT configuration is copied in pkvm_vcpu_init_traps(), which is called from __pkvm_init_vcpu(). __pkvm_init_vcpu() is called once per VCPU (when the VCPU is first run) which means that the uninitialized, zero, values for the FGT registers end up being used for the entire lifetime of the VCPU. This causes both unwanted traps (for the inverse polarity trap bits) and the guest being allowed to access registers it shouldn't. Fix it by copying the FGT traps for unprotected pKVM VCPUs when the untrusted host loads the VCPU. Fixes: fb10ddf35c1c ("KVM: arm64: Compute per-vCPU FGTs at vcpu_load()") Acked-by: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251216103053.47224-2-alexandru.elisei@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-08 | KVM: SVM: Allow KVM_SET_NESTED_STATE to clear GIF when SVME==0 | Jim Mattson | 1 | -5/+5
GIF==0 together with EFER.SVME==0 is a valid architectural state. Don't return -EINVAL for KVM_SET_NESTED_STATE when this combination is specified. Fixes: cc440cdad5b7 ("KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251121204803.991707-2-yosry.ahmed@linux.dev [sean: disallow KVM_STATE_NESTED_RUN_PENDING with SVME=0] Signed-off-by: Sean Christopherson <seanjc@google.com>
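The acceptance rule can be modelled as a small predicate. This is a simplified sketch, not KVM's svm_set_nested_state(): the struct and its fields are hypothetical, and only the two conditions named in the commit are checked (GIF==0 with EFER.SVME==0 is accepted, while a pending nested VMRUN with SVME==0 is rejected, per the maintainer note).

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the state userspace hands to
 * KVM_SET_NESTED_STATE; field names are illustrative only. */
struct nested_state_sketch {
	bool efer_svme;    /* EFER.SVME */
	bool gif;          /* global interrupt flag */
	bool run_pending;  /* models KVM_STATE_NESTED_RUN_PENDING */
};

/* Returns true if the combination should be accepted. */
static bool nested_state_acceptable(const struct nested_state_sketch *s)
{
	/* A pending nested VMRUN makes no sense while SVM is disabled. */
	if (!s->efer_svme && s->run_pending)
		return false;

	/* GIF may be 0 or 1 regardless of EFER.SVME: GIF==0 with SVME==0
	 * is a valid architectural state, so it is not rejected here. */
	return true;
}
```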
2026-01-08 | KVM: SVM: Don't set GIF when clearing EFER.SVME | Jim Mattson | 2 | -1/+2
Clearing EFER.SVME is not architected to set GIF. Don't set GIF when emulating a change to EFER that clears EFER.SVME. However, keep setting GIF if clearing EFER.SVME causes force-leaving the nested guest through svm_leave_nested(), to maintain a sane behavior of not leaving GIF cleared after exiting the guest. In every other path, setting GIF is either correct/desirable, or irrelevant because the caller immediately and unconditionally sets/clears GIF. This is more-or-less KVM defining HW behavior, but leaving GIF cleared would also be defining HW behavior anyway. Note that if force-leaving the nested guest is considered a SHUTDOWN, then this could violate the APM-specified behavior: If the processor enters the shutdown state (due to a triple fault for instance) while GIF is clear, it can only be restarted by means of a RESET. However, a SHUTDOWN leaves the VMCB undefined, so there's not a lot that KVM can do in this case. Also, if vGIF is enabled on SHUTDOWN, KVM has no way of finding out if GIF was cleared. The only way for KVM to handle this without making up HW behavior is to completely terminate the VM, so settle for doing the relatively "sane" thing of setting GIF when force-leaving nested. Fixes: c513f484c558 ("KVM: nSVM: leave guest mode when clearing EFER.SVME") Signed-off-by: Jim Mattson <jmattson@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251121204803.991707-3-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 | KVM: arm64: Fix EL2 S1 XN handling for hVHE setups | Marc Zyngier | 1 | -1/+9
The current XN implementation is tied to the EL2 translation regime, and falls flat on its face with the EL2&0 one that is used for hVHE, as the permission bit for privileged execution is a different one. Fixes: 6537565fd9b7f ("KVM: arm64: Adjust EL2 stage-1 leaf AP bits when ARM64_KVM_HVHE is set") Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://msgid.link/20251210173024.561160-2-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-08 | KVM: arm64: gic: Check for vGICv3 when clearing TWI | Sascha Bischoff | 1 | -0/+1
Explicitly check for the vgic being v3 when disabling TWI. Failure to check this can result in using the wrong view of the vgic CPU IF union causing undesirable/unexpected behaviour. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20260106165154.3321753-1-sascha.bischoff@arm.com Signed-off-by: Oliver Upton <oupton@kernel.org>
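The hazard is the classic tagged-union one: a field from the GICv3 view of the per-vCPU CPU interface union must only be read when the vGIC really is v3. The sketch below is a hypothetical model (all type and field names, including vlpi_count, are invented for illustration); it shows only the guard, not KVM's actual TWI policy.

```c
#include <stdbool.h>
#include <stdint.h>

enum vgic_model_sketch { VGIC_V2_SKETCH, VGIC_V3_SKETCH };

struct vgic_v2_cpu_if_sketch { uint32_t vmcr; };
struct vgic_v3_cpu_if_sketch { uint32_t vlpi_count; };  /* hypothetical field */

struct vgic_cpu_sketch {
	enum vgic_model_sketch model;
	union {  /* only the member matching 'model' is meaningful */
		struct vgic_v2_cpu_if_sketch v2;
		struct vgic_v3_cpu_if_sketch v3;
	} u;
};

/* Illustrative policy: TWI may be cleared only when the v3 view is valid.
 * Without the model check, a v2 vCPU's vmcr would be misread as v3 state. */
static bool may_clear_twi_sketch(const struct vgic_cpu_sketch *cpu)
{
	if (cpu->model != VGIC_V3_SKETCH)
		return false;
	return cpu->u.v3.vlpi_count > 0;
}
```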
2026-01-08 | arm64: defconfig: Enable Apple Silicon drivers | Sven Peter | 1 | -0/+27
Enable drivers for hardware present on Apple Silicon machines. None of these drivers are critical, so build them as modules. The reset and RTC macsmc drivers would be useful as built-in drivers, but they have large dependencies, so keep them as modules. The size increases are minor and are offset by already merged "default ARCH_APPLE" removals from the linked original submission:
  vmlinux 157782640 -> 157902032 (+119392)
  Image    41007616 ->  41073152 (+ 65536)
Link: https://lore.kernel.org/asahi/20250612-apple-kconfig-defconfig-v1-0-0e6f9cb512c1@kernel.org/ Signed-off-by: Janne Grunau <j@jannau.net> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20251231-arch-arm64-apple-defconfig-v1-2-4ff19805ba39@jannau.net Signed-off-by: Sven Peter <sven@kernel.org>
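A quick arithmetic check of the size figures quoted in the Apple Silicon defconfig commit: the deltas match, and in relative terms they amount to roughly 0.08% (vmlinux) and 0.16% (Image); the Image growth is exactly 64 KiB. The helper names are illustrative, and basis points are computed with integer truncation.

```c
#include <stdint.h>

/* Absolute growth in bytes. */
static int64_t size_delta(int64_t before, int64_t after)
{
	return after - before;
}

/* Growth in basis points (1/100 of a percent), truncated toward zero. */
static int64_t growth_bp(int64_t before, int64_t after)
{
	return size_delta(before, after) * 10000 / before;
}
```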
2026-01-08 | arm64: select APPLE_PMGR_PWRSTATE for ARCH_APPLE | Janne Grunau | 1 | -0/+1
Critical devices like the NVMe on Apple silicon systems depend on APPLE_PMGR_PWRSTATE, and others are powered down coming out of reset, so select APPLE_PMGR_PWRSTATE to ensure these devices work. Since the systems are not totally unusable (boot to initramfs is expected to work), allow !PM builds without APPLE_PMGR_PWRSTATE. Link: https://lore.kernel.org/asahi/2e022f4e-4c87-4da1-9d02-f7a3ae7c5798@arm.com/ Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Janne Grunau <j@jannau.net> Link: https://patch.msgid.link/20251231-arch-arm64-apple-defconfig-v1-1-4ff19805ba39@jannau.net Signed-off-by: Sven Peter <sven@kernel.org>
2026-01-08 | KVM: SVM: Virtualize and advertise support for ERAPS | Amit Shah | 8 | -3/+77
AMD CPUs with the Enhanced Return Address Predictor Security (ERAPS) feature (available on Zen5+) obviate the need for FILL_RETURN_BUFFER sequences right after VMEXITs. ERAPS adds guest/host tags to entries in the RSB (a.k.a. RAP). This helps with speculation protection across the VM boundary, and it also preserves host and guest entries in the RSB that can improve software performance (which would otherwise be flushed due to the FILL_RETURN_BUFFER sequences). Importantly, ERAPS also improves cross-domain security by clearing the RAP in certain situations. Specifically, the RAP is cleared in response to actions that are typically tied to software context switching between tasks. Per the APM:
  The ERAPS feature eliminates the need to execute CALL instructions to clear the return address predictor in most cases. On processors that support ERAPS, return addresses from CALL instructions executed in host mode are not used in guest mode, and vice versa. Additionally, the return address predictor is cleared in all cases when the TLB is implicitly invalidated and in the following cases:
  • MOV CR3 instruction
  • INVPCID other than single address invalidation (operation type 0)
ERAPS also allows CPUs to extend the size of the RSB/RAP from the older standard (of 32 entries) to a new size, enumerated in CPUID leaf 0x80000021:EBX bits 23:16 (64 entries in Zen5 CPUs). In hardware, ERAPS is always on; when running in host context, the CPU uses the full RSB/RAP size without any software changes necessary. However, when running in guest context, the CPU utilizes the full size of the RSB/RAP if and only if the new ALLOW_LARGER_RAP flag is set in the VMCB; if the flag is not set, the CPU limits itself to the historical size of 32 entries. Requiring software to opt in for guest usage of RAPs larger than 32 entries allows hypervisors, i.e. KVM, to emulate the aforementioned conditions in which the RAP is cleared as well as the guest/host split. E.g.
if the CPU unconditionally used the full RAP for guests, failure to clear the RAP on transitions between L1 or L2, or on emulated guest TLB flushes, would expose the guest to RAP-based attacks as a guest without support for ERAPS wouldn't know that its FILL_RETURN_BUFFER sequence is insufficient. Address the ~two broad categories of ERAPS emulation, and advertise ERAPS support to userspace, along with the RAP size enumerated in CPUID. 1. Architectural RAP clearing: as above, CPUs with ERAPS clear RAP entries on several conditions, including CR3 updates. To handle scenarios where a relevant operation is handled in common code (emulation of INVPCID and to a lesser extent MOV CR3), piggyback VCPU_EXREG_CR3 and create an alias, VCPU_EXREG_ERAPS. SVM doesn't utilize CR3 dirty tracking, and so for all intents and purposes VCPU_EXREG_CR3 is unused. Aliasing VCPU_EXREG_ERAPS ensures that any flow that writes CR3 will also clear the guest's RAP, and allows common x86 to mark ERAPS vCPUs as needing a RAP clear without having to add a new request (or other mechanism). 2. Nested guests: the ERAPS feature adds host/guest tagging to entries in the RSB, but does not distinguish between the guest ASIDs. To prevent the case of an L2 guest poisoning the RSB to attack the L1 guest, the CPU exposes a new VMCB bit (CLEAR_RAP). The next VMRUN with a VMCB that has this bit set causes the CPU to flush the RSB before entering the guest context. Set the bit in VMCB01 after a nested #VMEXIT to ensure the next time the L1 guest runs, its RSB contents aren't polluted by the L2's contents. Similarly, before entry into a nested guest, set the bit for VMCB02, so that the L1 guest's RSB contents are not leaked/used in the L2 context. Enable ALLOW_LARGER_RAP (and emulate RAP clears) if and only if ERAPS is exposed to the guest. 
Enabling ALLOW_LARGER_RAP unconditionally wouldn't cause any functional issues, but ignoring userspace's (and L1's) desires would put KVM into a grey area, which is especially undesirable due to the potential security implications. E.g. if a use case wants to have L1 do manual RAP clearing even when ERAPS is present in hardware, enabling ALLOW_LARGER_RAP could result in L1 leaving stale entries in the RAP. ERAPS is documented in AMD APM Vol 2 (Pub 24593), in revisions 3.43 and later. Signed-off-by: Amit Shah <amit.shah@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Amit Shah <amit.shah@amd.com> Link: https://patch.msgid.link/aR913X8EqO6meCqa@google.com
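Several ERAPS mechanisms described in the commit above can be modelled in a few lines. This is a hedged sketch, not KVM code: the RAP-size extraction follows the CPUID leaf and bit range stated in the text, while the register-cache enum, the vcpu struct and the CLEAR_RAP/ALLOW_LARGER_RAP booleans are hypothetical stand-ins (in particular, the index value 3 is arbitrary).

```c
#include <stdbool.h>
#include <stdint.h>

/* RAP size from CPUID 0x80000021 EBX bits 23:16 (per the commit text). */
static unsigned int eraps_rap_size(uint32_t ebx)
{
	return (ebx >> 16) & 0xff;
}

/* Hypothetical register-cache indices: the ERAPS "register" aliases the
 * CR3 slot (unused for dirty tracking on SVM); the value 3 is arbitrary. */
enum exreg_sketch {
	EXREG_CR3_SKETCH = 3,
	EXREG_ERAPS_SKETCH = EXREG_CR3_SKETCH,
};

struct vcpu_sketch {
	bool guest_has_eraps;   /* ERAPS exposed in guest CPUID */
	uint32_t regs_dirty;    /* bitmap indexed by enum exreg_sketch */
	bool vmcb01_clear_rap;  /* models CLEAR_RAP in VMCB01 (L1) */
	bool vmcb02_clear_rap;  /* models CLEAR_RAP in VMCB02 (L2) */
};

/* Any flow that writes guest CR3 (e.g. emulated MOV CR3) marks the slot. */
static void mark_cr3_written(struct vcpu_sketch *v)
{
	v->regs_dirty |= 1u << EXREG_CR3_SKETCH;
}

/* A RAP clear is emulated only if ERAPS is exposed to the guest. */
static bool rap_clear_pending(const struct vcpu_sketch *v)
{
	return v->guest_has_eraps && (v->regs_dirty & (1u << EXREG_ERAPS_SKETCH));
}

/* ALLOW_LARGER_RAP is set if and only if ERAPS is exposed to the guest. */
static bool allow_larger_rap(const struct vcpu_sketch *v)
{
	return v->guest_has_eraps;
}

/* Before entering L2: flush L1's entries (CLEAR_RAP in VMCB02). */
static void nested_enter_sketch(struct vcpu_sketch *v)
{
	if (v->guest_has_eraps)
		v->vmcb02_clear_rap = true;
}

/* After a nested #VMEXIT: flush L2's entries before L1 runs (VMCB01). */
static void nested_vmexit_sketch(struct vcpu_sketch *v)
{
	if (v->guest_has_eraps)
		v->vmcb01_clear_rap = true;
}
```

Because EXREG_ERAPS_SKETCH is an alias rather than a new index, any path that marks CR3 dirty automatically requests a RAP clear, which is the point of the aliasing approach the commit describes.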
2026-01-08 | KVM: SVM: Don't allow L1 intercepts for instructions not advertised | Kevin Cheng | 2 | -8/+46
If a feature is not advertised in the guest's CPUID, prevent L1 from intercepting the unsupported instructions by clearing the corresponding intercept in KVM's cached vmcb12. When an L2 guest executes an instruction that is not advertised to L1, we expect a #UD exception to be injected by L0. However, the nested svm exit handler first checks if the instruction intercept is set in vmcb12, and if so, synthesizes an exit from L2 to L1 instead of a #UD exception. If a feature is not advertised, the L1 intercept should be ignored. While creating KVM's cached vmcb12, sanitize the intercepts for instructions that are not advertised in the guest CPUID. This effectively ignores the L1 intercept on nested vm exit handling. It also ignores the L1 intercept when computing the intercepts in vmcb02, so if L0 (for some reason) does not intercept the instruction, KVM won't intercept it at all. Signed-off-by: Kevin Cheng <chengkev@google.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251215192510.2300816-1-chengkev@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
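The sanitization step can be sketched as masking the cached vmcb12 intercept word against guest CPUID. The instruction/feature pairs and bit positions below are illustrative assumptions, not the exact list or layout KVM uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative guest CPUID feature bits. */
enum {
	FEAT_RDTSCP_SKETCH  = 1u << 0,
	FEAT_INVPCID_SKETCH = 1u << 1,
};

/* Illustrative intercept bits in the cached vmcb12. */
enum {
	ICPT_RDTSCP_SKETCH  = 1u << 4,
	ICPT_INVPCID_SKETCH = 1u << 5,
	ICPT_HLT_SKETCH     = 1u << 6,  /* no CPUID dependency: never cleared */
};

struct intercept_dep_sketch {
	uint32_t intercept;  /* L1's intercept bit */
	uint32_t feature;    /* feature that must be in guest CPUID */
};

static const struct intercept_dep_sketch deps_sketch[] = {
	{ ICPT_RDTSCP_SKETCH,  FEAT_RDTSCP_SKETCH },
	{ ICPT_INVPCID_SKETCH, FEAT_INVPCID_SKETCH },
};

/* Drop L1 intercepts for instructions not advertised to L1, so the exit
 * is handled by L0 (e.g. by injecting #UD) instead of being reflected to L1. */
static uint32_t sanitize_vmcb12_intercepts(uint32_t intercepts, uint32_t guest_features)
{
	for (size_t i = 0; i < sizeof(deps_sketch) / sizeof(deps_sketch[0]); i++)
		if (!(guest_features & deps_sketch[i].feature))
			intercepts &= ~deps_sketch[i].intercept;
	return intercepts;
}
```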
2026-01-08 | KVM: SVM: Add support for expedited writes to the fast MMIO bus | Sean Christopherson | 1 | -0/+21
Wire up SVM's #NPF handler to fast MMIO. While SVM doesn't provide a dedicated exit reason, it's trivial to key off PFERR_RSVD_MASK. Like VMX, restrict the fast path to L1 to avoid having to deal with nGPA=>GPA translations. For simplicity, use the fast path if and only if the next RIP is known. While KVM could utilize EMULTYPE_SKIP, doing so would require additional logic to deal with SEV guests, e.g. to go down the slow path if the instruction buffer is empty. All modern CPUs support next RIP, and in practice the next RIP will be available for any guest fast path. Copy+paste the kvm_io_bus_write() + trace_kvm_fast_mmio() logic even though KVM would ideally provide a small helper, as such a helper would need to either be a macro or non-inline to avoid including trace.h in a header (trace.h must not be included by x86.c prior to CREATE_TRACE_POINTS being defined). Link: https://patch.msgid.link/20251113221642.1673023-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
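The conditions for taking the fast path can be summarized as a predicate. This is a hypothetical sketch: the function name is invented, and the reserved-bit mask is redefined locally as bit 3 of the x86 page-fault error code, which is what KVM's PFERR_RSVD_MASK denotes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reserved-bit fault: bit 3 of the x86 page-fault error code. */
#define PFERR_RSVD_MASK_SKETCH (1ull << 3)

/* Hypothetical predicate mirroring the conditions listed in the commit. */
static bool use_fast_mmio_sketch(uint64_t error_code, bool is_guest_mode,
				 bool next_rip_valid)
{
	if (!(error_code & PFERR_RSVD_MASK_SKETCH))
		return false;       /* not a reserved-bit (MMIO) #NPF */
	if (is_guest_mode)
		return false;       /* L2 would need nGPA=>GPA translation */
	return next_rip_valid;  /* skip the instruction without emulating it */
}
```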
2026-01-08 | KVM: SVM: Rename "fault_address" to "gpa" in npf_interception() | Sean Christopherson | 1 | -4/+4
Rename "fault_address" to "gpa" in KVM's #NPF handler and track it as a gpa_t to more precisely document what type of address is being captured, and because "gpa" is much more succinct. No functional change intended. Link: https://patch.msgid.link/20251113221642.1673023-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 | KVM: nSVM: Remove a user-triggerable WARN on nested_svm_load_cr3() succeeding | Sean Christopherson | 1 | -2/+1
Drop the WARN in svm_set_nested_state() on nested_svm_load_cr3() failing as it is trivially easy to trigger from userspace by modifying CPUID after loading CR3. E.g. modifying the state restoration selftest like so:

--- tools/testing/selftests/kvm/x86/state_test.c
+++ tools/testing/selftests/kvm/x86/state_test.c
@@ -280,7 +280,16 @@ int main(int argc, char *argv[])
 
 		/* Restore state in a new VM. */
 		vcpu = vm_recreate_with_one_vcpu(vm);
-		vcpu_load_state(vcpu, state);
+
+		if (stage == 4) {
+			state->sregs.cr3 = BIT(44);
+			vcpu_load_state(vcpu, state);
+
+			vcpu_set_cpuid_property(vcpu, X86_PROPERTY_MAX_PHY_ADDR, 36);
+			__vcpu_nested_state_set(vcpu, &state->nested);
+		} else {
+			vcpu_load_state(vcpu, state);
+		}
 
 		/*
 		 * Restore XSAVE state in a dummy vCPU, first without doing

generates:

  WARNING: CPU: 30 PID: 938 at arch/x86/kvm/svm/nested.c:1877 svm_set_nested_state+0x34a/0x360 [kvm_amd]
  Modules linked in: kvm_amd kvm irqbypass [last unloaded: kvm]
  CPU: 30 UID: 1000 PID: 938 Comm: state_test Tainted: G W 6.18.0-rc7-58e10b63777d-next-vm
  Tainted: [W]=WARN
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:svm_set_nested_state+0x34a/0x360 [kvm_amd]
  Call Trace:
   <TASK>
   kvm_arch_vcpu_ioctl+0xf33/0x1700 [kvm]
   kvm_vcpu_ioctl+0x4e6/0x8f0 [kvm]
   __x64_sys_ioctl+0x8f/0xd0
   do_syscall_64+0x61/0xad0
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

Simply delete the WARN instead of trying to prevent userspace from shoving "illegal" state into CR3. For better or worse, KVM's ABI allows userspace to set CPUID after SREGS, and vice versa, and KVM is very permissive when it comes to guest CPUID. I.e. attempting to enforce the virtual CPU model when setting CPUID could break userspace. Given that the WARN doesn't provide any meaningful protection for KVM or benefit for userspace, simply drop it even though the odds of breaking userspace are minuscule. Opportunistically delete a spurious newline.
Fixes: b222b0b88162 ("KVM: nSVM: refactor the CR3 reload on migration") Cc: stable@vger.kernel.org Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251216161755.1775409-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 | arm64: dts: apple: s8001: Add DWI backlight for J98a, J99a | Nick Chan | 2 | -0/+11
iPad Pro (12.9-inch) uses DWI backlight, while the 9.7-inch model does not. Add DWI backlight node for s8001 and enable it for J98a and J99a. Signed-off-by: Nick Chan <towinchenmi@gmail.com> Link: https://patch.msgid.link/20251231-b4-j98a-j99a-dwi-bl-v1-1-24793c2b99fc@gmail.com Signed-off-by: Sven Peter <sven@kernel.org>
2026-01-08 | KVM: x86: Don't read guest CR3 when doing async pf while the MMU is direct | Xiaoyao Li | 1 | -1/+4
Don't read guest CR3 in kvm_arch_setup_async_pf() if the MMU is direct and use INVALID_GPA instead. When KVM tries to perform the host-only async page fault for the shared memory of TDX guests, the following WARNING is triggered:
  WARNING: CPU: 1 PID: 90922 at arch/x86/kvm/vmx/main.c:483 vt_cache_reg+0x16/0x20
  Call Trace:
   __kvm_mmu_faultin_pfn
   kvm_mmu_faultin_pfn
   kvm_tdp_page_fault
   kvm_mmu_do_page_fault
   kvm_mmu_page_fault
   tdx_handle_ept_violation
This WARNING is triggered when calling kvm_mmu_get_guest_pgd() to cache the guest CR3 in kvm_arch_setup_async_pf() for later use in kvm_arch_async_page_ready() to determine if it's possible to fix the page fault in the current vCPU context to save one VM exit. However, when guest state is protected, KVM cannot read the guest CR3. Since protected guests aren't compatible with shadow paging, i.e., they must use a direct MMU, avoid calling kvm_mmu_get_guest_pgd() to read guest CR3 when the MMU is direct and use INVALID_GPA instead. Note that for protected guests mmu->root_role.direct is always true, so kvm_mmu_get_guest_pgd() in kvm_arch_async_page_ready() won't be reached. Reported-by: Farrah Chen <farrah.chen@intel.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://patch.msgid.link/20251212135051.2155280-1-xiaoyao.li@intel.com [sean: explicitly cast to "unsigned long" to make 32-bit builds happy] Signed-off-by: Sean Christopherson <seanjc@google.com>
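The fix reduces to a conditional on the MMU type. In the sketch below the struct and function are hypothetical; INVALID_GPA is redefined locally as an all-ones gpa_t (which is how KVM defines it), and the result is cast to unsigned long in the spirit of the maintainer note about 32-bit builds.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gpa_t;
#define INVALID_GPA (~(gpa_t)0)  /* all-ones */

/* Hypothetical slice of MMU state. */
struct mmu_sketch {
	bool direct;         /* models mmu->root_role.direct */
	uint64_t guest_pgd;  /* what reading guest CR3 would yield */
};

/* Value cached at async #PF setup: never read guest CR3 for a direct MMU,
 * since a protected (e.g. TDX) guest's CR3 cannot be read at all. */
static unsigned long async_pf_cr3_sketch(const struct mmu_sketch *mmu)
{
	if (mmu->direct)
		return (unsigned long)INVALID_GPA;
	return (unsigned long)mmu->guest_pgd;
}
```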
2026-01-08 | KVM: x86: Return "unsupported" instead of "invalid" on access to unsupported PV MSR | Sean Christopherson | 1 | -20/+20
Return KVM_MSR_RET_UNSUPPORTED instead of '1' (which for all intents and purposes means "invalid") when rejecting accesses to KVM PV MSRs to adhere to KVM's ABI of allowing host reads and writes of '0' to MSRs that are advertised to userspace via KVM_GET_MSR_INDEX_LIST, even if the vCPU model doesn't support the MSR. E.g. running a QEMU VM with -cpu host,-kvmclock,kvm-pv-enforce-cpuid yields:
  qemu: error: failed to set MSR 0x12 to 0x0
  qemu: target/i386/kvm/kvm.c:3301: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed.
Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Cc: stable@vger.kernel.org Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://patch.msgid.link/20251230205948.4094097-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
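The ABI distinction can be modelled as follows. The return-code values and helper names are assumptions made for illustration (only the name KVM_MSR_RET_UNSUPPORTED and its role come from the commit); the point is that an "unsupported" result lets a host-initiated write of 0 succeed, whereas "invalid" always fails.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative return codes; the numeric values are assumptions. */
enum msr_ret_sketch {
	MSR_OK_SKETCH = 0,
	MSR_INVALID_SKETCH = 1,      /* what the PV MSR handlers used to return */
	MSR_UNSUPPORTED_SKETCH = 2,  /* stands in for KVM_MSR_RET_UNSUPPORTED */
};

/* PV MSR write handler after the fix. */
static enum msr_ret_sketch pv_msr_write_sketch(bool feature_in_guest_cpuid)
{
	return feature_in_guest_cpuid ? MSR_OK_SKETCH : MSR_UNSUPPORTED_SKETCH;
}

/* Common-code completion: a host-initiated write of 0 to an advertised but
 * unsupported MSR is tolerated; any other failure stays a failure.
 * Returns 0 on success, 1 on failure. */
static int finish_msr_write_sketch(enum msr_ret_sketch r, bool host_initiated,
				   uint64_t data)
{
	if (r == MSR_OK_SKETCH)
		return 0;
	if (r == MSR_UNSUPPORTED_SKETCH && host_initiated && data == 0)
		return 0;
	return 1;
}
```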
2026-01-08 | KVM: nVMX: Mark APIC access page dirty when syncing vmcs12 pages | Fred Griffoul | 1 | -4/+1
For consistency with commit 7afe79f5734a ("KVM: nVMX: Mark vmcs12's APIC access page dirty when unmapping"), which marks the page dirty during unmap operations, also mark it dirty during vmcs12 page synchronization. Signed-off-by: Fred Griffoul <fgriffo@amazon.co.uk> [sean: use kvm_vcpu_map_mark_dirty()] Link: https://patch.msgid.link/20251121223444.355422-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 | KVM: VMX: Move nested_mark_vmcs12_pages_dirty() to vmx.c, and rename | Sean Christopherson | 2 | -14/+13
Move nested_mark_vmcs12_pages_dirty() to vmx.c now that it's only used in the VM-Exit path, and add "all" to its name to document that its purpose is to mark all (mapped-out-of-band) vmcs12 pages as dirty. No functional change intended. Link: https://patch.msgid.link/20251121223444.355422-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 | KVM: nVMX: Precisely mark vAPIC and PID maps dirty when delivering nested PI | Sean Christopherson | 1 | -1/+2
Explicitly mark the vmcs12 vAPIC and PI descriptor pages as dirty when delivering a nested posted interrupt instead of marking all vmcs12 pages as dirty. This will allow marking the APIC access page (and any future vmcs12 pages) as dirty in nested_mark_vmcs12_pages_dirty() without over- dirtying in the nested PI case. Manually marking the vAPIC and PID pages as dirty also makes the flow a bit more self-documenting, e.g. it's not obvious at first glance that vmx->nested.pi_desc is actually a host kernel mapping of a vmcs12 page. No functional change intended. Link: https://patch.msgid.link/20251121223444.355422-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 | KVM: x86: Mark vmcs12 pages as dirty if and only if they're mapped | Sean Christopherson | 1 | -12/+3
Mark vmcs12 pages as dirty (in KVM's dirty log bitmap) if and only if the page is mapped, i.e. if the page is actually "active" in vmcs02. For some pages, KVM simply disables the associated VMCS control if the vmcs12 page is unreachable, i.e. it's possible for nested VM-Enter to succeed with a "bad" vmcs12 page. Link: https://patch.msgid.link/20251121223444.355422-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
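A minimal sketch of the "dirty if and only if mapped" rule, using a hypothetical mapping handle and a toy 64-bit dirty bitmap (gfns are assumed to be below 64).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping handle for a page referenced by vmcs12. */
struct map_sketch {
	void *hva;     /* NULL if the page couldn't be mapped */
	uint64_t gfn;  /* assumed < 64 for the toy bitmap */
};

/* Toy dirty log: one bit per gfn. */
static uint64_t dirty_bitmap_sketch;

/* Only a mapped page can have been written through vmcs02, so only a
 * mapped page is logged as dirty. */
static void mark_dirty_if_mapped_sketch(const struct map_sketch *m)
{
	if (!m->hva)
		return;
	dirty_bitmap_sketch |= 1ull << m->gfn;
}
```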
2026-01-08 | KVM: VMX: Add mediated PMU support for CPUs without "save perf global ctrl" | Sean Christopherson | 3 | -13/+52
Extend mediated PMU support for Intel CPUs without support for saving PERF_GLOBAL_CONTROL into the guest VMCS field on VM-Exit, e.g. for Skylake and its derivatives, as well as Icelake. While supporting CPUs without VM_EXIT_SAVE_IA32_PERF_GLOBAL_CTRL isn't completely trivial, it's not that complex either. And not supporting such CPUs would mean not supporting 7+ years of Intel CPUs released in the past 10 years. On VM-Exit, immediately propagate the saved PERF_GLOBAL_CTRL to the VMCS as well as KVM's software cache so that KVM doesn't need to add full EXREG tracking of PERF_GLOBAL_CTRL. In practice, the vast majority of VM-Exits won't trigger software writes to guest PERF_GLOBAL_CTRL, so deferring the VMWRITE to the next VM-Enter would only delay the inevitable without batching/avoiding VMWRITEs. Note! Take care to refresh VM_EXIT_MSR_STORE_COUNT on nested VM-Exit, as it's unfortunately possible that KVM could recalculate MSR intercepts while L2 is active, e.g. if userspace loads nested state and _then_ sets PERF_CAPABILITIES. Eating the VMWRITE on every nested VM-Exit is unfortunate, but that's a pre-existing problem and can/should be solved separately, e.g. modifying the number of auto-load entries while L2 is active is also uncommon on modern CPUs. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-45-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 | KVM: VMX: Initialize vmcs01.VM_EXIT_MSR_STORE_ADDR with list address | Sean Christopherson | 1 | -0/+1
Initialize vmcs01.VM_EXIT_MSR_STORE_ADDR to point at the vCPU's msr_autostore list in anticipation of utilizing the auto-store functionality, and to harden KVM against stray reads to pfn 0 (or, in theory, a random pfn if the underlying CPU uses a complex scheme for encoding VMCS data). The MSR auto lists are supposed to be ignored if the associated COUNT VMCS field is '0', but leaving the ADDR field zero-initialized in memory is an unnecessary risk (albeit a minuscule risk) given that the cost is a single VMWRITE during vCPU creation. Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-44-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 | KVM: VMX: Dedup code for adding MSR to VMCS's auto list | Sean Christopherson | 1 | -22/+19
Add a helper to add an MSR to a VMCS's "auto" list to deduplicate the code in add_atomic_switch_msr(), and so that the functionality can be used in the future for managing the MSR auto-store list. No functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-43-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 | KVM: VMX: Compartmentalize adding MSRs to host vs. guest auto-load list | Sean Christopherson | 1 | -11/+12
Undo the bundling of the "host" and "guest" MSR auto-load list logic so that the code can be deduplicated by factoring out the logic to a separate helper. Now that "list full" situations are treated as fatal to the VM, there is no need to pre-check both lists. For all intents and purposes, this reverts the add_atomic_switch_msr() changes made by commit 3190709335dd ("x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting"). Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-42-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 KVM: VMX: Set MSR index auto-load entry if and only if entry is "new" (Sean Christopherson, 1 file, -2/+2)
When adding an MSR to the auto-load lists, update the MSR index in the list entry if and only if a new entry is being inserted, as 'i' can only be non-negative if vmx_find_loadstore_msr_slot() found an entry with the MSR's index. Unnecessarily setting the index is benign, but it makes it harder to see that updating the value is necessary even when an existing entry for the MSR was found. No functional change intended. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-41-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 KVM: VMX: Bug the VM if either MSR auto-load list is full (Sean Christopherson, 1 file, -5/+4)
WARN and bug the VM if either MSR auto-load list is full when adding an MSR to the lists, as the set of MSRs that KVM loads via the lists is finite and entirely KVM controlled, i.e. overflowing the lists shouldn't be possible in a fully released version of KVM. Terminate the VM as the core KVM infrastructure has no insight as to _why_ an MSR is being added to the list, and failure to load an MSR on VM-Enter and/or VM-Exit could be fatal to the host. E.g. running the host with a guest-controlled PEBS MSR could generate unexpected writes to the DS buffer and crash the host. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-40-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 KVM: VMX: Drop unused @entry_only param from add_atomic_switch_msr() (Sean Christopherson, 1 file, -9/+4)
Drop the "on VM-Enter only" parameter from add_atomic_switch_msr() as it is no longer used, and for all intents and purposes was never used. The functionality was added, under embargo, by commit 989e3992d2ec ("x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs"), and then ripped out by commit 2f055947ae5e ("x86/kvm: Drop L1TF MSR list approach") just a few commits later: 2f055947ae5e x86/kvm: Drop L1TF MSR list approach; 72c6d2db64fa x86/litf: Introduce vmx status variable; 215af5499d9e cpu/hotplug: Online siblings when SMT control is turned on; 390d975e0c4e x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required; 989e3992d2ec x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs. Furthermore, it's extremely unlikely KVM will ever _need_ to load an MSR value via the auto-load lists only on VM-Enter. MSR writes via the lists aren't optimized in any way, so the only reason to use the lists instead of a WRMSR is for cases where the MSR _must_ be loaded atomically with respect to VM-Enter (and/or VM-Exit). While one could argue that command MSRs, e.g. IA32_FLUSH_CMD, "need" to be written exactly at VM-Enter, in practice doing such flushes within a few instructions of VM-Enter is more than sufficient. Note, the shortlog and changelog for commit 390d975e0c4e ("x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required") are misleading and wrong. That commit added MSR_IA32_FLUSH_CMD to the VM-Enter _load_ list, not the VM-Enter save list (which doesn't exist; only VM-Exit has a store/save list). Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-39-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 KVM: VMX: Dedup code for removing MSR from VMCS's auto-load list (Sean Christopherson, 1 file, -15/+16)
Add a helper to remove an MSR from an auto-{load,store} list to dedup the msr_autoload code, and in anticipation of adding similar functionality for msr_autostore. No functional change intended. Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-38-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
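A hedged sketch of a list-agnostic "remove" helper follows. The entry order is not significant to this bookkeeping, so the sketch fills the hole with the last entry and shrinks the list. This mirrors the swap-with-last pattern that KVM's existing msr_autoload code appears to use. `remove_auto_msr()` and the structs are hypothetical simplifications. The returned count stands in for the value that would be written to the list's VMCS COUNT field.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NR_LOADSTORE_MSRS 8	/* hypothetical mirror of KVM's list size */

struct vmx_msr_entry { uint32_t index; uint32_t reserved; uint64_t value; };
struct vmx_msrs { unsigned int nr; struct vmx_msr_entry val[MAX_NR_LOADSTORE_MSRS]; };

/*
 * Remove @msr from an auto-load or auto-store list by moving the last entry
 * into its slot. Returns the new count (what would go into the COUNT field).
 * Removing an MSR that isn't in the list is a no-op.
 */
static unsigned int remove_auto_msr(struct vmx_msrs *m, uint32_t msr)
{
	for (unsigned int i = 0; i < m->nr; i++) {
		if (m->val[i].index != msr)
			continue;
		m->val[i] = m->val[--m->nr];
		break;
	}
	assert(m->nr <= MAX_NR_LOADSTORE_MSRS);
	return m->nr;
}
```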
2026-01-08 KVM: nVMX: Don't update msr_autostore count when saving TSC for vmcs12 (Sean Christopherson, 3 files, -51/+24)
Rework nVMX's use of the MSR auto-store list to snapshot TSC to sneak MSR_IA32_TSC into the list _without_ updating KVM's software tracking, and drop the generic functionality so that future usage of the store list for nested-specific logic needs to consider the implications of modifying the list. Updating the list only for vmcs02 and only on nested VM-Enter is a disaster waiting to happen, as it means vmcs01 is stale relative to the software tracking, and KVM could unintentionally leave an MSR in the store list in perpetuity while running L1, e.g. if KVM addressed the first issue and updated vmcs01 on nested VM-Exit without removing TSC from the list. Furthermore, mixing KVM's desire to save an MSR with L1's desire to save an MSR results in KVM clobbering/ignoring the needs of vmcs01 or vmcs02. E.g. if KVM added MSR_IA32_TSC to the store list for its own purposes, and then _removed_ MSR_IA32_TSC from the list after emulating nested VM-Enter, then KVM would remove MSR_IA32_TSC from the list even though saving TSC on VM-Exit from L2 is still desirable (to provide L1 with an accurate TSC). Similarly, removing an MSR from the list based on vmcs12's settings could drop an MSR that KVM wants to save for its own purposes. In practice, the issues are currently benign, because KVM doesn't use the store list for vmcs01. But that will change with upcoming mediated PMU support. Alternatively, a "full" solution would be to track MSR list entries for vmcs12 separately from KVM's standard lists, but MSR_IA32_TSC is likely the only MSR that KVM would ever want to save on _every_ VM-Exit purely based on vmcs12. I.e. the added complexity isn't remotely justified at this time. Opportunistically escalate from a pr_warn_ratelimited() to a full WARN as KVM reserves eight entries in each MSR list, and, as above, KVM uses at most one entry.
Opportunistically make vmx_find_loadstore_msr_slot() local to vmx.c as using it directly from nested code is unsafe due to the potential for mixing vmcs01 and vmcs02 state (see above). Cc: Jim Mattson <jmattson@google.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-37-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
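One way to picture "sneaking" TSC into the store list is shown in the sketch below. When vmcs12 asks for TSC to be saved, the entry is written just past the tracked entries. Only the hardware COUNT reflects that entry, so `nr` (KVM's software tracking, shared with vmcs01) is never modified. This is a hypothetical illustration: `vmcs02_store_count()` and the structs are made up, and the actual patch may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_TSC 0x10
#define MAX_NR_LOADSTORE_MSRS 8	/* hypothetical mirror of KVM's list size */

struct vmx_msr_entry { uint32_t index; uint32_t reserved; uint64_t value; };
struct vmx_msrs { unsigned int nr; struct vmx_msr_entry val[MAX_NR_LOADSTORE_MSRS]; };

/*
 * Hypothetical: compute the VM-Exit MSR-store count to program into vmcs02.
 * If L1 wants TSC saved and KVM isn't already saving it, place TSC in the
 * first untracked slot and count it, but leave m->nr untouched.
 */
static unsigned int vmcs02_store_count(struct vmx_msrs *m, bool l1_wants_tsc)
{
	if (!l1_wants_tsc)
		return m->nr;

	for (unsigned int i = 0; i < m->nr; i++)
		if (m->val[i].index == MSR_IA32_TSC)
			return m->nr;	/* already saved for KVM's own purposes */

	/* KVM uses at most one of the eight entries, so running out is a bug. */
	assert(m->nr < MAX_NR_LOADSTORE_MSRS);
	m->val[m->nr].index = MSR_IA32_TSC;
	return m->nr + 1;
}
```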
2026-01-08 KVM: VMX: Drop intermediate "guest" field from msr_autostore (Sean Christopherson, 3 files, -9/+7)
Drop the intermediate "guest" field from vcpu_vmx.msr_autostore as the value saved on VM-Exit isn't guaranteed to be the guest's value; it's purely whatever is in hardware at the time of VM-Exit. E.g. KVM's only use of the store list at the moment is to snapshot TSC at VM-Exit, and the value saved is always the raw TSC even if TSC-offsetting and/or TSC-scaling is enabled for the guest. And unlike msr_autoload, there is no need to differentiate between "on-entry" and "on-exit". No functional change intended. Cc: Jim Mattson <jmattson@google.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-36-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 KVM: x86/pmu: Elide WRMSRs when loading guest PMCs if values already match (Sean Christopherson, 1 file, -2/+4)
When loading mediated PMU state, elide the WRMSRs to load PMCs with the guest's value if the value in hardware already matches the guest's value. For the relatively common case where neither the guest nor the host is actively using the PMU, i.e. when all/many counters are '0', eliding the WRMSRs reduces the latency of handling VM-Exit by a measurable amount (WRMSR is significantly more expensive than RDPMC). As measured by KVM-Unit-Tests' CPUID VM-Exit testcase, this provides a ~25% reduction in latency (4k => 3k cycles) on Intel Emerald Rapids, and a ~13% reduction (6.2k => 5.3k cycles) on AMD Turin. Cc: Manali Shukla <manali.shukla@amd.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-35-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
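The following toy model illustrates the elision. A counter read (RDPMC) is cheap relative to WRMSR, so the write is skipped when hardware already holds the guest's value. `fake_rdpmc()`, `fake_wrmsr()` and the `hw_pmc[]` array are hypothetical stand-ins for the real instructions. The `nr_wrmsr` tally exists only to make the savings observable.

```c
#include <assert.h>
#include <stdint.h>

#define NR_COUNTERS 4

/* Hypothetical model of hardware counters; writes are tallied. */
static uint64_t hw_pmc[NR_COUNTERS];
static unsigned int nr_wrmsr;

static uint64_t fake_rdpmc(unsigned int i) { return hw_pmc[i]; }

static void fake_wrmsr(unsigned int i, uint64_t val)
{
	hw_pmc[i] = val;
	nr_wrmsr++;
}

/*
 * Load guest counter values, skipping the (expensive) write whenever the
 * (cheap) read shows hardware already holds the guest's value.
 */
static void load_guest_pmcs(const uint64_t *guest, unsigned int nr)
{
	assert(nr <= NR_COUNTERS);
	for (unsigned int i = 0; i < nr; i++) {
		if (fake_rdpmc(i) == guest[i])
			continue;
		fake_wrmsr(i, guest[i]);
	}
}
```

With all-zero counters on both sides, the common idle case described above, no writes are issued at all.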
2026-01-08 KVM: x86/pmu: Expose enable_mediated_pmu parameter to user space (Dapeng Mi, 2 files, -0/+4)
Expose the enable_mediated_pmu parameter to user space, i.e. allow userspace to enable/disable mediated vPMU support. Document the mediated versus perf-based behavior as part of the kernel-parameters.txt entry, and opportunistically add an entry for the core enable_pmu param as well. Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-34-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 KVM: nSVM: Disable PMU MSR interception as appropriate while running L2 (Sean Christopherson, 1 file, -1/+17)
Add MSRs that might be passed through to L1 when running with a mediated PMU to the nested SVM's set of to-be-merged MSR indices, i.e. disable interception of PMU MSRs when running L2 if both KVM (L0) and L1 disable interception. There is no need for KVM to interpose on such MSR accesses, e.g. if L1 exposes a mediated PMU (or equivalent) to L2. Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-33-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
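The rule for a to-be-merged MSR can be sketched as follows. L2's access bypasses KVM only if neither KVM (L0) nor L1 intercepts it. Any MSR not on the merge list stays intercepted unconditionally. `l2_must_intercept()` is hypothetical. The list contents are an illustrative subset (the AMD PERF_CTR0..2 counter MSRs), not the list the patch actually adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset only (AMD PERF_CTR0..2); the real list is in the patch. */
static const uint32_t to_be_merged[] = { 0xc0010201, 0xc0010203, 0xc0010205 };

/*
 * Hypothetical: decide whether L2's access to @msr must exit.
 * @l0_intercepts is KVM's own setting for L1, @l1_intercepts is what L1
 * requested for L2.
 */
static bool l2_must_intercept(uint32_t msr, bool l0_intercepts, bool l1_intercepts)
{
	for (size_t i = 0; i < sizeof(to_be_merged) / sizeof(to_be_merged[0]); i++) {
		/* Pass through only if both L0 and L1 pass the MSR through. */
		if (to_be_merged[i] == msr)
			return l0_intercepts || l1_intercepts;
	}
	return true;	/* MSRs not on the list are always intercepted */
}
```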
2026-01-08 KVM: nVMX: Disable PMU MSR interception as appropriate while running L2 (Mingwei Zhang, 1 file, -0/+30)
Merge KVM's PMU MSR interception bitmaps with those of L1, i.e. merge the bitmaps of vmcs01 and vmcs12, e.g. so that KVM doesn't interpose on MSR accesses unnecessarily if L1 exposes a mediated PMU (or equivalent) to L2. Signed-off-by: Mingwei Zhang <mizhang@google.com> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> [sean: rewrite changelog and comment, omit MSRs that are always intercepted] Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-32-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-08 KVM: nVMX: Add macros to simplify nested MSR interception setting (Dapeng Mi, 1 file, -16/+19)
Add nested_vmx_merge_msr_bitmaps_xxx() macros to simplify setting nested MSR interception. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-31-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
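The sketch below shows the kind of simplification such macros provide. One invocation merges both the read and write interception bits for an MSR. L2's access is intercepted if either vmcs01 (L0) or vmcs12 (L1) intercepts it. The `merge_msr_bitmaps_*()` macros, `merge_one()` and the flat boolean bitmaps are hypothetical. Real VMX MSR bitmaps are 4 KiB pages with separate low/high MSR ranges.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MSRS 16	/* hypothetical tiny MSR space for the sketch */

struct msr_bitmap {
	bool read[NR_MSRS];	/* true = intercept reads */
	bool write[NR_MSRS];	/* true = intercept writes */
};

/* Intercept L2's access if either L0 (vmcs01) or L1 (vmcs12) intercepts. */
static void merge_one(bool *l2, const bool *l0, const bool *l1, unsigned int msr)
{
	l2[msr] = l0[msr] || l1[msr];
}

/* Hypothetical helpers in the spirit of nested_vmx_merge_msr_bitmaps_xxx(). */
#define merge_msr_bitmaps_read(l2, l0, l1, msr) \
	merge_one((l2)->read, (l0)->read, (l1)->read, (msr))
#define merge_msr_bitmaps_write(l2, l0, l1, msr) \
	merge_one((l2)->write, (l0)->write, (l1)->write, (msr))
#define merge_msr_bitmaps_rw(l2, l0, l1, msr)			\
	do {							\
		merge_msr_bitmaps_read(l2, l0, l1, msr);	\
		merge_msr_bitmaps_write(l2, l0, l1, msr);	\
	} while (0)
```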
2026-01-08 KVM: x86/pmu: Handle emulated instruction for mediated vPMU (Dapeng Mi, 1 file, -2/+37)
The mediated vPMU needs to accumulate emulated instructions into the counter and load the counter into hardware at VM-Entry. Moreover, if the accumulation overflows the counter, KVM also needs to update GLOBAL_STATUS and inject a PMI into the guest. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-30-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
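Here is a simplified model of the accumulation step. Each counter that is counting retired instructions is bumped for the emulated instruction, truncated to the counter width. On wrap-around, the counter's GLOBAL_STATUS overflow bit is set and a PMI is requested. The 48-bit width, `struct toy_pmu` and `pmu_count_emulated_insn()` are assumptions for illustration. Real KVM also honors per-counter PMI enables and the global control bits, and writes the updated counter to hardware at VM-Entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_GP_COUNTERS 4
#define COUNTER_WIDTH 48	/* assumed counter width for the sketch */

struct toy_pmu {
	uint64_t counter[NR_GP_COUNTERS];
	bool counts_instructions[NR_GP_COUNTERS];	/* enabled + "instructions retired" event */
	uint64_t global_status;				/* bit i set <=> counter i overflowed */
	bool pmi_pending;
};

/*
 * Account one emulated (skipped) instruction to every counter that counts
 * instructions; on wrap-around, flag the overflow and request a PMI.
 */
static void pmu_count_emulated_insn(struct toy_pmu *pmu)
{
	const uint64_t mask = (1ULL << COUNTER_WIDTH) - 1;

	for (unsigned int i = 0; i < NR_GP_COUNTERS; i++) {
		if (!pmu->counts_instructions[i])
			continue;
		pmu->counter[i] = (pmu->counter[i] + 1) & mask;
		if (pmu->counter[i] == 0) {
			pmu->global_status |= 1ULL << i;
			pmu->pmi_pending = true;	/* simplification: PMI always enabled */
		}
	}
}
```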
2026-01-08 KVM: x86/pmu: Disallow emulation in the fastpath if mediated PMCs are active (Sean Christopherson, 2 files, -0/+19)
Don't handle exits in the fastpath if emulation is required, i.e. if an instruction needs to be skipped, the mediated PMU is enabled, and one or more PMCs are counting instructions. With the mediated PMU, KVM's cache of PMU state is inconsistent with respect to hardware until KVM exits the inner run loop (when the mediated PMU is "put"). Reviewed-by: Sandipan Das <sandipan.das@amd.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-29-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
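The condition reduces to a single predicate, sketched below for a fastpath handler that would need to skip an instruction. `struct toy_vcpu` and `fastpath_can_skip_insn()` are hypothetical names, and the bitmap of instruction-counting PMCs is an assumed representation.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_vcpu {
	bool mediated_pmu;			/* mediated vPMU enabled for this VM */
	unsigned int insn_counting_pmcs;	/* bitmap of PMCs counting instructions */
};

/*
 * Can a fastpath handler that needs to skip an instruction run now?
 * With the mediated PMU, live PMC state sits in hardware (not KVM's cache)
 * until the PMU is "put", so counting the skipped instruction isn't safe.
 */
static bool fastpath_can_skip_insn(const struct toy_vcpu *v)
{
	return !(v->mediated_pmu && v->insn_counting_pmcs);
}
```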
2026-01-08 KVM: x86/pmu: Load/put mediated PMU context when entering/exiting guest (Dapeng Mi, 8 files, -3/+225)
Implement the PMU "world switch" between host perf and guest mediated PMU. When loading guest state, call into perf to switch from host to guest, and then load guest state into hardware, and then reverse those actions when putting guest state. On the KVM side, when loading guest state, zero PERF_GLOBAL_CTRL to ensure all counters are disabled, then load selectors and counters, and finally call into vendor code to load control/status information. While VMX and SVM use different mechanisms to avoid counting host activity while guest controls are loaded, both implementations require PERF_GLOBAL_CTRL to be zeroed when the event selectors are in flux. When putting guest state, reverse the order, and save and zero controls and status prior to saving+zeroing selectors and counters. Defer clearing PERF_GLOBAL_CTRL to vendor code, as only SVM needs to manually clear the MSR; VMX configures PERF_GLOBAL_CTRL to be atomically cleared by the CPU on VM-Exit. Handle the difference in MSR layouts between Intel and AMD by communicating the bases and stride via kvm_pmu_ops. Because KVM requires Intel v4 (and full-width writes) and AMD v2, the MSRs to load/save are constant for a given vendor, i.e. do not vary based on the guest PMU, and do not vary based on host PMU (because KVM will simply disable mediated PMU support if the necessary MSRs are unsupported). Except for retrieving the guest's PERF_GLOBAL_CTRL, which needs to be read before invoking any fastpath handler (spoiler alert), perform the context switch around KVM's inner run loop. State only needs to be synchronized from hardware before KVM can access the software "caches". Note, VMX already grabs the guest's PERF_GLOBAL_CTRL immediately after VM-Exit, as hardware saves the value into the VMCS.
Co-developed-by: Mingwei Zhang <mizhang@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Co-developed-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Xudong Hao <xudong.hao@intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/20251206001720.468579-28-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
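The load ordering and the "bases plus stride" idea can be sketched together as below. PERF_GLOBAL_CTRL is zeroed first, then each selector/counter pair is written, with MSR numbers derived from a per-vendor base and stride. The MSR numbers (Intel IA32_PERFEVTSEL0 0x186, full-width IA32_A_PMC0 0x4c1, global control 0x38f; AMD PERF_CTL0 0xc0010200 and PERF_CTR0 0xc0010201 interleaved with stride 2, global control 0xc0000301) are recalled from the vendor manuals and should be double-checked against arch/x86/include/asm/msr-index.h. `struct pmu_msr_layout` and `load_guest_pmu()` are hypothetical and do not reflect the actual kvm_pmu_ops fields. The vendor control/status step is left as a comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vendor layout, in the spirit of bases/stride in kvm_pmu_ops. */
struct pmu_msr_layout {
	uint32_t global_ctrl;
	uint32_t eventsel_base;
	uint32_t counter_base;
	uint32_t stride;
};

/* Intel (v4, full-width writes): selectors and counters are each contiguous. */
static const struct pmu_msr_layout intel_layout = { 0x38f, 0x186, 0x4c1, 1 };
/* AMD (PerfMonV2): PERF_CTLx/PERF_CTRx pairs are interleaved. */
static const struct pmu_msr_layout amd_layout = { 0xc0000301, 0xc0010200, 0xc0010201, 2 };

struct msr_write { uint32_t msr; uint64_t val; };

static uint32_t eventsel_msr(const struct pmu_msr_layout *l, unsigned int i)
{
	return l->eventsel_base + i * l->stride;
}

static uint32_t counter_msr(const struct pmu_msr_layout *l, unsigned int i)
{
	return l->counter_base + i * l->stride;
}

/* Record, in order, the MSR writes for loading guest PMU state into @out. */
static unsigned int load_guest_pmu(const struct pmu_msr_layout *l, unsigned int nr,
				   const uint64_t *sel, const uint64_t *ctr,
				   struct msr_write *out)
{
	unsigned int n = 0;

	/* Disable all counters while the event selectors are in flux. */
	out[n++] = (struct msr_write){ l->global_ctrl, 0 };
	for (unsigned int i = 0; i < nr; i++) {
		out[n++] = (struct msr_write){ eventsel_msr(l, i), sel[i] };
		out[n++] = (struct msr_write){ counter_msr(l, i), ctr[i] };
	}
	/* Vendor code would now load the guest's control/status MSRs. */
	return n;
}
```

Putting guest state would run the same steps in reverse. Vendor code saves and zeroes control/status first, and then the selectors and counters are saved and zeroed.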